Hmm. This is an interesting discussion. Let me see if I can add two cents...

The HDF5 library allows you to define your own 'filters', which operate on
the data in transit as it is written to and read from the file. The filters
are just callbacks made from the HDF5 library to your user-defined code to
operate on chunks of the dataset as they pass underneath the H5Dwrite and
H5Dread calls.

If you write data via some user-defined filter, then any reader will need
access to the code that does the reverse operation (decryption in your
case) and, of course, to any decryption keys. So it is already implied
that if you define some 'weird' filter, none of the existing HDF5 tools
(HDFView, h5dump, or third-party applications that read HDF5 such as IDL,
MATLAB, VisIt, etc.) will be able to read your data. But given that you are
talking about encryption here, I suspect that such an outcome is actually
perfectly fine.

So, only applications that have access to your reader code (decryption
filters) will be able to read the data.

And why not handle the keys the way something like ssh does now? Your
reader 'filter' would acquire the key from ~/.ssh/id_rsa and then use what
it gets to decrypt the chunks being read during H5Dread. Failure to acquire
the key would result in a filter error and ultimately a read error in
H5Dread's error stack. You could do some work to detect this case and
report a useful error message (e.g. "no appropriate key to read encrypted
data").
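
To make the key-lookup idea concrete, here is a minimal Python sketch of
the "acquire the key or fail with a useful message" step. It is not HDF5
code. The names load_key and KeyUnavailableError are hypothetical, and
treating the raw bytes of the file as key material is an assumption made
only for illustration.

```python
import os


class KeyUnavailableError(RuntimeError):
    """Raised when no usable key can be found for decryption."""


def load_key(path="~/.ssh/id_rsa"):
    """Return the raw bytes of the key file at `path`.

    Assumption for illustration: the file's raw bytes stand in for key
    material. A real filter would parse and validate the key properly.
    """
    full_path = os.path.expanduser(path)
    try:
        with open(full_path, "rb") as handle:
            key = handle.read()
    except OSError as exc:
        # Turn the low-level I/O failure into the message a user needs.
        raise KeyUnavailableError(
            "no appropriate key to read encrypted data "
            f"(could not read {full_path}: {exc.strerror})"
        ) from exc
    if not key:
        raise KeyUnavailableError(
            f"no appropriate key to read encrypted data ({full_path} is empty)"
        )
    return key
```

A reader-side filter could call something like this once, before it
touches any chunk, and turn the exception into its own failure return.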

Would you have a single HDF5 file with datasets encrypted for different
ids? If so, I think the ssh-like mechanism still works.

Because 'filter' operations apply only to the raw data of a dataset, the
metadata is not encrypted. This means things like the names, dimensions,
datatypes, etc. (and any attributes defined on the datasets) cannot be
encrypted via the 'filter' approach. Perhaps this is why another responder
mentioned introducing a Virtual File Driver that collects metadata together
and encrypts it separately. I could see how that could be important in
certain circumstances.

Another issue is that 'filters' can be applied only when datasets are
'chunked', and the filters are then applied independently to each chunk.
So what you get for a single dataset is a bunch of chunks, each
independently encrypted; you don't have the whole dataset encrypted in one
fell swoop. I don't think that would cause problems, but I thought I would
mention it.
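
As a rough illustration of what "each chunk independently encrypted"
means, the Python sketch below splits a buffer into chunks and transforms
each one on its own. The XOR keystream built from SHA-256 is only a toy
stand-in so the example runs with the standard library; it is not a vetted
cipher, and a real filter would call AES or similar from a proper crypto
library. All function names and the key are hypothetical.

```python
import hashlib


def keystream(key, chunk_index, length):
    """Toy keystream: SHA-256 over key, chunk index, and a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(
            key + chunk_index.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def transform_chunk(key, chunk_index, chunk):
    """XOR a chunk with its own keystream; applying it twice restores it."""
    stream = keystream(key, chunk_index, len(chunk))
    return bytes(a ^ b for a, b in zip(chunk, stream))


def split_chunks(data, chunk_size):
    """Split a byte string into consecutive chunks of at most chunk_size."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


key = b"hypothetical key material"
data = bytes(range(256)) * 4        # 1024 bytes of sample "raw data"
chunks = split_chunks(data, 256)    # four chunks, like a chunked dataset
stored = [transform_chunk(key, i, c) for i, c in enumerate(chunks)]

# Any one chunk can be recovered without touching the others.
recovered_third = transform_chunk(key, 2, stored[2])
```

Mixing the chunk index into the keystream is a design choice of this
sketch: it keeps identical chunks from producing identical stored bytes.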

HDF5 can be 'smart' about applying filters and wind up NOT applying a
requested filter in circumstances where you tell it the filter is optional.
So you have to take care that your filter won't be treated that way by
HDF5, which would wind up skipping an encryption filter it should not have.
Just be sure to set up the filters correctly when you define them to HDF5.
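
In the C API this is controlled by the flags argument of H5Pset_filter:
H5Z_FLAG_OPTIONAL lets the library drop a filter that fails on a chunk
during a write, while H5Z_FLAG_MANDATORY turns the failure into an error.
The Python sketch below is only a model of that behaviour, not the HDF5
API; run_write_pipeline, FilterError, and failing_encrypt are hypothetical
names.

```python
MANDATORY = "mandatory"
OPTIONAL = "optional"


class FilterError(Exception):
    """Raised by a filter that cannot process a chunk."""


def run_write_pipeline(chunk, filters):
    """Apply (name, flag, func) filters in order.

    Returns the output bytes and the names of the filters actually applied.
    A failing optional filter is skipped; a failing mandatory one aborts.
    """
    applied = []
    for name, flag, func in filters:
        try:
            chunk = func(chunk)
        except FilterError:
            if flag == OPTIONAL:
                continue    # the chunk goes on without this filter
            raise           # mandatory: the write fails
        applied.append(name)
    return chunk, applied


def failing_encrypt(chunk):
    """Hypothetical encryption filter that cannot find its key."""
    raise FilterError("no appropriate key")


# Registered as optional: the data is silently written unencrypted.
out, applied = run_write_pipeline(
    b"secret", [("encrypt", OPTIONAL, failing_encrypt)]
)
```

With the flag set to MANDATORY instead, the same call raises FilterError,
which is the behaviour you want from an encryption filter.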

Will encryption *increase* the size of the data being written? I don't
think it does, but I guess it's always possible depending on what you are
doing. If so, HDF5 may not be able to tolerate that. It may expect
filtered chunks to be no larger than the un-filtered chunks and error out
(or skip such a filter) if that is not the case. So just be sure to review
the documentation on these details.
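
One case where the size does grow is a block cipher used with padding.
With PKCS#7 padding the output is rounded up to the next full block, and a
full block is added when the input is already aligned. AES uses 16-byte
blocks; DES and 3DES use 8-byte blocks. The short Python calculation below
assumes that padding scheme and, optionally, an IV stored with each chunk.
Both are assumptions about how a filter might be written, and padded_size
is a hypothetical helper.

```python
def padded_size(chunk_bytes, block_bytes=16, iv_bytes=0):
    """Size of a chunk after PKCS#7 padding, plus an optional stored IV."""
    pad = block_bytes - (chunk_bytes % block_bytes)   # always 1..block_bytes
    return chunk_bytes + pad + iv_bytes


for n in (100, 4096):
    print(n, "->", padded_size(n), "bytes, or",
          padded_size(n, iv_bytes=16), "with a 16-byte IV")
```

A stream-style mode avoids the padding, though a per-chunk nonce or
authentication tag would still add a few bytes if stored with the chunk.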

I guess this is a long-winded way of saying I think you could make it work
within the limitations of the issues I mention above. And I think you can
invent a way to handle the keys that can probably be made to work.

Hope that was helpful.

Mark


On 3/21/14 3:23 AM, "huebbe" <[email protected]> wrote:

>While it is possible to perform some encryption in a filter, the filter
>mechanism is not designed for encryption. The problem is the key:
>Filters don't get arbitrary data from the calling application to do the
>decryption, they get only data that is stored in the file. Otherwise,
>the HDF5 library would not be able to do the decoding in a completely
>transparent way. And if you put the key into the file (as filter
>options, or similar), the NSA will be happy.
>
>To use the filter mechanism for encryption, you would need to get the
>key via a side-channel. This is possible, but it will be hard to do this
>in a usable and portable fashion. For instance, you cannot just pop up a
>dialog asking for a key, because many programs using HDF5 don't even
>have a text terminal connected to them while they run.
>
>Also note that filtering does not touch the metadata in the file. I. e.
>the NSA will be able to see the entire description of what is encoded in
>the file, they will just not have the actual data.
>
>If you want security, just use gpg to encrypt the entire file.
>
>Cheers,
>Nathanael Hübbe
>
>
>
>On 03/21/2014 12:44 AM, Rowe, Jim wrote:
>> Hello - has anyone used a symmetric encryption filter with HDF5?  I
>> would like to introduce encryption (AES, DES, 3DES) in the pipeline
>> after zlib compression to encrypt some datasets.
>>
>> Any examples, starting points, or suggestions would help.
>>
>> Thanks!
>>
>> --Jim
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> 
>> http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> 
>
>
>-- 
>Please be aware that the enemies of your civil rights and your freedom
>are on CC of all unencrypted communication. Protect yourself.
>

