Hi,

While I don't have experience with bitfields in HDF5 myself, an enum type
might help in your case. You can create an enum type that associates a name
with each of the values and store it as a named datatype in the file. When
you then run h5ls, it shows the possible combinations and the current value,
as in this case:

    Attribute: TypeInfo  scalar
        Type:      shared-1:19496 enum native int {
                   UnknownArrayType = 0
                   Contiguous       = 1
                   SeparatedCompound = 2
                   Constant         = 3
                   FragmentedContiguous = 4
                   FragmentedSeparatedCompound = 5
                   DirectProduct    = 6
                   IndexPermutation = 7
                   UniformSampling  = 8
                   FragmentedUniformSampling = 9
               }
        Data:  UniformSampling
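
For a byte of flag bits, the enum would need one name per combination of
bits. A minimal sketch of how such a name/value table could be generated is
below. It uses only the standard library; buildEnumTable and the bit labels
are hypothetical names of mine, and I assume each resulting pair would then
be registered with the HDF5 enum insert call (H5Tenum_insert in the C API).

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Build one (name, value) pair for every combination of the given flag bits.
// bitNames[k] is the (hypothetical) label used when bit k of the byte is set.
std::vector<std::pair<std::string, unsigned char>>
buildEnumTable(const std::vector<std::string>& bitNames)
{
    std::vector<std::pair<std::string, unsigned char>> table;
    if (bitNames.size() > 8) {
        return table;  // more labels than bits in one byte: nothing to build
    }
    const unsigned count = 1u << bitNames.size();
    for (unsigned value = 0; value < count; ++value) {
        std::string name;
        for (std::size_t bit = 0; bit < bitNames.size(); ++bit) {
            if (value & (1u << bit)) {
                if (!name.empty()) {
                    name += '|';
                }
                name += bitNames[bit];
            }
        }
        if (name.empty()) {
            name = "NONE";  // no bit set
        }
        table.emplace_back(name, static_cast<unsigned char>(value));
    }
    return table;
}
```

Note that the table doubles with every bit: three flags give 8 names, a full
byte gives 256, so this is only practical for a handful of flags.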

Mostly this appears to be a limitation of HDFView and the h5ls tools, namely
whether they can display a bitfield by its components. The file is still
self-describing if you use your bitfield, but h5ls/h5dump don't resolve it the
way you'd like. So an easier workaround might be to modify h5ls/h5dump to
show the information as you see fit. For instance, I also find it annoying
that for an enum type h5ls always shows the entire list of possible enum
values for each data element, when it would be sufficient to show these
values only where the enum is defined as a named datatype.

     Werner


On Fri, 13 Aug 2010 14:04:16 +0200, Steve Bissell <[email protected]> 
wrote:


I am working on an application to record data in HDF5 format, and I'm
completely new to it.
The data is in the form of packets, each of which has an associated
timestamp and class.
Therefore, it would seem appropriate to use the FL_PacketTable class (99% of
the packets are fixed length, so this is my core use case).
The class of the packet indicates the packet contents, and each class
appears to map naturally to the HDF5 "Compound" data type, with a struct for
each class of packet.
Note also that data is retrieved from a legacy file format that uses
individual bits to represent certain data.

So far, so good. I can produce an hdf5 file with the following code
(C++/win32/VisStudio2005); assume that the file object and the group V3 are
defined.

//structured data - "compound" in the HDF5 terminology.
struct _my_type {
   double t;//e.g. time.
   int a;
   float b;
};
CompType mtype1( sizeof(_my_type) );
mtype1.insertMember( "time", HOFFSET(_my_type, t), PredType::NATIVE_DOUBLE);
mtype1.insertMember( "alt", HOFFSET(_my_type, a), PredType::NATIVE_INT);
mtype1.insertMember( "math", HOFFSET(_my_type, b), PredType::NATIVE_FLOAT);

FL_PacketTable pt(V3.getId(),"Packets",mtype1.getId(),500,6);
_my_type s1;
for (int i = 0; i< 400000; i++)
{
        s1.t = i/10.f;//monotonic time
        s1.a = i % 10;//sawtooth integer data
        s1.b = 100.f/(i+1);//math function
        pt.AppendPacket(&s1);
}

The resulting file is maximally self describing, in that when opened with
hdfView, I see a packet table with columns headed time, alt, math, and my
"packets" in the records below.

Now what I would like to do is achieve the same maximally self describing
file for the amended compound type:

struct _my_type {
   double t;//e.g. time.
   int a;
   float b;//so far, so easy....
   //BUT, we would also like...
   union {
           struct {
                   unsigned char bit0 : 1;//ideally, should be able to map each bit's value
                   unsigned char bit1 : 1;//to one of a pair of strings, e.g. "VALVE_OPEN" / "VALVE_CLOSED"
                   unsigned char bit2 : 1;//by using, perhaps, something like the ENUMERATION feature of
                   unsigned char bit3 : 1;//HDF5.
                   unsigned char bit4 : 1;
                   unsigned char bit5 : 1;
                   unsigned char bit6 : 1;
                   unsigned char bit7 : 1;
           };
           //..and ideally would ALSO like to be able to retrieve the entire field, as below....
           unsigned char wholebyte;
   };
};


If I now amend my code to do:

mtype1.insertMember( "wholebyte", HOFFSET(_my_type, wholebyte),
PredType::NATIVE_UCHAR);
s1.wholebyte = 0;

for (int i = 0; i< 400000; i++)
{
        s1.t = i/10.f;//monotonic time
        s1.a = i % 10;//sawtooth integer data
        s1.b = 100.f/(i+1);//math function
        s1.bit1 = ( (0 == (i % 20)) ? 1 : 0);//bit1 goes true every 20th element
        s1.bit2 = ( (10 < (i % 20)) ? 1 : 0);//bit2 goes true about 1/2 the time
        s1.bit3 = ( (10 > (i % 30)) ? 1 : 0);//bit3 goes true about 1/3 the time
        pt.AppendPacket(&s1);
}


then I do indeed see "wholebyte" and its data as an extra column in hdfview.
But end-users will certainly want to see individual bit values, rather than
the entire byte.
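
As an illustration of what end-users would want to see, here is a minimal
reader-side sketch that turns the stored byte into a label for one bit. It
is not part of HDF5; describeBit and the labels are hypothetical, and it
assumes bit0 is the least significant bit of wholebyte, which depends on how
the compiler lays out the bit-field.

```cpp
#include <string>

// Return the label for one bit of a stored byte.
// 'bit' is 0..7 (0 = least significant); any other index reads as "clear".
// The two labels are supplied by the caller, e.g. "VALVE_OPEN"/"VALVE_CLOSED".
std::string describeBit(unsigned char wholebyte, unsigned bit,
                        const std::string& setLabel,
                        const std::string& clearLabel)
{
    const bool isSet = bit < 8 && ((wholebyte >> bit) & 1u) != 0;
    return isSet ? setLabel : clearLabel;
}
```

For example, describeBit(s1.wholebyte, 1, "VALVE_OPEN", "VALVE_CLOSED")
would give the text for bit1 of a record read back from the packet table.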

So - and this is my problem - if I do this instead (i.e. I do not insert
wholebyte):

//Create single bit transient types, then commit them to the dataset.
//Q: are these types modifying the original types, or are they "copies" in
//the H5Tcopy sense?
//Not yet clear without examining c++ library behaviour further.....
IntType mySingleBit1Type(PredType::STD_B8LE);
mySingleBit1Type.setPrecision(1);
mySingleBit1Type.setOffset(1);
mySingleBit1Type.commit(V3,"Bit1Type");

mtype1.insertMember( "bit1", HOFFSET(_my_type,wholebyte), mySingleBit1Type);

Then I do NOT see “bit1” as a field in the packet table using hdfview – that
is, the “self describing” aspect fails.

Worse, if I attempt to define and insert another bit type, as below:

IntType mySingleBit2Type(PredType::STD_B8LE);
mySingleBit2Type.setPrecision(1);
mySingleBit2Type.setOffset(2);
mySingleBit2Type.commit(V3,"Bit2Type");
mtype1.insertMember( "bit2", HOFFSET(_my_type,wholebyte), mySingleBit2Type);

Then I get a "member overlaps with another member" exception from
H5Tcompound.c. This is not surprising, since the API only appears to allow
BYTE offsets.
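
To make the overlap concrete: both bit members are inserted at the byte
offset of wholebyte and each occupies one byte, so their byte ranges
coincide. The sketch below shows that kind of range check. It is only an
illustration with hypothetical names (MemberSpan, membersOverlap), not the
actual code in H5Tcompound.c.

```cpp
#include <cstddef>

// Hypothetical description of a compound member: where it starts inside the
// record and how many bytes it occupies.
struct MemberSpan {
    std::size_t offset;
    std::size_t size;
};

// Two members overlap when their byte ranges [offset, offset + size)
// intersect.
bool membersOverlap(const MemberSpan& a, const MemberSpan& b)
{
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

With "bit1" and "bit2" both described as one byte at the same offset, the
check reports an overlap no matter which bit within that byte each type
selects through its precision and offset.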

Now some obvious, but ugly workarounds exist. I could, for example, store my
original bit data as bytes. But this would be very inefficient, in terms of
storage, unless the magic of compression would reduce the problem …..
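
A rough estimate of that cost, as a sketch: it assumes an 8-byte double, a
4-byte int and a 4-byte float stored without padding, and compares one byte
holding all eight flags against one byte per flag. The function names are
hypothetical.

```cpp
#include <cstddef>

// Packed size of one record, assuming no padding:
// double (8) + int (4) + float (4) + flag storage (1 byte, or 8 bytes when
// each of the eight bits gets its own byte).
constexpr std::size_t recordBytes(bool oneBytePerBit)
{
    return 8 + 4 + 4 + (oneBytePerBit ? 8 : 1);
}

// Uncompressed bytes needed for the given number of packets.
constexpr std::size_t totalBytes(std::size_t packets, bool oneBytePerBit)
{
    return packets * recordBytes(oneBytePerBit);
}
```

For the 400000 packets written in the loop above, this gives 6,800,000 bytes
with a single flag byte and 9,600,000 bytes with one byte per bit, before
any compression is applied.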

I can’t believe I’m the first person to encounter this issue, much more
likely is that I’m still too stupid to understand how best to define the bit
fields. Does anyone have any ideas? I'm aware that the above code may not be
completely platform portable in theory due to the C specification not
specifying exactly where bits might be put within the machine word, but this
isn't an issue in our case (at the moment!)
Thanks!



--
___________________________________________________________________________
Dr. Werner Benger                Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809                        Fax.: +1 225 578-5362

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
