Hi all, I have a question for the list sparked by this discussion of a bug in NumPy 1.6.2 and 1.7:
http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064682.html

and this open issue in h5py:

https://code.google.com/p/h5py/issues/detail?id=217

In h5py we need to represent variable-length strings and HDF5 object references within the existing NumPy dtype system. The way this is handled at the moment is with object (type "O") dtypes with a small amount of metadata attached; in other words, an "O" array could have a dtype marked as representing variable-length strings, and HDF5 would convert the Python string objects into the corresponding type in the HDF5 file. Likewise, an "O" dtype marked as containing HDF5 object references (h5py.Reference instances) would be converted to native HDF5 references when written.

The trouble I'm having is trying to attach metadata to a dtype in such a way that it is preserved in NumPy. Right now I create an "O" dtype with a single field and store the information in the field "description", e.g.:

    dtype(('O', [( ({'type': bytes},'vlen'), 'O' )] ))

This works (it's how special types have worked in h5py for years) but is quite unwieldy, and leads to interesting side effects. For example, because of the single field used, array[index] returns a 1-element NumPy array containing a Python object, instead of the Python object itself. Worse, our fix for this behavior (remove the field when returning data from h5py) triggered the above bug in NumPy.

Is there a better way to add metadata to dtypes that I'm not aware of? Note I'm *not* interested in creating a custom type; one of the advantages of the current system is that people deal with the resulting "O" object arrays like any other object array in NumPy.

Andrew Collette

_______________________________________________
NumPy-Discussion mailing list
http://mail.scipy.org/mailman/listinfo/numpy-discussion
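
For readers who want to experiment with the field-description trick from the message above, here is a minimal sketch of how the hint could be attached and read back. The helper names make_vlen_dtype and read_hint are hypothetical and are not h5py functions. The sketch also assumes that some NumPy versions may refuse to build an object dtype of the (base, fields) form, so the construction is wrapped in a try/except rather than assumed to succeed.

```python
import numpy as np


def make_vlen_dtype():
    """Try to build the tagged "O" dtype quoted in the post.

    Returns None if this NumPy version refuses the construction
    (assumption: a refusal surfaces as TypeError or ValueError).
    """
    try:
        # Field spec is ((title, name), format); the dict rides along as the title.
        return np.dtype(('O', [(({'type': bytes}, 'vlen'), 'O')]))
    except (TypeError, ValueError):
        return None


def read_hint(dt, name='vlen'):
    """Return the metadata stored as the title of field `name`, or None."""
    if dt is None or dt.fields is None:
        return None
    entry = dt.fields.get(name)
    # A titled field is stored as (dtype, offset, title).
    if entry is not None and len(entry) == 3:
        return entry[2]
    return None


dt = make_vlen_dtype()
if dt is None:
    print("this NumPy version rejects the (base, fields) object dtype")
else:
    print("hint stored on the dtype:", read_hint(dt))
```

A plain "O" dtype has no fields, so read_hint(np.dtype('O')) returns None, which is how untagged object arrays can be told apart from tagged ones in this sketch.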
