Hello,

I would also like to ask the HDF5 developers to consider supporting UTF-32. One benefit of UTF-32 is that it is not a variable-length encoding: every code point occupies exactly four bytes, so indexing a code point is a constant-time operation, whereas variable-length encodings such as UTF-8 require a sequential scan.

The scientific Python community, which is large and growing, is in the process of migrating from Python 2 to Python 3. All strings in Python 3 are Unicode, and as Andrew mentioned, NumPy (Python's array package) addresses the need for storing fixed-length Unicode strings in the most general way: a Unicode string datatype consisting of a fixed length of UTF-32 code points. But there doesn't appear to be a way to store this datatype in HDF5. Would you please consider adding support for this datatype in a future version of HDF5?
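To illustrate the indexing point, here is a minimal sketch using only the Python 3 standard library. It is not HDF5 or NumPy code; the helper names (`utf32_code_point`, `utf8_code_point`, `fixed_width_field`) are made up for this example, and the null padding in `fixed_width_field` is only an assumption about how a fixed-length field might be laid out.

```python
import struct


def utf32_code_point(buf: bytes, i: int) -> str:
    """Return the i-th code point of a UTF-32-LE buffer in constant time."""
    # Every code point is exactly 4 bytes, so its offset is simply 4 * i.
    (cp,) = struct.unpack_from("<I", buf, 4 * i)
    return chr(cp)


def _utf8_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, given its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4


def utf8_code_point(buf: bytes, i: int) -> str:
    """Return the i-th code point of a UTF-8 buffer by scanning from the start."""
    pos = 0
    for _ in range(i):
        # The width of each earlier character must be inspected in turn.
        pos += _utf8_len(buf[pos])
    return buf[pos:pos + _utf8_len(buf[pos])].decode("utf-8")


def fixed_width_field(text: str, n: int) -> bytes:
    """Encode text as a field of exactly n UTF-32 code points, null padded.

    Hypothetical layout for illustration only.
    """
    if len(text) > n:
        raise ValueError("text does not fit in the field")
    return text.encode("utf-32-le").ljust(4 * n, b"\x00")


# Mixed-width sample: ASCII, Latin-1, Greek, and a character outside the BMP.
text = "na\u00efve \u03a9\U0001d11ex"
utf32 = text.encode("utf-32-le")
utf8 = text.encode("utf-8")

print(len(text), len(utf32), len(utf8))
print(utf32_code_point(utf32, 7) == utf8_code_point(utf8, 7))
```

The UTF-32 lookup does one fixed-offset read, while the UTF-8 lookup has to walk over every preceding character. The cost is space: the UTF-32 buffer always uses four bytes per code point, whatever the text contains.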
Thank you,
Darren

On Mon, Oct 10, 2011 at 9:29 PM, Andrew Collette <[email protected]> wrote:
> Hi all,
>
> Some of my users have been asking about storing UTF-16 or UTF-32
> fixed-length strings in HDF5. Are there currently any plans to
> support wide character datatypes? Note this is a slightly different
> thing than UTF-8 support, which results in variable-length data; for
> example, NumPy has a Unicode string datatype consisting of a fixed
> length of UTF-32 code points.
>
> Thanks!
> Andrew
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
