Hello,

I would also like to ask the HDF5 developers to consider supporting
UTF-32. One benefit of UTF-32 is that it is not a variable-length
encoding: indexing a code point is a constant-time operation, whereas
variable-length encodings require sequential access. The scientific
Python community, which is large and growing, is in the process of
migrating from Python 2 to Python 3. All strings in Python 3 are
Unicode, and as Andrew mentioned, NumPy (Python's array package)
addresses the need for storing fixed-length Unicode strings in the
most general way: a Unicode string datatype consisting of a fixed
length of UTF-32 code points. But there doesn't appear to be a way to
store this datatype in HDF5. Would you please consider adding support
for this datatype in a future version of HDF5?
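
To illustrate the indexing point, here is a minimal sketch using only
the Python 3 standard library. The helper name utf32_code_point and
the sample string are made up for this example; it assumes
little-endian UTF-32 without a byte-order mark.

```python
import struct

def utf32_code_point(buf: bytes, i: int) -> str:
    """Return code point i from a little-endian UTF-32 buffer (hypothetical helper)."""
    # Every code point occupies exactly 4 bytes, so the byte offset is 4 * i.
    (cp,) = struct.unpack_from("<I", buf, 4 * i)
    return chr(cp)

text = "naïve Ω"                    # 7 code points, two of them non-ASCII
utf32 = text.encode("utf-32-le")    # fixed width: 4 bytes per code point
utf8 = text.encode("utf-8")         # variable width: 1 to 4 bytes per code point

# The UTF-32 size depends only on the number of code points.
assert len(utf32) == 4 * len(text)

# Direct lookup by offset, with no scan over the preceding characters.
assert utf32_code_point(utf32, 2) == text[2]

# In UTF-8, the byte offset of code point i depends on everything before it.
offset_of_last = len(text[:-1].encode("utf-8"))
assert utf8[offset_of_last:].decode("utf-8") == text[-1]
```

A fixed-length string field of N code points therefore always needs
4 * N bytes in UTF-32, which is what makes it usable as a fixed-size
array element.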

Thank you,
Darren


On Mon, Oct 10, 2011 at 9:29 PM, Andrew Collette
<[email protected]> wrote:
> Hi all,
>
> Some of my users have been asking about storing UTF-16 or UTF-32
> fixed-length strings in HDF5.  Are there currently any plans to
> support wide character datatypes?  Note this is a slightly different
> thing than UTF-8 support, which results in variable-length data; for
> example, NumPy has a Unicode string datatype consisting of a fixed
> length of UTF-32 code points.
>
> Thanks!
> Andrew
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

