Hi Andrew,

On Apr 26, 2011, at 8:07 PM, Andrew Collette wrote:

> Hi,
> 
> I'm curious as to how HDF5 treats character-set information during
> type conversion.  There are functions H5Tset_cset and H5Tget_cset in
> the API.  What happens if I try to read data defined as H5T_CSET_ASCII
> into a buffer defined as H5T_CSET_UTF8, and vice-versa?  What if the
> buffer defined as ASCII contains characters > 127, but isn't UTF-8
> compliant?  I ask because in practice I've noticed that H5T_CSET_ASCII
> seems to be used to indicate data of an unknown encoding.

        Sorry for the delay in replying; I wanted to verify the library's 
behavior, and it took a little while to find a gap in my schedule.

        Currently, the library neither converts the data nor fails a 
read/write operation when the two character sets differ: it treats the UTF-8 
and ASCII string datatypes as identical (see the attached little test 
program).

        However, I'm inclined to change that behavior and have the conversion 
fail, so that application developers and users aren't surprised by data passing 
through unconverted.  Then, once we work out the correct behavior and can 
implement a bridge between the two character sets (at least from ASCII to 
UTF-8), we can enable proper conversion.
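
        As a rough illustration of the ASCII to UTF-8 direction: 7-bit ASCII 
is a strict subset of UTF-8, so such a bridge could copy the bytes unchanged 
and refuse anything above 127.  The sketch below is only that, a sketch; the 
function names is_7bit_ascii and ascii_to_utf8 are hypothetical and are not 
part of the HDF5 API, and the library's eventual behavior may well differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an HDF5 function).
 * Returns 1 if every byte in buf[0..len) is 7-bit ASCII (<= 0x7F), else 0. */
int is_7bit_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] > 0x7F)
            return 0;
    return 1;
}

/* Hypothetical ASCII -> UTF-8 bridge (not an HDF5 function).
 * Valid 7-bit ASCII is already valid UTF-8, so the bytes are copied as-is.
 * Returns 0 on success, or -1 if dst is too small or src contains a byte
 * above 127 (i.e. data in some unknown encoding labeled as ASCII). */
int ascii_to_utf8(const unsigned char *src, size_t len,
                  unsigned char *dst, size_t dst_size)
{
    if (dst_size < len || !is_7bit_ascii(src, len))
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

        With this sketch, passing the 10 bytes of "plain text" succeeds and 
returns 0, while a buffer such as { 'c', 'a', 'f', 0xE9 } (a Latin-1 byte 
labeled as ASCII) is rejected with -1 rather than passed through silently.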

        How's that sound to people?

        Quincey

Attachment: test_utf8.c
