Hi Andrew,

On Apr 26, 2011, at 8:07 PM, Andrew Collette wrote:
> Hi,
>
> I'm curious as to how HDF5 treats character-set information during
> type conversion. There are functions H5Tset_cset and H5Tget_cset in
> the API. What happens if I try to read data defined as H5T_CSET_ASCII
> into a buffer defined as H5T_CSET_UTF8, and vice-versa? What if the
> buffer defined as ASCII contains characters > 127, but isn't UTF-8
> compliant? I ask because in practice I've noticed that H5T_CSET_ASCII
> seems to be used to indicate data of an unknown encoding.

Sorry for the delay in replying; I wanted to verify the library's behavior,
and it took a little while to find a gap in my schedule.

Currently, the library neither converts the data nor fails the read/write
operation when the two character sets differ: it treats the UTF-8 and ASCII
string datatypes as identical (see the attached small test program).

However, I'm inclined to change that behavior and have the conversion
fail, so that application developers and users aren't surprised. Then, once we
work out the correct behavior and can implement a bridge between the two
character sets (at least from ASCII to UTF-8), we can enable the proper
conversion.
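
To make the ASCII-to-UTF-8 direction concrete, here is a minimal sketch of
the checks such a bridge might start from. It is plain C99 and makes no HDF5
calls; is_ascii and is_valid_utf8 are hypothetical helper names, not part of
the HDF5 API, and nothing here reflects how the library would actually
implement the conversion. It relies only on the fact that 7-bit ASCII is a
subset of UTF-8, so a buffer with no bytes above 127 needs no rewriting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every byte is 7-bit ASCII (0x00..0x7F).
 * Such a buffer is already well-formed UTF-8 and can be passed through. */
static bool is_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] > 0x7F)
            return false;
    return true;
}

/* Hypothetical helper: true if buf is well-formed UTF-8. Rejects overlong
 * encodings, UTF-16 surrogates, and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t n;           /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal for this length */

        if (b <= 0x7F) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { n = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
        else return false;  /* stray continuation byte or invalid lead byte */

        if (len - i <= n)
            return false;   /* sequence is truncated by the end of the buffer */

        for (size_t k = 1; k <= n; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += n + 1;
    }
    return true;
}
```

With these, the Latin-1 bytes "caf\xE9" (the case Andrew describes: bytes
above 127 that are not UTF-8 compliant) fail both checks, while the UTF-8
bytes "caf\xC3\xA9" fail is_ascii but pass is_valid_utf8. A conversion could
use the first check to accept ASCII data unchanged and the second to decide
whether data labeled ASCII is safe to relabel as UTF-8.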

How's that sound to people?

Quincey

Attachment: test_utf8.c
