On Tue, Apr 26, 2011 at 12:24:40PM +0200, Kraus Philipp wrote:
> Hello,
> 
> I read / write strings in my HDF files to copy data between Matlab and my C++ 
> code. I have some problems with ASCII codes greater than 127 in my files. 

Then they aren't ASCII codes, since ASCII only defines codes
between 0 and 127, and I guess that's where the problems start...

> The dump of a HDF5 file (Matlab) shows:
> GROUP "/" {
> DATASET "data" {
>    DATATYPE  H5T_STRING {
>          STRSIZE 2;
>          STRPAD H5T_STR_NULLTERM;
>          CSET H5T_CSET_ASCII;
>          CTYPE H5T_C_S1;
>       }
>    DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
>    DATA {
>    (0): "\37777777744", "\37777777766", "\37777777774", "\37777777737"

These are basically 0xE4, 0xF6, 0xFC and 0xDF, which are
'ä', 'ö', 'ü' and 'ß' in e.g. the ISO-8859-1 encoding. The long
numbers are these bytes printed as sign-extended 32-bit octal
values (octal 37777777744 is 0xFFFFFFE4), while a char typically
has only 8 bits, so only the low byte matters.

>    }
> }
> }
> 
> The chars are "ä", "ö", "ü", "ß". My code writes the same chars from a string 
> via string.c_str(), which gives:
> GROUP "/" {
> DATASET "test" {
>    DATATYPE  H5T_STRING {
>          STRSIZE 3;
>          STRPAD H5T_STR_NULLTERM;
>          CSET H5T_CSET_ASCII;
>          CTYPE H5T_C_S1;
>       }
>    DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
>    DATA {
>    (0): "\37777777703\37777777644", "\37777777703\37777777666",
>    (2): "\37777777703\37777777674", "\37777777703\37777777637"
>    }

The first one is 0xC3 0xA4, which is the UTF-8 representation
of 'ä'. The others decode to 0xC3 0xB6, 0xC3 0xBC and 0xC3 0x9F,
the UTF-8 forms of 'ö', 'ü' and 'ß', so all of them are UTF-8.

> }
> }
> 
> It seems that my code creates 2 bytes for each of ä, ö, ü, ß and Matlab 1 byte.
> Can I switch the encoding in the HDF5 file or can I use unicode or anything
> else?

I don't know if you can get Matlab to use UTF-8. Concerning the
HDF5 file, the question is how it is written: the program that
writes it seems to use UTF-8. Can you change that? If this is
a test program with 'äöüß' hard-coded into it, you may just
have to get your editor to save the source as ISO-8859-1.
Writing to the HDF5 file isn't the issue; the file just contains
what you told it to. The problem is passing in the values you
want it to contain.
                              Regards, Jens
-- 
  \   Jens Thoms Toerring  ________      [email protected]
   \_______________________________      http://toerring.de

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
