Hi Chris,

>> A Latin-1 based 'a' type
>> would have similar problems.
>
> Maybe not -- latin1 is fixed width.

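Yes, Latin-1 is fixed width, but the issue is that when a Latin-1
string is written to a fixed-width UTF-8 string in HDF5, every
character above 0x7F expands from one byte to two, so the encoded
string may no longer fit and data can be lost.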

What I would like to avoid is a situation where a user writes a
10-byte string from NumPy into a 10-byte space in an HDF5 dataset, and
unexpectedly loses the last few characters because of the encoding
mismatch.

People are used to truncation when e.g. storing a 20-byte string in a
10-byte dataset, but it's surprising when the source and destination
are the same size. :)
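
To make that concrete, here is a small sketch in plain Python (no
h5py or NumPy involved). The sample string and the 10-byte field
width are assumptions I picked for illustration, and the slice only
mimics what a fixed-width store would do; it is not how HDF5 itself
performs the write:

```python
# Illustrative value only: 10 characters, two of them above 0x7F.
text = "naïve café"
FIELD_WIDTH = 10  # assumed size of the fixed-width destination, in bytes

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # 'ï' and 'é' take two bytes each

print(len(latin1_bytes), len(utf8_bytes))

# Mimic storing into a fixed-width UTF-8 field by cutting at the width.
stored = utf8_bytes[:FIELD_WIDTH]

# errors="ignore" drops a trailing partial multi-byte sequence, if any.
recovered = stored.decode("utf-8", errors="ignore")
print(repr(recovered))
```

The source fits its 10-byte Latin-1 field exactly, but the UTF-8 form
needs 12 bytes, so the trailing character does not survive the round
trip even though both fields are nominally the same size.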

In any case, I certainly agree NumPy shouldn't be limited by the
capabilities of HDF5.  There are other valuable use cases, including
access to the high-bit characters Latin-1 provides.  But from a strict
compatibility standpoint, ASCII would be beneficial, since ASCII text
occupies the same number of bytes when encoded as UTF-8.

Andrew