Hi Chris,

>> A Latin-1 based 'a' type
>> would have similar problems.
>
> Maybe not -- latin1 is fixed width.

Yes, Latin-1 is fixed width, but the issue is that when a Latin-1 string is written to a fixed-width UTF-8 string in HDF5, it can expand (each high-bit character takes two bytes in UTF-8), possibly losing data. What I would like to avoid is a situation where a user writes a 10-byte string from NumPy into a 10-byte space in an HDF5 dataset and unexpectedly loses the last few characters because of the encoding mismatch. People are used to truncation when, e.g., storing a 20-byte string in a 10-byte dataset, but it's surprising when the source and destination are the same size. :)

In any case, I certainly agree NumPy shouldn't be limited by the capabilities of HDF5. There are other valuable use cases, including access to the high-bit characters Latin-1 provides. But from a strict compatibility standpoint, ASCII would be beneficial.

Andrew
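
A minimal sketch of the expansion concern described in this message, using only plain Python string encoding (no NumPy or h5py calls). The helper name fit_to_width and the sample string are assumptions made up for illustration; how a real HDF5 write handles overflow may differ in detail.

```python
def fit_to_width(raw: bytes, width: int) -> bytes:
    """Hypothetical helper: mimic a fixed-width field by cutting at `width` bytes."""
    return raw[:width]


# Ten characters, two of them high-bit Latin-1 characters (e-acute, 0xE9).
source = "caf\xe9 ann\xe9e"

latin1 = source.encode("latin-1")  # one byte per character
utf8 = source.encode("utf-8")      # each e-acute becomes two bytes

# The same text needs more room once re-encoded as UTF-8.
print(len(latin1), len(utf8))

# Storing the UTF-8 bytes in a field sized for the Latin-1 bytes drops the tail,
# and the cut can land in the middle of a multi-byte character.
stored = fit_to_width(utf8, len(latin1))
recovered = stored.decode("utf-8", errors="ignore")
print(repr(recovered))
```

With pure ASCII text the two encodings have the same length, so nothing is lost; the mismatch only appears once high-bit characters are present.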