Re: [Numpy-discussion] proposal: smaller representation of string arrays

Robert Kern Mon, 24 Apr 2017 11:48:17 -0700

On Mon, Apr 24, 2017 at 10:51 AM, Aldcroft, Thomas <
[email protected]> wrote:
>
> On Mon, Apr 24, 2017 at 1:04 PM, Chris Barker <[email protected]>
wrote:


>> - round-tripping of binary data (at least with Python's
encoding/decoding) -- ANY string of bytes can be decoded as latin-1 and
re-encoded to get the same bytes back. You may get garbage, but you won't
get an EncodingError.
>
> +1.  The key point is that there is a HUGE amount of legacy science data
in the form of FITS (astronomy-specific binary file format that has been
the primary file format for 20+ years) and HDF5 which uses a character data
type to store data which can be bytes 0-255.  Getting a decoding/encoding
error when trying to deal with these datasets is a non-starter from my
perspective.

That says to me that these are properly represented by `bytes` objects, not
`unicode/str` objects encoding to and decoding from a hardcoded latin-1
encoding.
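
To make the distinction concrete, here is a minimal sketch in plain Python
(no NumPy required). The variable names are only illustrative, and the 256
byte values are a stand-in for a legacy character field, not real FITS or
HDF5 data. It shows the latin-1 round trip described above, a stricter codec
failing on the same input, and the alternative of leaving the data as
`bytes` so that no codec is involved at all.

```python
# All 256 possible byte values, standing in for a legacy character field.
raw = bytes(range(256))

# latin-1 maps each byte 0-255 to the code point with the same number,
# so decoding never fails and re-encoding restores the original bytes.
text = raw.decode("latin-1")
assert text.encode("latin-1") == raw

# A stricter codec such as UTF-8 rejects many of those byte sequences.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Keeping the data as bytes sidesteps the question: slicing and comparing
# work directly on the stored values, with no decode step to fail.
field = raw[200:204]
```

Either approach preserves the stored values; the difference is whether the
result is presented as text (with possibly meaningless characters) or as the
raw bytes it actually is.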

--
Robert Kern

_______________________________________________
NumPy-Discussion mailing list
[email protected]
https://mail.python.org/mailman/listinfo/numpy-discussion
