tustvold commented on PR #3292: URL: https://github.com/apache/arrow-rs/pull/3292#issuecomment-1344146616
> perhaps the CSV reader is clever

The reader is given a schema with which to interpret the encoded values - see [here](https://docs.rs/arrow-csv/latest/arrow_csv/reader/struct.Reader.html#method.new).

> manages to guess that the strings in the CSV file are really timestamps

The CSV schema inference logic will do this; it has a load of regular expressions for this purpose.

> CSV is a lossy encoding for arrow tables

In general the arrow implementations try very hard to not be lossy. It is actually a source of non-trivial pain, just google for "allow_truncated_timestamps". We should strive very hard for the CSV writer to not be a lossy encoding either.

> The reader could look for things which look like hex-escapes

Perhaps we could do what the Python writer does and allow writing binary arrays so long as the content is valid UTF-8? Perhaps you could expand upon the use-case of encoding binary data in a non-binary format?
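
To make the "valid UTF-8 only" idea concrete, here is a minimal sketch using only the Rust standard library. It is not the arrow-csv writer and not what the Python writer actually does internally; `binary_to_csv_field` is a hypothetical helper, and the quoting rules are assumed to be the usual CSV ones (quote a field that contains a comma, a double quote, or a line break, and double any embedded quotes). The point is that invalid bytes become an error rather than a lossy value.

```rust
/// Hypothetical helper (not part of arrow-csv): render one binary value as a
/// CSV field, refusing anything that is not valid UTF-8 instead of writing a
/// lossy representation.
fn binary_to_csv_field(bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
    // Fails with Utf8Error if the bytes are not valid UTF-8.
    let text = std::str::from_utf8(bytes)?;

    // Assumed quoting rule: quote only when the field contains a delimiter,
    // a double quote, or a line break.
    let needs_quoting = text.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r'));

    if needs_quoting {
        // Embedded double quotes are escaped by doubling them.
        Ok(format!("\"{}\"", text.replace('"', "\"\"")))
    } else {
        Ok(text.to_string())
    }
}
```

With this sketch, `binary_to_csv_field(b"hello")` yields the field unchanged, `binary_to_csv_field(b"a,b")` yields a quoted field, and a value such as `&[0xff, 0xfe]` is rejected with an error, so the caller has to decide what to do with genuinely binary data.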
