tustvold commented on PR #3292:
URL: https://github.com/apache/arrow-rs/pull/3292#issuecomment-1344146616

   > perhaps the CSV reader is clever
   
   The reader is given a schema with which to interpret the encoded values - 
see 
[here](https://docs.rs/arrow-csv/latest/arrow_csv/reader/struct.Reader.html#method.new).
   
   > manages to guess that the strings in the CSV file are really timestamps
   
   The CSV schema inference logic will do this; it has a set of regular 
expressions for this purpose.
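
To make the inference step concrete, here is a rough, std-only sketch of the idea. It is not arrow-csv's actual code: the real logic uses regular expressions, while this stand-in uses hand-written shape checks. The names `InferredType`, `infer_field` and `infer_column` are hypothetical, as is the single timestamp layout it recognises.

```rust
/// Simplified column types for this sketch (hypothetical, not arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InferredType {
    Boolean,
    Int64,
    Float64,
    Timestamp,
    Utf8,
}

/// True if `s` has the shape `YYYY-MM-DDTHH:MM:SS` (a space is also accepted
/// in place of `T`). Only the shape is checked, not whether the date is valid.
fn looks_like_timestamp(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() != 19 {
        return false;
    }
    bytes.iter().enumerate().all(|(i, &c)| match i {
        4 | 7 => c == b'-',
        10 => c == b'T' || c == b' ',
        13 | 16 => c == b':',
        _ => c.is_ascii_digit(),
    })
}

/// Guess the type of a single CSV field, trying the narrowest types first.
pub fn infer_field(s: &str) -> InferredType {
    if s.eq_ignore_ascii_case("true") || s.eq_ignore_ascii_case("false") {
        InferredType::Boolean
    } else if s.parse::<i64>().is_ok() {
        InferredType::Int64
    } else if s.parse::<f64>().is_ok() {
        InferredType::Float64
    } else if looks_like_timestamp(s) {
        InferredType::Timestamp
    } else {
        InferredType::Utf8
    }
}

/// Guess the type of a column: keep a type only if every value agrees,
/// otherwise fall back to `Utf8`. (This sketch does no numeric promotion,
/// so a mix of integers and floats also falls back to `Utf8`.)
pub fn infer_column(values: &[&str]) -> InferredType {
    let mut types = values.iter().map(|v| infer_field(v));
    let first = match types.next() {
        Some(t) => t,
        None => return InferredType::Utf8,
    };
    if types.all(|t| t == first) {
        first
    } else {
        InferredType::Utf8
    }
}
```

With this sketch, a column whose values all look like `2022-12-09T10:15:00` would be inferred as `Timestamp`, while a column mixing such values with free text would stay `Utf8`.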
   
   > CSV is a lossy encoding for arrow tables
   
   In general the arrow implementations try very hard not to be lossy, and this 
is actually a source of non-trivial pain: just search for 
"allow_truncated_timestamps". We should strive just as hard to keep the CSV 
encoding from being lossy.
   
   > The reader could look for things which look like hex-escapes
   
   Perhaps we could do what the Python writer does and allow writing binary 
arrays so long as the content is valid UTF-8? Perhaps you could expand upon the 
use-case of encoding binary data in a non-binary format?
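
A minimal sketch of what that check could look like, using only the standard library. The function name `binary_to_csv_field` and the `NotUtf8` error type are hypothetical and are not part of arrow-csv; the point is only that valid UTF-8 bytes pass through unchanged and anything else is rejected rather than silently altered.

```rust
use std::str;

/// Hypothetical error for binary values that cannot be written as CSV text.
#[derive(Debug, PartialEq, Eq)]
pub struct NotUtf8 {
    /// Number of leading bytes that were valid UTF-8.
    pub valid_up_to: usize,
}

/// Return the bytes as a string slice if they are valid UTF-8, so that they
/// can be written as an ordinary CSV field; otherwise report an error
/// instead of encoding the value lossily.
pub fn binary_to_csv_field(bytes: &[u8]) -> Result<&str, NotUtf8> {
    str::from_utf8(bytes).map_err(|e| NotUtf8 {
        valid_up_to: e.valid_up_to(),
    })
}
```

Under these assumptions, `binary_to_csv_field(b"hello")` yields `Ok("hello")`, while a value containing a byte such as `0xff` yields an error, which keeps the writer from producing output that cannot be read back as the original bytes.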

