[GitHub] [arrow-rs] asayers commented on pull request #3292: csv: Support Binary-typed columns in the CSV writer

GitBox Sun, 11 Dec 2022 00:10:48 -0800


asayers commented on PR #3292:
URL: https://github.com/apache/arrow-rs/pull/3292#issuecomment-1345486526


   > reader is given a schema
   
   Aah, I see! You convert an Arrow table into a CSV file + a schema, and then 
you can losslessly convert that _pair_ back into the original Arrow table. Got 
it!
   
   And it sounds like you require that, if the writer succeeds in producing a 
CSV file, then the reader must reproduce the original Arrow table when given 
the original schema (rather than, say, returning an error). Now I understand 
why you want some work on the reader side, to maintain this invariant. 
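
   A minimal sketch of that round-trip invariant at the level of a single 
field, using only the Rust standard library. Everything here is hypothetical 
and for illustration: `Value`, `DataType`, `write_field`, `read_field` and 
`round_trips` are not arrow-rs APIs, and hex-encoding binary values is an 
assumed encoding, not necessarily what the PR does.

```rust
// Hypothetical stand-ins for an Arrow value and its schema type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Utf8(String),
    Binary(Vec<u8>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Utf8,
    Binary,
}

// Writer side: strings are always quoted with embedded quotes doubled;
// binary is hex-encoded (an assumption) so it can never contain a
// delimiter, quote, or newline.
fn write_field(v: &Value) -> String {
    match v {
        Value::Utf8(s) => format!("\"{}\"", s.replace('"', "\"\"")),
        Value::Binary(b) => b.iter().map(|x| format!("{:02x}", x)).collect(),
    }
}

// Reader side: the schema type tells the reader how to decode the text.
fn read_field(field: &str, ty: DataType) -> Result<Value, String> {
    match ty {
        DataType::Utf8 => {
            let inner = field
                .strip_prefix('"')
                .and_then(|f| f.strip_suffix('"'))
                .ok_or("expected a quoted field")?;
            Ok(Value::Utf8(inner.replace("\"\"", "\"")))
        }
        DataType::Binary => {
            let is_hex = field.bytes().all(|b| b.is_ascii_hexdigit());
            if !is_hex || field.len() % 2 != 0 {
                return Err(format!("not valid hex: {:?}", field));
            }
            (0..field.len())
                .step_by(2)
                .map(|i| u8::from_str_radix(&field[i..i + 2], 16).map_err(|e| e.to_string()))
                .collect::<Result<Vec<u8>, String>>()
                .map(Value::Binary)
        }
    }
}

// The invariant: whatever the writer emits, the reader (given the same
// type) must decode back to the original value rather than fail.
fn round_trips(v: &Value, ty: DataType) -> bool {
    read_field(&write_field(v), ty).as_ref() == Ok(v)
}
```

   Under these assumptions, a binary value containing a comma, a quote and a 
newline (`[0x2c, 0x22, 0x0a]`) is written as `2c220a` and reads back 
unchanged, which is the property the reader-side work would need to preserve.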
   
   > binary data will not be correctly escaped
   
   Oh! I assumed (but didn’t check) that this escaping was happening later in 
the writer code. 
   
   > the use-case of encoding binary data in a non-binary format
   
   I have a bunch of Parquet files which happen to contain a few _mostly_-ASCII 
columns. When making changes to the code which generates these files, I find it 
useful to eyeball the contents of the files. I do this by converting them to 
CSV. I’m sure there are better ways to do it…


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
