[GitHub] [arrow-rs] asayers commented on pull request #3292: csv: Support Binary-typed columns in the CSV writer

GitBox Fri, 09 Dec 2022 01:56:23 -0800


asayers commented on PR #3292:
URL: https://github.com/apache/arrow-rs/pull/3292#issuecomment-1344092731


   > you can't (in general) round-trip an arrow table through a CSV file
   
   What I mean by this is that arrow is a more expressive format than CSV.
   
   For example, you might have an arrow array containing `Timestamp` values.  
When you convert to CSV they get stringified.  Then, when you convert back to 
arrow, your `Timestamp` array has turned into a `String` array.  Or, perhaps 
the CSV reader is clever and manages to guess that the strings in the CSV file 
are really timestamps!  In that case, `Timestamp` arrays will indeed roundtrip 
correctly; but a `String` column containing values which _look_ like timestamps 
will fail to roundtrip (since it'll come back as a `Timestamp` column).
   
   I'm no arrow expert so what I'm saying might be nonsense!  (e.g. Maybe you 
guys are encoding the types in the CSV header or something?)  Naively, though, 
I would think that the relationship between arrow tables and CSV files is 
many-to-one; in other words, CSV is a lossy encoding for arrow tables.  Going 
in the other direction (roundtripping CSV files through arrow and back to CSV) 
should work perfectly, though.
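
   To make the many-to-one point concrete, here is a rough std-only sketch. 
It does not use the arrow crate: `ToyColumn`, `to_csv_cells` and 
`infer_from_csv_cells` are hypothetical names invented for illustration, and 
integers stand in for timestamps to keep the parsing trivial.

```rust
/// A toy stand-in for a typed arrow column (hypothetical, not an arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
enum ToyColumn {
    Int(Vec<i64>),
    Text(Vec<String>),
}

/// Writing to CSV stringifies every value, so the column type is dropped.
fn to_csv_cells(col: &ToyColumn) -> Vec<String> {
    match col {
        ToyColumn::Int(values) => values.iter().map(|v| v.to_string()).collect(),
        ToyColumn::Text(values) => values.clone(),
    }
}

/// A "clever" reader: if every cell parses as an integer, infer `Int`;
/// otherwise fall back to `Text`.
fn infer_from_csv_cells(cells: &[String]) -> ToyColumn {
    let parsed: Result<Vec<i64>, _> = cells.iter().map(|c| c.parse::<i64>()).collect();
    match parsed {
        Ok(values) => ToyColumn::Int(values),
        Err(_) => ToyColumn::Text(cells.to_vec()),
    }
}
```

   With this sketch, `ToyColumn::Int(vec![1, 2])` and a `ToyColumn::Text` 
holding `"1"` and `"2"` produce identical CSV cells.  The inferring reader 
brings the first back unchanged but turns the second into an `Int` column, 
which is the same failure as the `String`-that-looks-like-a-`Timestamp` case.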
   
   Again, please let me know if I'm way off on this!
   
   > we should be able to read the CSV files written and get the same data back
   
   The reader could look for things which look like hex-escapes, and convert 
the column to binary in that case, by parsing the escapes?  AFAICT there's 
nothing for this in libstd though, so we'd have to write a parser.
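
   For a sense of scale, a hand-written parser could be fairly small.  The 
sketch below is only a guess at the shape: it assumes every escaped byte is 
written as exactly `\xNN` (two hex digits) and that a literal backslash is 
itself written as `\x5c`.  The function name `parse_hex_escapes` and that 
escape format are assumptions, not something taken from this PR.

```rust
/// Decode a cell in which bytes may be written as `\xNN` (assumed format).
/// Returns `None` if a backslash is not followed by `x` and two hex digits.
fn parse_hex_escapes(input: &str) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' {
            // Need four bytes in total: `\`, `x`, and two hex digits.
            if i + 3 >= bytes.len() || bytes[i + 1] != b'x' {
                return None;
            }
            // `to_digit(16)` rejects anything that is not 0-9, a-f or A-F.
            let hi = (bytes[i + 2] as char).to_digit(16)?;
            let lo = (bytes[i + 3] as char).to_digit(16)?;
            out.push((hi * 16 + lo) as u8);
            i += 4;
        } else {
            // Any other byte is copied through unchanged.
            out.push(bytes[i]);
            i += 1;
        }
    }
    Some(out)
}
```

   A reader could try this on every cell in a column and treat the column as 
binary only if all cells decode and at least one contains an escape.  That 
heuristic has the same ambiguity as above, though: a genuine string column 
whose values happen to contain `\xNN` sequences would come back as binary.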


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

