asayers commented on PR #3292: URL: https://github.com/apache/arrow-rs/pull/3292#issuecomment-1344092731
> you can't (in general) round-trip an arrow table through a CSV file

What I mean by this is that arrow is a more expressive format than CSV. For example, you might have an arrow array containing `Timestamp` values. When you convert to CSV they get stringified. Then, when you convert back to arrow, your `Timestamp` array has turned into a `String` array.

Or, perhaps the CSV reader is clever and manages to guess that the strings in the CSV file are really timestamps! In that case, `Timestamp` arrays will indeed roundtrip correctly; but a `String` column containing values which _look_ like timestamps will fail to roundtrip (since it'll come back as a `Timestamp` column).

I'm no arrow expert so what I'm saying might be nonsense! (e.g. maybe you guys are encoding the types in the CSV header or something?) Naively, though, I would think that the relationship between arrow tables and CSV files is many-to-one; in other words, CSV is a lossy encoding for arrow tables. Going in the other direction (roundtripping CSV files through arrow and back to CSV) should work perfectly, though. Again, please let me know if I'm way off on this!

> we should be able to read the CSV files written and get the same data back

The reader could look for things which look like hex-escapes, and convert the column to binary in that case, by parsing the escapes? AFAICT there's nothing for this in libstd though, so we'd have to write a parser.
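To make the hex-escape idea concrete, here is a minimal std-only sketch. It assumes a hypothetical encoding in which the writer emits every byte as a `\xNN` escape; that format, and the names `escape_bytes` and `parse_hex_escapes`, are illustrative assumptions and not the arrow-rs CSV API or what the PR actually implements.

```rust
/// Hypothetical writer side: render every byte as a `\xNN` escape.
fn escape_bytes(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("\\x{:02x}", b)).collect()
}

/// Hypothetical reader side: return the decoded bytes only if the whole
/// field consists of `\xNN` escapes; otherwise `None` (keep it as a string).
fn parse_hex_escapes(field: &str) -> Option<Vec<u8>> {
    let raw = field.as_bytes();
    // Each escape is exactly four ASCII bytes: `\`, `x`, and two hex digits.
    if raw.is_empty() || raw.len() % 4 != 0 {
        return None;
    }
    raw.chunks(4)
        .map(|chunk| {
            if chunk[0] != b'\\' || chunk[1] != b'x' {
                return None;
            }
            let hi = (chunk[2] as char).to_digit(16)?;
            let lo = (chunk[3] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

fn main() {
    // Binary -> CSV field -> binary comes back unchanged.
    let original = vec![0x00, 0xff, 0x41];
    let field = escape_bytes(&original);
    assert_eq!(parse_hex_escapes(&field), Some(original));

    // A genuine string value with the same text is indistinguishable,
    // so it would be read back as binary.
    let text_value = String::from("\\x00\\xff\\x41");
    assert_eq!(field, text_value);

    // Anything that is not entirely escapes stays a string.
    assert_eq!(parse_hex_escapes("hello"), None);
}
```

The second assertion in `main` is the many-to-one problem again: a `String` column whose values happen to look like escapes would come back as binary, just as timestamp-looking strings would come back as `Timestamp`.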
