bkietz commented on pull request #9685:
URL: https://github.com/apache/arrow/pull/9685#issuecomment-799485312
If we want to support detection of compression, that would require a fairly
significant change to this PR. As written, compression is a property of the
FileFormat, which is not mutated (even during discovery). Thus we couldn't look
at (for example) the `.gz` extension on provided file sources and switch from
"CSV" to "gzipped CSV". Compression-as-FileFormat-property paints us into a
corner with respect to guessing compression.
Adding discovery of file formats would give us a place to put this
functionality, but that's a larger change and definitely out of scope here.
If guessing compression ever becomes a priority, I'd recommend removing
compression-as-property and instead writing `Result<shared_ptr<InputStream>>
FileSource::OpenCompressed(optional<Compression::type> = {})` (when no
explicit compression type is given, it guesses which codec to use). This could
replace the usage of `FileSource::Open` in `file_csv.cc:OpenReader`.
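
As a rough sketch of just the guessing step, assuming the guess is based on the
file extension: `Codec`, `GuessCodecFromPath` and `ResolveCodec` below are
hypothetical names for illustration (not existing Arrow APIs), `Codec` stands in
for `Compression::type`, and the extension table is only an example.

```cpp
#include <optional>
#include <string_view>

// Hypothetical stand-in for Compression::type; only a few codecs are listed.
enum class Codec { kGzip, kBz2, kZstd, kLz4 };

// Guess a codec from the final extension of `path`.
// Returns nullopt when there is no extension or it is not recognized.
inline std::optional<Codec> GuessCodecFromPath(std::string_view path) {
  const auto dot = path.rfind('.');
  if (dot == std::string_view::npos) return std::nullopt;

  // Ignore a dot that belongs to a directory name, e.g. "dir.gz/data".
  const auto slash = path.rfind('/');
  if (slash != std::string_view::npos && slash > dot) return std::nullopt;

  const std::string_view ext = path.substr(dot + 1);
  if (ext == "gz") return Codec::kGzip;
  if (ext == "bz2") return Codec::kBz2;
  if (ext == "zst") return Codec::kZstd;
  if (ext == "lz4") return Codec::kLz4;
  return std::nullopt;
}

// An explicitly requested codec always wins; otherwise fall back to guessing.
// This mirrors the optional<Compression::type> = {} default proposed above.
inline std::optional<Codec> ResolveCodec(std::string_view path,
                                         std::optional<Codec> explicit_codec = {}) {
  if (explicit_codec.has_value()) return explicit_codec;
  return GuessCodecFromPath(path);
}
```

With this shape, `ResolveCodec("data.csv.gz")` would pick gzip, while
`ResolveCodec("data.csv")` yields no codec and the stream would be opened
uncompressed. A real implementation might also want to sniff magic bytes
rather than trust the extension alone.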