Jefffrey commented on issue #5657:
URL: 
https://github.com/apache/arrow-datafusion/issues/5657#issuecomment-1491561571

   The specific issue seems to be in this function:
   
   
https://github.com/apache/arrow-datafusion/blob/667f19ebad216b7592af5a91b70a24fb21c3bb64/datafusion/core/src/datasource/listing/table.rs#L431-L444
   
   Because the file extension is `.csv.bz2` rather than just `.csv`, the file 
is not listed, so the schema is inferred from an empty list of files, which 
yields an empty schema.
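
To illustrate the mismatch described above, here is a minimal sketch that assumes the listing step keeps only paths ending with the configured extension. `list_matching` is a hypothetical stand-in, not the actual DataFusion function linked above, which may differ in detail.

```rust
/// Hypothetical stand-in for the listing filter: keep only the paths
/// whose name ends with the configured file extension.
/// This is an assumption for illustration, not DataFusion's real code.
fn list_matching<'a>(paths: &[&'a str], file_extension: &str) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        // A plain suffix check: "summary.csv.bz2" does not end with ".csv".
        .filter(|p| p.ends_with(file_extension))
        .collect()
}
```

Under that assumption, `list_matching(&["summary.csv.bz2"], ".csv")` returns an empty list, so schema inference would have no files to read, while passing `".csv.bz2"` as the extension keeps the file.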
   
   As a temporary workaround I renamed the file from `summary.csv.bz2` to 
`summary.csv`, and it seemed to be picked up properly. However, it then ran 
into another issue:
   
   `Error: ArrowError(CsvError("decompression not finished but EOF reached"))`
   
   This specifically stems from here:
   
   
https://github.com/apache/arrow-datafusion/blob/667f19ebad216b7592af5a91b70a24fb21c3bb64/datafusion/core/src/datasource/file_format/csv.rs#L208-L215
   
   Haven't looked into it too much, but seems similar to these issues:
   
   - https://github.com/apache/arrow-datafusion/issues/1736
   - https://github.com/apache/arrow-datafusion/issues/5041


