timvw opened a new issue, #14016:
URL: https://github.com/apache/datafusion/issues/14016

   ### Describe the bug
   
   Schema inference via ListingTableConfig no longer works for compressed JSON files.
   
   With datafusion 35 and 36, the expected schema is inferred.
   With datafusion 37, 38 and 39, inference fails with an error: ArrowError(JsonError("Failed to read JSON record: stream did not contain valid UTF-8"), None)
   With datafusion 40+, the error goes away, but no schema is inferred.
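
The "stream did not contain valid UTF-8" message is consistent with gzip bytes reaching a text reader without being decompressed first. The following is a minimal, std-only sketch of that effect, not DataFusion code; `looks_like_gzip` and `read_as_text` are hypothetical helpers introduced only for illustration.

```rust
/// Returns true when `bytes` start with the gzip magic number 0x1f 0x8b
/// (RFC 1952). Hypothetical helper, not part of DataFusion.
fn looks_like_gzip(bytes: &[u8]) -> bool {
    bytes.starts_with(&[0x1f, 0x8b])
}

/// Stands in for a reader that expects UTF-8 text, such as an NDJSON reader.
/// Hypothetical helper, not part of DataFusion.
fn read_as_text(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    // 0x8b is a UTF-8 continuation byte, so it cannot start a character;
    // a gzip header therefore fails validation at its second byte.
    std::str::from_utf8(bytes)
}
```

Calling `read_as_text` on the first bytes of a `.gz` file returns an error, while the same call on plain NDJSON succeeds, which is why the compression type has to be known before the JSON reader sees the data.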
   
   
   
   ### To Reproduce
   
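   ```rust
   use datafusion::datasource::listing::{ListingTableConfig, ListingTableUrl};
   use datafusion::error::Result;
   use datafusion::prelude::SessionContext;

   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();

       // The file can be found here:
       // https://github.com/timvw/arrow-testing/blob/master/data/json/ndjson-sample.json.gz
       let data_path = "/somewhere/testing/data/json/ndjson-sample.json.gz";

       let table_path = ListingTableUrl::parse(data_path)?;
       let config = ListingTableConfig::new(table_path);
       // Infer the file format and extension from the path, then the schema.
       let config_with_opts = config.infer_options(&ctx.state()).await?;
       let config_with_schema = config_with_opts.infer_schema(&ctx.state()).await?;

       Ok(())
   }
   ```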
   
   ### Expected behavior
   
   The schema is inferred, as in versions 35 and 36.
   
   ### Additional context
   
   Initial investigation shows that the ListingTableConfig infer_options method loses information:
   - file_extension is inferred to be "json" (instead of "json.gz" as in the past) -> no files will be found in infer_schema
   - file_format is created without capturing the (potential) compression type -> trying to read the file (without a codec) results in the error mentioned above
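
The two points above can be illustrated with a small std-only sketch. It is not DataFusion's implementation: `split_extension`, `listing_suffix` and the suffix list are hypothetical names chosen for this example, and the suffix match only approximates how a listing step might filter files by extension.

```rust
/// Compression suffixes recognised by this sketch (illustrative list only).
const COMPRESSION_SUFFIXES: [&str; 4] = ["gz", "bz2", "xz", "zst"];

/// Splits a file name into (format extension, optional compression suffix).
/// Returns None when the name carries no usable extension.
fn split_extension(file_name: &str) -> Option<(&str, Option<&str>)> {
    let (stem, last) = file_name.rsplit_once('.')?;
    if COMPRESSION_SUFFIXES.iter().any(|s| *s == last) {
        // "x.json.gz": the format is the extension before the compression suffix.
        let (_, format) = stem.rsplit_once('.')?;
        Some((format, Some(last)))
    } else {
        Some((last, None))
    }
}

/// Builds the suffix a listing step would match file names against.
fn listing_suffix(format: &str, compression: Option<&str>) -> String {
    match compression {
        Some(c) => format!(".{}.{}", format, c),
        None => format!(".{}", format),
    }
}
```

For `ndjson-sample.json.gz`, dropping the compression part yields the suffix `.json`, which the file name does not end with, so a suffix-based listing finds nothing. Keeping both parts yields `.json.gz`, which matches, and also tells the reader which codec to apply.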


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

