[GitHub] [arrow-datafusion] alamb commented on issue #1527: Error reading Parquet files after schema evolution

GitBox Mon, 17 Jan 2022 06:36:22 -0800


alamb commented on issue #1527:
URL: 
https://github.com/apache/arrow-datafusion/issues/1527#issuecomment-1014610661



   Thanks for the report @capkurmagati. I am not sure whether your use case 
ever worked (if it did, this is a bug).
   
   Regardless, as @tustvold mentions, we have essentially the same use case in 
IOx: some parquet files contain only a subset of the unified schema, and we pad 
the remaining columns with NULLs.
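   
   As a rough illustration of the padding idea, here is a minimal sketch that 
uses only the Rust standard library rather than Arrow types. The names 
`Column`, `Batch`, and `adapt_batch` are hypothetical and are not taken from 
IOx or DataFusion; the real `SchemaAdapter` works on record batch streams and 
may differ substantially.

```rust
use std::collections::HashMap;

/// A nullable integer column; `None` stands in for NULL.
/// (Hypothetical simplification: real Arrow arrays are typed and columnar.)
type Column = Vec<Option<i64>>;

/// Hypothetical stand-in for a record batch read from a single file.
struct Batch {
    num_rows: usize,
    columns: HashMap<String, Column>,
}

/// Returns the columns of `batch` in the order given by `unified_schema`,
/// filling any column that the file does not contain with NULLs.
fn adapt_batch(batch: &Batch, unified_schema: &[&str]) -> Vec<Column> {
    unified_schema
        .iter()
        .map(|name| match batch.columns.get(*name) {
            // Column is present in this file: pass it through unchanged.
            Some(col) => col.clone(),
            // Column is missing from this file: pad with one NULL per row.
            None => vec![None; batch.num_rows],
        })
        .collect()
}
```

   With a unified schema of `["a", "b"]` and a file that only has column `a`, 
the output has two columns, and `b` is all NULLs with the same row count as `a`.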
   
   This picture might help:
https://github.com/influxdata/influxdb_iox/blob/f3f6f335a93d2910a5cc55e12662dfda82143701/query/src/provider/adapter.rs#L45-L72
   
   We would be happy to contribute this to DataFusion / the file reader. 
@capkurmagati, is there any chance you can write an end-to-end test (that is, 
create the two parquet files you refer to above)? If so, bringing in the 
`SchemaAdapter` stream would be pretty straightforward.


