alamb commented on issue #1527: URL: https://github.com/apache/arrow-datafusion/issues/1527#issuecomment-1014610661
Thanks for the report @capkurmagati. I am not sure whether your use case ever worked (if it did, this is a bug).

Regardless, as @tustvold mentions, we have essentially the same use case in IOx, where some parquet files contain only a subset of the unified schema and we pad the remaining columns with NULLs. This picture might help: https://github.com/influxdata/influxdb_iox/blob/f3f6f335a93d2910a5cc55e12662dfda82143701/query/src/provider/adapter.rs#L45-L72

We would be happy to contribute this to DataFusion / the file reader.

@capkurmagati is there any chance you can write an end-to-end test (i.e. create the two parquet files you refer to above)? If so, bringing in the `SchemaAdapter` stream would be pretty straightforward.
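As a rough illustration of the NULL-padding idea described above (this is not the IOx `SchemaAdapter` code), here is a minimal sketch using only the Rust standard library. The helper name `pad_to_schema`, the `Column` alias, and the use of plain `Vec<Option<i64>>` in place of Arrow arrays are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// One column of nullable 64-bit integers; `None` stands in for SQL NULL.
/// (Assumption: a real implementation would use Arrow arrays instead.)
type Column = Vec<Option<i64>>;

/// Hypothetical helper, not a DataFusion or IOx API: produce the columns of a
/// batch in `unified_schema` order, filling any column the file lacks with NULLs.
fn pad_to_schema(
    unified_schema: &[&str],
    file_columns: &HashMap<String, Column>,
    num_rows: usize,
) -> Vec<Column> {
    unified_schema
        .iter()
        .map(|name| match file_columns.get(*name) {
            // Column exists in this file: pass it through unchanged.
            Some(col) => col.clone(),
            // Column is missing from this file: emit an all-NULL column.
            None => vec![None; num_rows],
        })
        .collect()
}
```

For example, with a unified schema of `["a", "b"]` and a file that only has column `a` with two rows, the result would contain `a` unchanged followed by a two-row column `b` made up entirely of `None`.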
