adriangb commented on PR #3047: URL: https://github.com/apache/datafusion-comet/pull/3047#issuecomment-3747603386
> Hi @adriangb I went through proposed changes [apache/datafusion#19716](https://github.com/apache/datafusion/pull/19716) I still missing something. There is a use case for Comet native readers(non-iceberg), which backed by DF readers in > > https://github.com/apache/datafusion-comet/blob/c4da3c91d672e6bc5d41959f6d60a671cd3dfd8a/native/core/src/parquet/parquet_exec.rs#L107 > > The schema adapter, used for schema conversion via `map_schema` and batch runtime modifications `map_batch`. `map_schema` looks straight forward however `map_batch` was a callback as part of parquet file opener https://github.com/apache/datafusion/blob/e571b49e0983892597a8f92e5d1502b17a15b180/datafusion/datasource-parquet/src/opener.rs#L465 > > In [apache/datafusion#19716](https://github.com/apache/datafusion/pull/19716) there is a proposed `BatchAdapter` however I can't find how this adapter would be called, what is the triggering mechanism. With `map_batch` the file opener was a caller and all the logic below worked with modified RB, I'm not sure how to achieve same with `BatchAdapter` though? This is covered upstream in DataFusion, the only changes needed here I already handled in this PR: https://github.com/apache/datafusion-comet/pull/3047/changes#diff-d573ace4d519dc7995a332cd91ffcaf553475dbd80db1cda8f5707c35f8e4c9b The whole point of the new API is that you get to define via expressions how you want to manipulate the schemas and batches (since expressions encapsulate both concepts of how to transform the data and how the schemas get transformed). Internally `ParquetOpener` does something similar to the `BatchaAdapter` being discussed in https://github.com/apache/datafusion/pull/19716: https://github.com/apache/datafusion/blob/567ba75840494170cbe7e50c695110d447426c8c/datafusion/datasource-parquet/src/opener.rs#L401-L418 https://github.com/apache/datafusion/blob/567ba75840494170cbe7e50c695110d447426c8c/datafusion/datasource-parquet/src/opener.rs#L616-L622 https://github.com/apache/datafusion/blob/567ba75840494170cbe7e50c695110d447426c8c/datafusion/datasource-parquet/src/opener.rs#L631 But you shouldn't have to worry about that, providing the custom `PhysicalExprAdapterFactory` like I do in this PR should be enough. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
