adriangb commented on PR #3047:
URL: 
https://github.com/apache/datafusion-comet/pull/3047#issuecomment-3747603386

   > Hi @adriangb I went through proposed changes 
[apache/datafusion#19716](https://github.com/apache/datafusion/pull/19716) I 
still missing something. There is a use case for Comet native 
readers(non-iceberg), which backed by DF readers in
   > 
   > 
https://github.com/apache/datafusion-comet/blob/c4da3c91d672e6bc5d41959f6d60a671cd3dfd8a/native/core/src/parquet/parquet_exec.rs#L107
   > 
   > The schema adapter, used for schema conversion via `map_schema` and batch 
runtime modifications `map_batch`. `map_schema` looks straight forward however 
`map_batch` was a callback as part of parquet file opener 
https://github.com/apache/datafusion/blob/e571b49e0983892597a8f92e5d1502b17a15b180/datafusion/datasource-parquet/src/opener.rs#L465
   > 
   > In 
[apache/datafusion#19716](https://github.com/apache/datafusion/pull/19716) 
there is a proposed `BatchAdapter` however I can't find how this adapter would 
be called, what is the triggering mechanism. With `map_batch` the file opener 
was a caller and all the logic below worked with modified RB, I'm not sure how 
to achieve same with `BatchAdapter` though?
   
   This is covered upstream in DataFusion, the only changes needed here I 
already handled in this PR: 
https://github.com/apache/datafusion-comet/pull/3047/changes#diff-d573ace4d519dc7995a332cd91ffcaf553475dbd80db1cda8f5707c35f8e4c9b
   
   The whole point of the new API is that you get to define via expressions how 
you want to manipulate the schemas and batches (since expressions encapsulate 
both concepts of how to transform the data and how the schemas get 
transformed). Internally `ParquetOpener` does something similar to the 
`BatchaAdapter` being discussed in 
https://github.com/apache/datafusion/pull/19716:
   
   
https://github.com/apache/datafusion/blob/567ba75840494170cbe7e50c695110d447426c8c/datafusion/datasource-parquet/src/opener.rs#L401-L418
   
   
https://github.com/apache/datafusion/blob/567ba75840494170cbe7e50c695110d447426c8c/datafusion/datasource-parquet/src/opener.rs#L616-L622
   
   
https://github.com/apache/datafusion/blob/567ba75840494170cbe7e50c695110d447426c8c/datafusion/datasource-parquet/src/opener.rs#L631
   
   But you shouldn't have to worry about that, providing the custom 
`PhysicalExprAdapterFactory` like I do in this PR should be enough.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to