felipepessoto opened a new issue, #12195:
URL: https://github.com/apache/gluten/issues/12195

   ### Description
   
   Gluten currently does not offload reads of Delta tables' **Change Data 
Feed** (`spark.read.format("delta").option("readChangeFeed", "true")...` or the 
`table_changes()` SQL function). These queries run entirely on vanilla Spark 
instead of the Velox backend.
   
   ## Why it falls back today
   
   A normal Delta scan is a `FileSourceScanExec` whose `relation.fileFormat` is 
a `DeltaParquetFileFormat`. Gluten's `OffloadDeltaScan` only matches that exact 
case and rewrites it into a `DeltaScanTransformer`:
   
   ```scala
   case scan: FileSourceScanExec
       if scan.relation.fileFormat.getClass == classOf[DeltaParquetFileFormat] =>
     DeltaScanTransformer(scan)
   ```
   
   CDF reads do **not** produce that plan. Delta builds them through 
`CDCReader.DeltaCDFRelation`, a generic `BaseRelation` whose `buildScan` 
returns an `RDD[Row]`.
   
   Because the resulting plan is not a `FileSourceScanExec` over 
`DeltaParquetFileFormat`, `OffloadDeltaScan` never matches it, so the entire 
query (scan + projections building the metadata columns) stays on vanilla Spark.
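
The mismatch can be illustrated with a small self-contained model. This is only a sketch: every type below is a simplified stand-in defined for illustration (they are not the real Spark, Delta, or Gluten classes), and `RowRddScan` is a hypothetical name for whatever physical node ends up wrapping a relation whose `buildScan` returns `RDD[Row]`.

```scala
// Simplified stand-ins for illustration only. These are NOT the real
// Spark, Delta, or Gluten classes; they only mirror the shapes involved.
object CdfFallbackSketch {
  sealed trait FileFormat
  case object DeltaParquetFileFormat extends FileFormat
  case object OtherFileFormat extends FileFormat

  sealed trait Plan
  // Models a file-based scan that exposes its file format.
  final case class FileSourceScan(format: FileFormat) extends Plan
  // Hypothetical node: models a scan over a generic relation that only
  // hands back rows, so no file format is visible to the rule.
  final case class RowRddScan(relationName: String) extends Plan
  // Models the native node produced when the rule matches.
  final case class DeltaScanTransformer(scan: FileSourceScan) extends Plan

  // Same shape as the rule quoted above: exactly one case is rewritten.
  def offload(plan: Plan): Plan = plan match {
    case scan @ FileSourceScan(DeltaParquetFileFormat) =>
      DeltaScanTransformer(scan)
    case other =>
      other // not rewritten, which corresponds to staying on vanilla Spark
  }

  def main(args: Array[String]): Unit = {
    println(offload(FileSourceScan(DeltaParquetFileFormat)))
    println(offload(RowRddScan("DeltaCDFRelation")))
  }
}
```

In this model the regular Delta scan is rewritten, while the CDF-style node falls through the default case unchanged, because the rule has nothing to match on.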
   
   ## Proposed work
   
   - Recognize the CDF scan path (`DeltaCDFRelation` / the CDC file indexes) 
and offload the underlying parquet reads to Velox.
   - Materialize the synthesized `_change_type` / `_commit_version` / 
`_commit_timestamp` columns (literals + projections) so they can be produced 
natively rather than forcing a fallback.
   - Add `gluten-ut` coverage for batch CDF reads (`readChangeFeed` and 
`table_changes()`), including add/remove/cdc-file combinations and column 
mapping.
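
As a rough sketch of the second item, the metadata columns can be thought of as per-file constants that are appended to the rows read from each file. The model below is self-contained and uses hypothetical stand-in types (`FileKind`, `ChangeFile`, `MetadataLiterals` are not Delta or Gluten classes). It also assumes that rows from add files are reported as `insert`, rows from remove files as `delete`, and that cdc files already carry `_change_type` in their data; this should be checked against Delta's actual behavior before relying on it.

```scala
// Hypothetical model for illustration only; none of these types are the
// real Delta or Gluten classes.
object CdfMetadataSketch {
  sealed trait FileKind
  case object AddFile extends FileKind    // data file added by a commit
  case object RemoveFile extends FileKind // data file removed by a commit
  case object CdcFile extends FileKind    // dedicated change-data file

  final case class ChangeFile(
      path: String,
      kind: FileKind,
      commitVersion: Long,
      commitTimestampMs: Long)

  // Constants that could be projected as literal columns for one file.
  // changeType = None means `_change_type` is read from the file itself.
  final case class MetadataLiterals(
      changeType: Option[String],
      commitVersion: Long,
      commitTimestampMs: Long)

  def metadataFor(file: ChangeFile): MetadataLiterals = {
    // Assumption: add => "insert", remove => "delete", cdc => from data.
    val changeType = file.kind match {
      case AddFile    => Some("insert")
      case RemoveFile => Some("delete")
      case CdcFile    => None
    }
    MetadataLiterals(changeType, file.commitVersion, file.commitTimestampMs)
  }
}
```

Under these assumptions each file needs at most three literal columns on top of a plain parquet read, which is why a literals-plus-projection approach looks feasible for native execution.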
   
   ### Gluten version
   
   main branch
