sezruby commented on PR #12240: URL: https://github.com/apache/gluten/pull/12240#issuecomment-4628227225
@zhztheplayer @FelixYBW could you take a look when you get a chance?

Quick context: this fixes the wrong-result bug in #10511 by narrowing the column-mapping rewrite. Only the reader-facing fields (`output`, `dataSchema`, and the data part of `requiredSchema`) become physical; the partition schema and filters stay logical so that Delta's `PreparedDeltaFileIndex` keeps working for partition pruning and stats-based file skipping. The native side gets a physical-translated copy of the filters via a `scanFilters` override.

The result is asymmetric on the scan node, which I know is a bit ugly. The reason is that vanilla Spark + Delta does the logical→physical translation just-in-time inside `DeltaParquetFileFormat.buildReaderWithPartitionValues`, and Gluten bypasses that hook.

The cleaner shape is to keep EVERYTHING on the scan node logical and translate only at substrait emission (`BasicScanExecTransformer.doTransform`). That would also let us drop the alias-back `ProjectExecTransformer` and the `scanFilters` override. But that's a multi-module refactor (it touches the substrait emitter shared across Iceberg/Hudi/plain Parquet/Delta), so I left it as a follow-up noted in the docstring rather than scope-creep into a bug fix. Happy to take it on as a separate PR if you'd prefer.

Verified locally end-to-end with the prebuilt CI artifacts in `apache/gluten:centos-8-jdk8`: `VeloxDeltaSuite` passes 30/30, including all 12 new tests. CI is also green on the latest commit.
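For reviewers who want the shape of the narrowed rewrite at a glance, here is a rough toy model in plain Python. It is not the Gluten or Delta code: `ScanNode`, `narrow_rewrite`, `scan_filters`, the column names, and the `col-N` physical names are all hypothetical, and the sketch assumes the filter copy simply renames every referenced column. Which filters actually reach the native side is not modeled.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

# A filter is modeled as (column, operator, literal); purely illustrative.
Filter = Tuple[str, str, object]


@dataclass(frozen=True)
class ScanNode:
    """Hypothetical stand-in for the scan node fields named in the comment."""
    output: List[str]
    data_schema: List[str]
    required_data_schema: List[str]  # data part of requiredSchema only
    partition_schema: List[str]
    filters: List[Filter]


def rename(cols: List[str], mapping: Dict[str, str]) -> List[str]:
    # Columns without a mapping entry keep their logical name.
    return [mapping.get(c, c) for c in cols]


def narrow_rewrite(scan: ScanNode, mapping: Dict[str, str]) -> ScanNode:
    """Rewrite only the reader-facing fields to physical names.

    partition_schema and filters are deliberately left logical, so that
    anything keyed on logical names (pruning, file skipping) still matches.
    """
    return replace(
        scan,
        output=rename(scan.output, mapping),
        data_schema=rename(scan.data_schema, mapping),
        required_data_schema=rename(scan.required_data_schema, mapping),
    )


def scan_filters(scan: ScanNode, mapping: Dict[str, str]) -> List[Filter]:
    """Physical-translated copy of the filters; the node's own filters are untouched."""
    return [(mapping.get(col, col), op, value) for col, op, value in scan.filters]


# Hypothetical logical -> physical column mapping.
mapping = {"id": "col-1", "name": "col-2", "date": "col-3"}

logical = ScanNode(
    output=["id", "name"],
    data_schema=["id", "name"],
    required_data_schema=["id"],
    partition_schema=["date"],
    filters=[("date", "=", "2024-01-01"), ("id", ">", 10)],
)

rewritten = narrow_rewrite(logical, mapping)
native_filters = scan_filters(rewritten, mapping)
```

In this model `rewritten.output` and `rewritten.data_schema` carry physical names while `rewritten.partition_schema` and `rewritten.filters` are unchanged, which is the asymmetry described above. The follow-up refactor would correspond to dropping `narrow_rewrite` and applying the mapping in one place at emission time.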
