lilei1128 commented on PR #16523: URL: https://github.com/apache/iceberg/pull/16523#issuecomment-4628958785
> Re-reviewed after the update — thanks for addressing the earlier comments. The redundant `useSnapshotSchema()` override is gone, the tests now enable distributed planning (so they actually exercise `BaseDistributedDataScan`), and v4.1 was added. Nice catch extending the fix to the executor side (`ReadDataManifest` / `ReadDeleteManifest`), since that's the same renamed-column bug on the remote path. > > Checking the other spec lookups for alignment, I found one spot that was missed and one nit: > > 1. (please fix) `SparkDistributedDataScan#doPlanDeletesRemotely` still builds the `DeleteFileIndex` with `.specsById(table().specs())`. During time travel that's the current-schema spec, which is inconsistent with (a) the executor read you just fixed in the same method and (b) the local path `BaseDistributedDataScan#planDeletesLocally`, which uses `.specsById(specs())`. It should be `specs()` too, in all three module copies (v3.5/v4.0 line 169, v4.1 line 171). Note this path only runs when the table has delete files, which the new test doesn't cover — so a small enhancement would be a deletes scenario, though that can be a follow-up. > 2. (nit) The v4.1 test uses `SparkReadOptions.VERSION_AS_OF` while v3.5/v4.0 use `SparkReadOptions.SNAPSHOT_ID`. Please keep the three copies identical. > > Otherwise the core `specCache()` -> `specs()` change and the executor-side changes look correct and aligned with `DataScan`/`DataTableScan`. The new test testTimeTravelFilterOnRenamedColumnWithDeleteFiles was added to cover the delete files scenario in distributed planning mode with time travel and column rename. After investigation, we confirmed that the specsById change in DeleteFileIndex.builderFor(deleteFiles) does not actually have a reproducible bug scenario. But, the test(testTimeTravelFilterOnRenamedColumnWithDeleteFiles) still provides value as a regression guard for the overall delete files + time travel + column rename + distributed planning scenario, even if it doesn't specifically target the specsById fix(the latest commit). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
