lilei1128 commented on PR #16523:
URL: https://github.com/apache/iceberg/pull/16523#issuecomment-4628958785

   > Re-reviewed after the update — thanks for addressing the earlier comments. 
The redundant `useSnapshotSchema()` override is gone, the tests now enable 
distributed planning (so they actually exercise `BaseDistributedDataScan`), and 
v4.1 was added. Nice catch extending the fix to the executor side 
(`ReadDataManifest` / `ReadDeleteManifest`), since that's the same 
renamed-column bug on the remote path.
   > 
   > Checking the other spec lookups for alignment, I found one spot that was 
missed and one nit:
   > 
   > 1. (please fix) `SparkDistributedDataScan#doPlanDeletesRemotely` still 
builds the `DeleteFileIndex` with `.specsById(table().specs())`. During time 
travel that's the current-schema spec, which is inconsistent with (a) the 
executor read you just fixed in the same method and (b) the local path 
`BaseDistributedDataScan#planDeletesLocally`, which uses `.specsById(specs())`. 
It should be `specs()` too, in all three module copies (v3.5/v4.0 line 169, 
v4.1 line 171). Note this path only runs when the table has delete files, which 
the new test doesn't cover — so a small enhancement would be a deletes 
scenario, though that can be a follow-up.
   > 2. (nit) The v4.1 test uses `SparkReadOptions.VERSION_AS_OF` while 
v3.5/v4.0 use `SparkReadOptions.SNAPSHOT_ID`. Please keep the three copies 
identical.
   > 
   > Otherwise the core `specCache()` -> `specs()` change and the executor-side 
changes look correct and aligned with `DataScan`/`DataTableScan`.
   
   The new test testTimeTravelFilterOnRenamedColumnWithDeleteFiles was added to 
cover the delete files scenario in distributed planning mode with time travel 
and column rename.
   After investigation, we confirmed that the specsById change in 
DeleteFileIndex.builderFor(deleteFiles) does not actually have a reproducible 
bug scenario.
   
   But, the test(testTimeTravelFilterOnRenamedColumnWithDeleteFiles) still 
provides value as a regression guard for the overall delete files + time travel 
+ column rename + distributed planning scenario, even if it doesn't 
specifically target the specsById fix(the latest commit).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to