duanyyyyyyy opened a new pull request, #8084:
URL: https://github.com/apache/paimon/pull/8084

   DataEvolutionFileStoreScan.filterByStats(ManifestEntry) compared 
readType.getFieldNames() against the file's physical column names. After a 
RENAME COLUMN, the read side carries the latest name while the file's physical 
name is the pre-rename one, so containsReadCol stays false and the entry is 
silently dropped — even though the renamed field's id is preserved across 
schemas and the underlying data is intact.
   
   Compare by field id instead. Field id is the stable identity across schemas, 
so RENAME and TYPE CHANGE no longer drop historical files. ADD COLUMN keeps its 
current semantics, because a freshly added field id is naturally absent from 
pre-ALTER files.
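
To make the difference concrete, here is a minimal, self-contained sketch of the two comparisons. It is not the code in this PR: the helper names match the ones mentioned in this description, but their signatures, the `Map<String, Integer>` stand-in for a schema, and the field ids 0, 1 and 2 are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FieldIdPruningSketch {

    /**
     * Resolves the column names a file was written with to field ids, using the
     * schema the file was written under (hypothetical signature).
     */
    static Set<Integer> computeFileFieldIds(Map<String, Integer> fileSchema, List<String> writeCols) {
        Set<Integer> ids = new HashSet<>();
        for (String col : writeCols) {
            Integer id = fileSchema.get(col);
            if (id != null) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** True if the file holds at least one of the requested field ids (hypothetical signature). */
    static boolean fileContainsAnyReadColumn(Set<Integer> fileFieldIds, Collection<Integer> readFieldIds) {
        for (Integer id : readFieldIds) {
            if (fileFieldIds.contains(id)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Schema the old file was written under: name -> field id (ids are illustrative).
        Map<String, Integer> schema0 = new LinkedHashMap<>();
        schema0.put("id", 0);
        schema0.put("score", 1);
        List<String> writeCols = Arrays.asList("id", "score");

        // After RENAME COLUMN score TO grade, the reader asks for "grade", still field id 1.
        boolean byName = writeCols.contains("grade");
        Set<Integer> fileIds = computeFileFieldIds(schema0, writeCols);
        boolean byId = fileContainsAnyReadColumn(fileIds, Arrays.asList(1));

        // A column added later gets a new id (2 here) that the old file never had.
        boolean addedCol = fileContainsAnyReadColumn(fileIds, Arrays.asList(2));

        System.out.println("by name: " + byName);        // false: file would be dropped
        System.out.println("by id: " + byId);            // true: file is kept
        System.out.println("added column: " + addedCol); // false: unchanged behaviour
    }
}
```

The name lookup misses the renamed column, while the id lookup finds it. The added-column case shows why ADD COLUMN behaves as before.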
   
   The cache key is unchanged (Pair<schemaId, writeCols>); only the cached 
value moves from List<String> to Set<Integer>. The comparison logic is 
extracted into two @VisibleForTesting static helpers (computeFileFieldIds, 
fileContainsAnyReadColumn) so the fix is directly unit-testable without 
standing up a full scan.
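
The caching change can be pictured roughly as below. This is an illustrative sketch only: `FieldIdCacheSketch`, its `Key` class (standing in for `Pair<schemaId, writeCols>`) and the `loader` callback are hypothetical names, not classes from this PR or from Paimon.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

class FieldIdCacheSketch {

    /** Hypothetical stand-in for the Pair<schemaId, writeCols> cache key. */
    static final class Key {
        final long schemaId;
        final List<String> writeCols;

        Key(long schemaId, List<String> writeCols) {
            this.schemaId = schemaId;
            this.writeCols = writeCols;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return schemaId == other.schemaId && Objects.equals(writeCols, other.writeCols);
        }

        @Override
        public int hashCode() {
            return Objects.hash(schemaId, writeCols);
        }
    }

    // Same key as before; the value is now a set of field ids instead of a list of names.
    private final Map<Key, Set<Integer>> cache = new ConcurrentHashMap<>();
    private final BiFunction<Long, List<String>, Set<Integer>> loader;
    final AtomicInteger loads = new AtomicInteger();

    FieldIdCacheSketch(BiFunction<Long, List<String>, Set<Integer>> loader) {
        this.loader = loader;
    }

    /** Resolves field ids once per (schemaId, writeCols) pair and reuses the result. */
    Set<Integer> fieldIds(long schemaId, List<String> writeCols) {
        return cache.computeIfAbsent(
                new Key(schemaId, writeCols),
                k -> {
                    loads.incrementAndGet();
                    return loader.apply(k.schemaId, k.writeCols);
                });
    }
}
```

Because the key is untouched, the number of cache entries and the hit pattern stay the same. Only the type of the value that each entry holds differs.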
   
   Repro for the regression this fixes:
   
       CREATE TABLE t (id INT, score INT) WITH ('data-evolution.enabled'='true');
       INSERT INTO t VALUES (1, 10);
       ALTER TABLE t RENAME COLUMN score TO grade;
       INSERT INTO t VALUES (2, 100);
       SELECT COUNT(*) FROM t WHERE grade < 60;
       -- before: 0   (old file silently dropped at manifest pruning)
       -- after:  1
   

