duanyyyyyyy opened a new pull request, #8084:
URL: https://github.com/apache/paimon/pull/8084
DataEvolutionFileStoreScan.filterByStats(ManifestEntry) compared
readType.getFieldNames() against the file's physical column names. After a
RENAME COLUMN, the read type carries the new name while the file's physical
column still has the pre-rename name, so containsReadCol stays false and the
entry is silently dropped, even though the renamed field keeps its id across
schemas and the underlying data is intact.
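A minimal sketch of a name-based check shows why the entry is dropped. The `Field` record and `NameBasedCheck.containsReadCol` below are hypothetical stand-ins, not Paimon's actual classes or signatures. They also assume the query reads only `grade`, as in the repro.

```java
import java.util.List;

// Hypothetical model of a schema field: a stable id plus a name and type that ALTER TABLE may change.
record Field(int id, String name, String type) {}

class NameBasedCheck {
    // Sketch of the pre-fix behavior: keep the file only if one of its
    // physical column names equals one of the read field names.
    static boolean containsReadCol(List<Field> readType, List<String> fileColumnNames) {
        for (Field f : readType) {
            if (fileColumnNames.contains(f.name())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Field id 1 was written as "score" and later renamed to "grade".
        List<String> oldFileColumns = List.of("id", "score");
        // Assumption: the query reads only "grade" (field id 1).
        List<Field> readType = List.of(new Field(1, "grade", "INT"));
        // Prints false: the old file would be pruned although it holds field id 1.
        System.out.println(containsReadCol(readType, oldFileColumns));
    }
}
```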
This PR compares by field id instead. The field id is the stable identity of a
column across schemas, so RENAME and TYPE CHANGE no longer drop historical
files. ADD COLUMN keeps its current semantics: a newly added field id is
naturally absent from files written before the ALTER.
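The id-based comparison can be sketched with the same kind of model. `Field` and `IdBasedCheck` are hypothetical, chosen only to represent a field with a stable id and a name and type that may change. The real check lives in `DataEvolutionFileStoreScan` and may be structured differently.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of a schema field: a stable id plus a name and type that ALTER TABLE may change.
record Field(int id, String name, String type) {}

class IdBasedCheck {
    // Sketch of the fixed behavior: keep the file if any read field id
    // is among the field ids the file was written with.
    static boolean containsReadCol(List<Field> readType, List<Field> fileFields) {
        Set<Integer> fileIds = fileFields.stream().map(Field::id).collect(Collectors.toSet());
        return readType.stream().anyMatch(f -> fileIds.contains(f.id()));
    }

    public static void main(String[] args) {
        // File written before any ALTER: field id 1 was "score INT".
        List<Field> oldFile = List.of(new Field(0, "id", "INT"), new Field(1, "score", "INT"));

        // RENAME COLUMN: same id, new name -> kept (prints true).
        System.out.println(containsReadCol(List.of(new Field(1, "grade", "INT")), oldFile));
        // TYPE CHANGE: same id, new type -> kept (prints true).
        System.out.println(containsReadCol(List.of(new Field(1, "score", "BIGINT")), oldFile));
        // ADD COLUMN: hypothetical new field id 2 is absent from the old file -> pruned (prints false).
        System.out.println(containsReadCol(List.of(new Field(2, "bonus", "INT")), oldFile));
    }
}
```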
The cache key is unchanged (Pair<schemaId, writeCols>); only the cached
value changes from List<String> (column names) to Set<Integer> (field ids).
The comparison logic is extracted into two @VisibleForTesting static helpers
(computeFileFieldIds, fileContainsAnyReadColumn), so the fix can be unit-tested
directly without standing up a full scan.
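A rough sketch of how the cached helpers could fit together follows. Only the names `computeFileFieldIds` and `fileContainsAnyReadColumn` come from this PR. Their parameters, the `CacheKey` record (standing in for `Pair<schemaId, writeCols>`), the `keep` method, and the `schemas` lookup map are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; Paimon's real types and signatures may differ.
record Field(int id, String name, String type) {}

// Plays the role of Pair<schemaId, writeCols>; records give value-based equals/hashCode.
record CacheKey(long schemaId, List<String> writeCols) {}

class FieldIdPruning {
    // schemaId -> fields of that schema version (assumed lookup).
    private final Map<Long, List<Field>> schemas;
    // Same key as before; the value is now a set of field ids, not names.
    private final Map<CacheKey, Set<Integer>> cache = new HashMap<>();

    FieldIdPruning(Map<Long, List<Field>> schemas) {
        this.schemas = schemas;
    }

    // Resolve the file's write columns (names as of its own schema) to field ids.
    static Set<Integer> computeFileFieldIds(List<Field> fileSchemaFields, List<String> writeCols) {
        Set<Integer> ids = new HashSet<>();
        for (Field f : fileSchemaFields) {
            if (writeCols.contains(f.name())) {
                ids.add(f.id());
            }
        }
        return ids;
    }

    // True if at least one field in the read type was written to the file.
    static boolean fileContainsAnyReadColumn(Set<Integer> fileFieldIds, List<Field> readType) {
        for (Field f : readType) {
            if (fileFieldIds.contains(f.id())) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical entry point: decide whether a file survives pruning.
    boolean keep(long schemaId, List<String> writeCols, List<Field> readType) {
        Set<Integer> ids = cache.computeIfAbsent(
                new CacheKey(schemaId, writeCols),
                k -> computeFileFieldIds(schemas.get(k.schemaId()), k.writeCols()));
        return fileContainsAnyReadColumn(ids, readType);
    }

    public static void main(String[] args) {
        Map<Long, List<Field>> schemas = Map.of(
                0L, List.of(new Field(0, "id", "INT"), new Field(1, "score", "INT")),
                1L, List.of(new Field(0, "id", "INT"), new Field(1, "grade", "INT")));
        FieldIdPruning pruning = new FieldIdPruning(schemas);
        List<Field> readType = List.of(new Field(1, "grade", "INT"));
        // File written under schema 0 with the pre-rename name: kept (prints true).
        System.out.println(pruning.keep(0L, List.of("id", "score"), readType));
    }
}
```

In this sketch, keying on `(schemaId, writeCols)` means the name-to-id resolution runs once per distinct file layout rather than once per manifest entry, and the name lookup happens against the file's own schema, where the pre-rename name is still valid.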
Repro for the regression this fixes:
```sql
CREATE TABLE t (id INT, score INT) WITH
('data-evolution.enabled'='true');
INSERT INTO t VALUES (1, 10);
ALTER TABLE t RENAME COLUMN score TO grade;
INSERT INTO t VALUES (2, 100);
SELECT COUNT(*) FROM t WHERE grade < 60;
-- before: 0 (old file silently dropped at manifest pruning)
-- after: 1
```