[GitHub] [flink-table-store] zjureel opened a new pull request, #376: [FLINK-27843] Schema evolution for data file meta

GitBox Sun, 13 Nov 2022 16:52:10 -0800


zjureel opened a new pull request, #376:
URL: https://github.com/apache/flink-table-store/pull/376


   Currently, the table store always uses the latest schema id to read the
data file meta. After the schema evolves, this causes errors, for example:
   1. The schema of the underlying data is [1->a, 2->b, 3->c, 4->d] with
schema id 0, where 1/2/3/4 are field ids and a/b/c/d are field names.
   2. After schema evolution, the schema id is 1 and the new schema is [1->a,
3->c, 5->f, 6->b, 7->g].
   When the table store reads the field stats from the data file meta, it
should map the fields of schema 1 to those of schema 0 by field id.
   
   When reading the data file meta, this PR reads and parses the data
according to the schema id recorded in the meta file, and creates an index
mapping from the table schema to the meta schema, so that the table store can
read the file meta correctly through its latest schema.
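
   The mapping by field id can be illustrated with the two schemas from the
example above. This is only a minimal sketch, not the code in this PR: the
class `IndexMappingSketch`, the method `createIndexMapping` and the use of -1
for a field that is absent from the data file are all assumptions made for
illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class IndexMappingSketch {

    /** Hypothetical marker for a table field that the data file's schema lacks. */
    static final int MISSING = -1;

    /**
     * For each field of the table schema, returns the position of the field
     * with the same id in the data file's schema, or MISSING if there is none.
     */
    static int[] createIndexMapping(int[] tableFieldIds, int[] dataFieldIds) {
        // Index the data file's fields by id for constant-time lookup.
        Map<Integer, Integer> positionById = new HashMap<>();
        for (int i = 0; i < dataFieldIds.length; i++) {
            positionById.put(dataFieldIds[i], i);
        }
        int[] mapping = new int[tableFieldIds.length];
        for (int i = 0; i < tableFieldIds.length; i++) {
            mapping[i] = positionById.getOrDefault(tableFieldIds[i], MISSING);
        }
        return mapping;
    }

    public static void main(String[] args) {
        int[] dataFieldIds = {1, 2, 3, 4};     // schema 0: a, b, c, d
        int[] tableFieldIds = {1, 3, 5, 6, 7}; // schema 1: a, c, f, b, g
        System.out.println(
                Arrays.toString(createIndexMapping(tableFieldIds, dataFieldIds)));
        // prints [0, 2, -1, -1, -1]
    }
}
```

   Field 6->b in schema 1 has the same name as field 2->b in schema 0 but a
different id, so it maps to no field in the old data file. This is why the
matching is done on field ids rather than on field names.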
   
   The main code changes are as follows:
   1. Added `SchemaFieldTypeExtractor` to extract key fields for
`ChangelogValueCountFileStoreTable` and `ChangelogWithKeyFileStoreTable`
   2. Added `SchemaEvolutionUtil` to create the index mapping from the table
schema to the meta file schema
   3. Updated `FieldStatsArraySerializer` to read field stats with a given
index mapping
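
   As a rough sketch of what reading stats with an index mapping could look
like, the example below reorders per-field stats stored in the file's schema
order into the table schema order. It does not use the real
`FieldStatsArraySerializer` API: the class `StatsProjectionSketch`, the
method `project`, the -1 convention and the choice of `null` for fields
without stats are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StatsProjectionSketch {

    /**
     * Reorders stats stored in the data file's field order into the table
     * schema's field order. A negative mapping entry means the table field
     * does not exist in the file, so no stats are available (null here).
     */
    static <T> List<T> project(List<T> fileStats, int[] indexMapping) {
        List<T> projected = new ArrayList<>(indexMapping.length);
        for (int position : indexMapping) {
            projected.add(position < 0 ? null : fileStats.get(position));
        }
        return projected;
    }

    public static void main(String[] args) {
        // Stats written under schema 0: [1->a, 2->b, 3->c, 4->d]
        List<String> fileStats =
                Arrays.asList("stats(a)", "stats(b)", "stats(c)", "stats(d)");
        // Mapping from schema 1 [1->a, 3->c, 5->f, 6->b, 7->g] to schema 0
        int[] indexMapping = {0, 2, -1, -1, -1};
        System.out.println(project(fileStats, indexMapping));
        // prints [stats(a), stats(c), null, null, null]
    }
}
```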
   
   The main tests include:
   1. Added `SchemaEvolutionUtilTest` to test creating the index mapping
between two schemas.
   2. Added `FieldStatsArraySerializerTest` to test reading meta through the
table schema.
   3. Added `AppendOnlyTableFileMetaFilterTest`,
`ChangelogValueCountFileMetaFilterTest` and
`ChangelogWithKeyFileMetaFilterTest` to test filtering on old fields, new
fields, partition fields and primary keys in data file meta during table scan.


