[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][WIP][RFC-33] Flink engine support for comprehensive schema evolution

GitBox Wed, 23 Nov 2022 23:08:36 -0800


trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1326039940


   @danny0405 @xiarixiaoyao I reworked this PR. Could you please take a look?
   
   - Reverted all changes in COWInputFormat and MORInputFormat
   - Added new tests covering a metafields query and a count(*) query
   - Introduced a new interface for the parquet reader, `HoodieParquetReader`
   - Implemented 2 readers: the "reader as is" `HoodieParquetSplitReader` and
     the "schema evolution reader" `HoodieParquetEvolvedSplitReader`
   
   Thus, we follow the approach proposed above:
   1) fetch the original schema from when the file was committed and read the
      record as is
   2) project the record onto the latest read schema if needed
   
   Almost all schema evolution code is separated from inputFormat. The code is
   placed in `InternalSchemaManager` and `CastMap`.
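
The two-step approach described above (read the record as is, then project it onto the latest read schema) can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `SplitReader`, `AsIsReader`, `EvolvedReader`, the position array, and the per-field cast list are hypothetical stand-ins chosen for illustration, and rows are modeled as plain `Object[]` rather than Flink row types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class SchemaEvolutionSketch {

    /** Hypothetical reader abstraction for one file split (not Hudi's actual interface). */
    interface SplitReader {
        List<Object[]> readAll();
    }

    /** "Reader as is": returns rows exactly as they were written with the file schema. */
    static final class AsIsReader implements SplitReader {
        private final List<Object[]> rows;

        AsIsReader(List<Object[]> rows) {
            this.rows = rows;
        }

        @Override
        public List<Object[]> readAll() {
            return rows;
        }
    }

    /** "Schema evolution reader": wraps another reader and projects each row onto the query schema. */
    static final class EvolvedReader implements SplitReader {
        private final SplitReader delegate;
        // For each query field: its index in the file row, or -1 if the file lacks the field.
        private final int[] sourcePositions;
        // For each query field: a conversion from the file type to the query type (assumed cast table).
        private final List<Function<Object, Object>> casts;

        EvolvedReader(SplitReader delegate, int[] sourcePositions, List<Function<Object, Object>> casts) {
            this.delegate = delegate;
            this.sourcePositions = sourcePositions;
            this.casts = casts;
        }

        @Override
        public List<Object[]> readAll() {
            List<Object[]> projected = new ArrayList<>();
            for (Object[] row : delegate.readAll()) {
                Object[] out = new Object[sourcePositions.length];
                for (int i = 0; i < out.length; i++) {
                    int pos = sourcePositions[i];
                    Object value = pos < 0 ? null : row[pos];
                    // Added columns and null values stay null; everything else is cast.
                    out[i] = value == null ? null : casts.get(i).apply(value);
                }
                projected.add(out);
            }
            return projected;
        }
    }

    /** File schema: (id INT, name STRING). Query schema: (id LONG, name STRING, age INT). */
    static List<Object[]> demo() {
        List<Object[]> fileRows = new ArrayList<>();
        fileRows.add(new Object[] {1, "alice"});

        Function<Object, Object> intToLong = v -> ((Integer) v).longValue();
        Function<Object, Object> same = v -> v;
        List<Function<Object, Object>> casts = Arrays.asList(intToLong, same, same);

        SplitReader reader = new EvolvedReader(new AsIsReader(fileRows), new int[] {0, 1, -1}, casts);
        return reader.readAll();
    }

    public static void main(String[] args) {
        for (Object[] row : demo()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In this sketch the evolved reader is a thin wrapper over the as-is reader, so a file whose schema already matches the query schema could use the as-is reader directly and skip the projection.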


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
