trushev commented on PR #5830: URL: https://github.com/apache/hudi/pull/5830#issuecomment-1326039940
@danny0405 @xiarixiaoyao I have reworked this PR. Could you please take a look?

- Reverted all changes in COWInputFormat and MORInputFormat
- Added new tests with a metafields query and a count(*) query
- Introduced a new interface for the parquet reader, `HoodieParquetReader`
- Implemented 2 readers: a "reader as is", `HoodieParquetSplitReader`, and a "schema evolution reader", `HoodieParquetEvolvedSplitReader`

Thus, we follow the approach proposed above:
1) fetch the original schema from when the file was committed and read the record as is
2) project the record onto the latest read schema if needed

Almost all schema evolution code is now separated from the input formats. It is placed in `InternalSchemaManager` and `CastMap`.
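To illustrate the two-step approach (read with the file's original schema, then project onto the latest read schema), here is a minimal, self-contained sketch. It is not the PR's code: the class `SchemaEvolutionSketch`, the method `project`, and the `casts` map are hypothetical names used only for illustration, and matching fields by name is a simplifying assumption. The real logic lives in `InternalSchemaManager` and `CastMap`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration only; not the Hudi API. */
public class SchemaEvolutionSketch {

    /**
     * Step 2 of the approach: project a row that was read "as is" (with the
     * schema the file was committed with) onto the latest read schema.
     * Fields are matched by name here, which is a simplification.
     */
    public static Object[] project(List<String> fileFields,
                                   Object[] fileRow,
                                   List<String> queryFields,
                                   Map<String, Function<Object, Object>> casts) {
        Object[] out = new Object[queryFields.size()];
        for (int i = 0; i < queryFields.size(); i++) {
            String name = queryFields.get(i);
            int pos = fileFields.indexOf(name);
            if (pos < 0) {
                // Column was added after this file was written: fill with null.
                out[i] = null;
                continue;
            }
            Object value = fileRow[pos];
            // Apply a type cast only if the column type changed since the commit.
            Function<Object, Object> cast = casts.get(name);
            out[i] = (cast == null || value == null) ? value : cast.apply(value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Step 1 (assumed already done): the row was read with the file's own schema.
        List<String> fileFields = List.of("id", "price");
        Object[] fileRow = {1, 10};

        // The latest schema widened "price" from int to long and added "note".
        List<String> queryFields = List.of("id", "price", "note");
        Function<Object, Object> intToLong = v -> Long.valueOf(((Integer) v).longValue());
        Map<String, Function<Object, Object>> casts = Map.of("price", intToLong);

        System.out.println(Arrays.toString(project(fileFields, fileRow, queryFields, casts)));
    }
}
```

When the file schema already equals the read schema, no projection is needed, which is the case the "reader as is" covers.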
