alexeykudinkin commented on code in PR #7769:
URL: https://github.com/apache/hudi/pull/7769#discussion_r1091305177
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/ParquetBootstrapMetadataHandler.java:
##########
@@ -62,31 +71,29 @@ Schema getAvroSchema(Path sourceFilePath) throws IOException {
   }

   @Override
-  void executeBootstrap(HoodieBootstrapHandle<?, ?, ?, ?> bootstrapHandle,
-                        Path sourceFilePath, KeyGeneratorInterface keyGenerator, String partitionPath, Schema avroSchema) throws Exception {
+  protected void executeBootstrap(HoodieBootstrapHandle<?, ?, ?, ?> bootstrapHandle,
+                                  Path sourceFilePath,
+                                  KeyGeneratorInterface keyGenerator,
+                                  String partitionPath,
+                                  Schema schema) throws Exception {
     BoundedInMemoryExecutor<HoodieRecord, HoodieRecord, Void> wrapper = null;
-    HoodieFileReader reader = HoodieFileReaderFactory.getReaderFactory(table.getConfig().getRecordMerger().getRecordType())
+    HoodieRecordMerger recordMerger = table.getConfig().getRecordMerger();
+
+    HoodieFileReader reader = HoodieFileReaderFactory.getReaderFactory(recordMerger.getRecordType())
         .getFileReader(table.getHadoopConf(), sourceFilePath);
     try {
+      Function<HoodieRecord, HoodieRecord> transformer = record -> {
+        String recordKey = record.getRecordKey(schema, Option.of(keyGenerator));
+        return createNewMetadataBootstrapRecord(recordKey, partitionPath, recordMerger.getRecordType())
Review Comment:
Introducing `createNewMetadataBootstrapRecord` is the crux of the change here:
- The metadata bootstrap record is now properly initialized with a schema that includes all of the meta-fields, rather than one truncated to just the record key (`HoodieSparkRecord` is not able to handle such a truncated meta-fields schema)
- The Avro path is restored to what it was before RFC-46
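To make the transformer pattern in the diff concrete, here is a minimal, self-contained sketch. It does NOT use Hudi's real classes: `Record`, `createBootstrapRecord`, and `transformer` are hypothetical stand-ins for `HoodieRecord`, `createNewMetadataBootstrapRecord`, and the lambda in the diff, showing how each source record is mapped to a new bootstrap record that carries the extracted record key and the handler's partition path.

```java
import java.util.function.Function;

// Hedged sketch (plain JDK, not Hudi): illustrates the per-record
// transformer from the diff above.
public class BootstrapTransformerSketch {

  // Hypothetical stand-in for a Hudi record: just key + partition here.
  static final class Record {
    final String recordKey;
    final String partitionPath;

    Record(String recordKey, String partitionPath) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
    }
  }

  // Stand-in for createNewMetadataBootstrapRecord: builds the new
  // bootstrap record from the extracted key and the partition path.
  static Record createBootstrapRecord(String recordKey, String partitionPath) {
    return new Record(recordKey, partitionPath);
  }

  // Mirrors the lambda in the diff: extract the key from the source
  // record, then emit a fresh bootstrap record for the given partition.
  static Function<Record, Record> transformer(String partitionPath) {
    return source -> createBootstrapRecord(source.recordKey, partitionPath);
  }

  // Small demo used by main(): transform one source record.
  static String demo() {
    Record out = transformer("2023/01/30").apply(new Record("key-1", "ignored"));
    return out.recordKey + "," + out.partitionPath;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In the real handler the transformer is handed to a `BoundedInMemoryExecutor` so key extraction and record creation run as records stream out of the file reader, rather than materializing the whole file first.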
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/BaseBootstrapMetadataHandler.java:
##########
@@ -62,10 +61,10 @@ public BootstrapWriteStatus runMetadataBootstrap(String srcPartitionPath, String
         .map(HoodieAvroUtils::getRootLevelFieldName)
         .collect(Collectors.toList());
     Schema recordKeySchema = HoodieAvroUtils.generateProjectionSchema(avroSchema, recordKeyColumns);
-    LOG.info("Schema to be used for reading record Keys :" + recordKeySchema);
Review Comment:
These are now properly set by the actual FileReaders
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]