voonhous commented on code in PR #17772:
URL: https://github.com/apache/hudi/pull/17772#discussion_r2658759456
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java:
##########
@@ -848,9 +848,9 @@ private Option<Pair<Function<HoodieRecord, HoodieRecord>, Schema>> composeEvolve
return Option.of(Pair.of((record) -> {
return record.rewriteRecordWithNewSchema(
- dataBlock.getSchema().toAvroSchema(),
+ dataBlock.getSchema(),
this.hoodieTableMetaClient.getTableConfig().getProps(),
- mergedAvroSchema,
+ HoodieSchema.fromAvroSchema(mergedAvroSchema),
Review Comment:
On line 847, `mergedAvroSchema` can be converted to a `HoodieSchema` once, instead of converting it inside the lambda.
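Something along these lines (untested sketch; the trailing arguments are elided as in the hunk above):

```java
// sketch: convert once around line 847 and reuse
HoodieSchema mergedSchema = HoodieSchema.fromAvroSchema(mergedAvroSchema);
return Option.of(Pair.of((record) -> {
  return record.rewriteRecordWithNewSchema(
      dataBlock.getSchema(),
      this.hoodieTableMetaClient.getTableConfig().getProps(),
      mergedSchema,
      ...);
}, ...));
```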
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java:
##########
@@ -430,7 +430,7 @@ public ByteArrayOutputStream serializeRecordsToLogBlock(HoodieStorage storage,
//TODO boundary to revisit in follow up to use HoodieSchema directly
Review Comment:
We can remove this comment now.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java:
##########
@@ -634,7 +634,7 @@ private void processDataBlock(HoodieDataBlock dataBlock, Option<KeySpec> keySpec
try (ClosableIterator<HoodieRecord> recordIterator = recordsIteratorSchemaPair.getLeft()) {
while (recordIterator.hasNext()) {
HoodieRecord completedRecord = recordIterator.next()
-        .wrapIntoHoodieRecordPayloadWithParams(recordsIteratorSchemaPair.getRight(),
+        .wrapIntoHoodieRecordPayloadWithParams(HoodieSchema.fromAvroSchema(recordsIteratorSchemaPair.getRight()),
Review Comment:
Update `getRecordsIterator` to return a `HoodieSchema` pair, so that we do not need to invoke `HoodieSchema#fromAvroSchema` here.
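Roughly (untested sketch; assuming the method currently returns a `Pair` with an Avro `Schema` on the right):

```java
// sketch: have getRecordsIterator build the HoodieSchema once
private Pair<ClosableIterator<HoodieRecord>, HoodieSchema> getRecordsIterator(
    HoodieDataBlock dataBlock, Option<KeySpec> keySpec) {
  ...
}

// the call site above then needs no conversion:
.wrapIntoHoodieRecordPayloadWithParams(recordsIteratorSchemaPair.getRight(), ...)
```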
##########
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java:
##########
@@ -726,7 +726,7 @@ private static List<HoodieRecord> getExpectedHoodieRecordsWithOrderingValue(List
return expectedHoodieRecords.stream().map(rec -> {
List<String> orderingFields = metaClient.getTableConfig().getOrderingFields();
HoodieAvroIndexedRecord avroRecord = ((HoodieAvroIndexedRecord) rec);
-  Comparable orderingValue = OrderingValues.create(orderingFields, field -> (Comparable) avroRecord.getColumnValueAsJava(avroSchema, field, new TypedProperties()));
+  Comparable orderingValue = OrderingValues.create(orderingFields, field -> (Comparable) avroRecord.getColumnValueAsJava(HoodieSchema.fromAvroSchema(avroSchema), field, new TypedProperties()));
Review Comment:
The static signature of `#getExpectedHoodieRecordsWithOrderingValue` can be changed to accept a `HoodieSchema`, and `validateOutputFromFileGroupReader` can be updated accordingly, so that `#toAvroSchema` / `#getAvroSchema` do not need to be used.
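For example (untested sketch; the remaining parameters of the helper are elided):

```java
// sketch: accept HoodieSchema in the helper's signature
private static List<HoodieRecord> getExpectedHoodieRecordsWithOrderingValue(
    List<HoodieRecord> expectedHoodieRecords, HoodieSchema schema, ...) {
  ...
  // inside the lambda, no conversion is needed:
  avroRecord.getColumnValueAsJava(schema, field, new TypedProperties());
  ...
}
```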
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/execution/CopyOnWriteInsertHandler.java:
##########
@@ -82,7 +82,7 @@ public void consume(HoodieInsertValueGenResult<HoodieRecord> genResult) {
String partitionPath = record.getPartitionPath();
// just skip the ignored record,do not make partitions on fs
try {
-      if (record.shouldIgnore(genResult.schema, config.getProps())) {
+      if (record.shouldIgnore(HoodieSchema.fromAvroSchema(genResult.schema), config.getProps())) {
Review Comment:
Can `genResult.schema` return a `HoodieSchema` instead?
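For example (untested sketch; assuming the `schema` field on `HoodieInsertValueGenResult` can be retyped):

```java
// sketch: HoodieInsertValueGenResult carries a HoodieSchema
public final HoodieSchema schema; // was: Schema schema

// consume() then needs no conversion:
if (record.shouldIgnore(genResult.schema, config.getProps())) {
```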
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java:
##########
@@ -119,7 +119,7 @@ protected ByteArrayOutputStream serializeRecords(List<HoodieRecord> records, Hoo
try {
// Encode the record into bytes
// Spark Record not support write avro log
-        ByteArrayOutputStream data = s.getAvroBytes(schema, props);
+        ByteArrayOutputStream data = s.getAvroBytes(HoodieSchema.fromAvroSchema(schema), props);
Review Comment:
Let's use `HoodieSchemaCache#intern` on line 107 and use `HoodieSchema#parse` directly, so that the `HoodieSchema#fromAvroSchema` call is no longer needed here.
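Something like (untested sketch; the variable name `schemaString` and the exact `parse`/`intern` shapes are guesses):

```java
// sketch for line 107: parse and intern a HoodieSchema up front
HoodieSchema schema = HoodieSchemaCache.intern(HoodieSchema.parse(schemaString));
...
// serializeRecords then passes it straight through:
ByteArrayOutputStream data = s.getAvroBytes(schema, props);
```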
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java:
##########
@@ -208,9 +208,9 @@ private Option<Function<HoodieRecord, HoodieRecord>> composeSchemaEvolutionTrans
Map<String, String> renameCols = InternalSchemaUtils.collectRenameCols(writeInternalSchema, querySchema);
return Option.of(record -> {
return record.rewriteRecordWithNewSchema(
- recordSchema.toAvroSchema(),
+ recordSchema,
writeConfig.getProps(),
- newWriterSchema, renameCols);
+ HoodieSchema.fromAvroSchema(newWriterSchema), renameCols);
Review Comment:
`newWriterSchema` does not need to be wrapped with `HoodieSchema.fromAvroSchema`; just remove the `getAvroSchema` call on line 203.
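That is (untested sketch; the right-hand side of line 203 is elided):

```java
// sketch: drop the getAvroSchema() call on line 203 so newWriterSchema stays a HoodieSchema
HoodieSchema newWriterSchema = ...; // was: ... .getAvroSchema()

return Option.of(record -> record.rewriteRecordWithNewSchema(
    recordSchema, writeConfig.getProps(), newWriterSchema, renameCols));
```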
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]