Re: [PR] feat: Ensure MOR table works, with lance base files and avro logs file [hudi]

via GitHub Fri, 02 Jan 2026 08:09:18 -0800


rahil-c commented on code in PR #17768:
URL: https://github.com/apache/hudi/pull/17768#discussion_r2657974678



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -516,9 +516,6 @@ class HoodieSparkSqlWriterInternal {
             // scalastyle:on
 
             val writeConfig = client.getConfig
-            if (writeConfig.getRecordMerger.getRecordType == 
HoodieRecordType.SPARK && tableType == MERGE_ON_READ && 
writeConfig.getLogDataBlockFormat.orElse(HoodieLogBlockType.AVRO_DATA_BLOCK) != 
HoodieLogBlockType.PARQUET_DATA_BLOCK) {

Review Comment:
   Originally when I was running tests for MOR tables I ran into the following 
exception when the condition was present.
   ```
   java.lang.UnsupportedOperationException: 
org.apache.hudi.DefaultSparkRecordMerger only support parquet log.
   
        at 
org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:513)
        at 
org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$write$1(HoodieSparkSqlWriter.scala:193)
        at 
org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:211)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:133)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:171)
        at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
   ```
   
   My assumption for why this condition is in hudi is that we did not implement 
the `getAvroBytes` originally in the `HoodieSparkRecord` as i see the following 
comment here saying that only parquet log is supported for spark and not avro.
    
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L121
    
   
https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/common/model/HoodieSparkRecord.java#L333
   
   Since this change now implements `getAvroBytes` I thought this restriction 
of spark merger only supporting parquet log block instead of avro log does not 
seem necessary anymore.
    
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat: Ensure MOR table works, with lance base files and avro logs file [hudi]

Reply via email to