rangareddy commented on issue #14114:
URL: https://github.com/apache/hudi/issues/14114#issuecomment-3443807514
Thanks @ad1happy2go for providing the answer.
Hi @bhavya-ganatra
Yes, this behavior is expected and required for **Merge-On-Read** tables in
Apache Hudi.
When you run a Spark SQL query against an MOR table in **Snapshot** mode (a
Read Optimized query only reads compacted base files, so the merge applies
whenever uncompacted log files still exist):
* The Spark executors (workers) must scan the base file and then apply the
updates from the corresponding log files to reconstruct the latest state of
each record.
* How an update record from the log file is merged into the base record is
defined entirely by the payload class (which you set with
`hoodie.datasource.write.payload.class`).
If the Spark executor reading the data cannot find and instantiate
`com.xxx.xxxx.AppendableFieldsRecordMerger` to run the `combineAndGetUpdateValue`
logic, it cannot correctly merge the base and log data, which leads to the
`ClassNotFoundException` you are seeing.
To resolve this issue, you need to ensure that the JAR containing your custom
payload class is available on the classpath of the Spark executors (the
distributed worker processes) when your reader job runs.
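For example, a common way to ship such a JAR to both the driver and the
executors is the `--jars` option of `spark-submit` (the JAR path and job file
below are placeholders for your own artifacts):

```shell
# Sketch: distribute the JAR containing the custom payload class to the
# driver and all executors. Paths are placeholders, not actual file names.
spark-submit \
  --jars /path/to/custom-payload.jar \
  your_reader_job.py

# Alternatively, set it via configuration, e.g. in spark-defaults.conf:
#   spark.jars  /path/to/custom-payload.jar
```

The same applies to `spark-shell` and `spark-sql` sessions, which accept the
same `--jars` option.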
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]