rangareddy commented on issue #14114:
URL: https://github.com/apache/hudi/issues/14114#issuecomment-3443807514
Thanks @ad1happy2go for providing the answer.
Hi @bhavya-ganatra
Yes, this behavior is expected and required for **Merge-On-Read** tables in
Apache Hudi.
When you run a Spark SQL query against an MOR table in **Snapshot** mode (a
Read Optimized query only reads compacted base files, so the merge applies
whenever uncompacted log files still exist):
* The Spark executors (workers) must scan the base file and then apply the
updates from the corresponding log files to reconstruct the latest state of
each record.
* How an update record from the log file is merged into the base record is
defined entirely by the payload class (which you set with
`hoodie.datasource.write.payload.class`).
If the Spark executor reading the data cannot find and instantiate
`com.xxx.xxxx.AppendableFieldsRecordMerger` to run the `combineAndGetUpdateValue`
logic, it cannot correctly merge the base and log data, which leads to the
`ClassNotFoundException` you are seeing.
To resolve this issue, you need to ensure that the JAR containing your custom
payload class is available on the classpath of the Spark executors (the
distributed worker processes) when your reader job runs.
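For example, a common way to ship such a JAR to both the driver and the
executors is the `--jars` option of `spark-submit` (the JAR path and job file
below are placeholders for your own artifacts):

```shell
# Sketch: distribute the JAR containing the custom payload class to the
# driver and all executors. Paths are placeholders, not actual file names.
spark-submit \
  --jars /path/to/custom-payload.jar \
  your_reader_job.py

# Alternatively, set it via configuration, e.g. in spark-defaults.conf:
#   spark.jars  /path/to/custom-payload.jar
```

The same applies to `spark-shell` and `spark-sql` sessions, which accept the
same `--jars` option.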
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]