hudi-bot opened a new issue, #16265:
URL: https://github.com/apache/hudi/issues/16265

   When hudi-aws-bundle.jar and hudi-spark3.3-bundle_2.12.jar are used at the same time and hudi-aws-bundle.jar is loaded first in the Spark runtime, the job can fail with a NoSuchMethodError:
   {noformat}
   py4j.protocol.Py4JJavaError: An error occurred while calling ***.
   : java.lang.NoSuchMethodError: org.apache.hudi.avro.model.HoodieCleanMetadata.getTotalFilesDeleted()I
   	at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:557)
   {noformat}
   The problem is that the hudi-aws-bundle jar currently published to the Maven Central repository is built with the spark2 profile, under which Avro 1.8.2 is used to generate source code from the Avro schema files. The generated getter looks like this:
   {noformat}
       public Integer getTotalFilesDeleted() {
           return this.totalFilesDeleted;
       }
   {noformat}
   On the other hand, hudi-spark3.3-bundle_2.12.jar is built with Avro 1.11.1, and its generated source code looks like this:
   {noformat}
       public int getTotalFilesDeleted() {
           return this.totalFilesDeleted;
       }
   {noformat}
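   Since Avro 1.9.0, generated getters and setters use primitive types (AVRO-2069). The JVM resolves a method call by its full descriptor, including the return type, so code compiled against `getTotalFilesDeleted()I` (returning `int`) cannot link to a method that returns `java.lang.Integer`. Therefore, if hudi-aws-bundle is loaded first in the runtime, the call can fail with the above NoSuchMethodError.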
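   To illustrate why the boxed and primitive getters are not interchangeable at runtime, here is a minimal Java sketch. `Avro18StyleMetadata` and `Avro19StyleMetadata` are hypothetical stand-ins that mimic the two generated getters above, and `GetterSignatureCheck`, `returnTypeOf`, and `locationOf` are hypothetical helpers, not Hudi APIs. Passing a real class name such as `org.apache.hudi.avro.model.HoodieCleanMetadata` as an argument (with the bundles on the classpath) should report which jar supplied the class and which getter variant it carries, assuming the class is visible to the application class loader.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

public class GetterSignatureCheck {

    // Hypothetical stand-in for a getter generated by Avro 1.8.2 (boxed return type).
    static class Avro18StyleMetadata {
        private final Integer totalFilesDeleted = 3;
        public Integer getTotalFilesDeleted() { return totalFilesDeleted; }
    }

    // Hypothetical stand-in for a getter generated by Avro 1.9.0+ (primitive return type).
    static class Avro19StyleMetadata {
        private final int totalFilesDeleted = 3;
        public int getTotalFilesDeleted() { return totalFilesDeleted; }
    }

    /** Declared return type of a public no-arg method; it is part of the JVM method descriptor. */
    static Class<?> returnTypeOf(Class<?> clazz, String methodName) {
        try {
            Method m = clazz.getMethod(methodName);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                clazz.getName() + " has no public no-arg method " + methodName, e);
        }
    }

    /** Jar or directory the class was loaded from, if the JVM exposes it. */
    static String locationOf(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
            ? "<unknown>"
            : src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String method = "getTotalFilesDeleted";
        System.out.println("Avro 1.8.2 style returns: " + returnTypeOf(Avro18StyleMetadata.class, method));
        System.out.println("Avro 1.9.0+ style returns: " + returnTypeOf(Avro19StyleMetadata.class, method));

        // Optionally pass a real class name, e.g. org.apache.hudi.avro.model.HoodieCleanMetadata,
        // to see which jar won on the current classpath and which getter variant it carries.
        if (args.length > 0) {
            Class<?> real = Class.forName(args[0]);
            System.out.println(real.getName() + " loaded from " + locationOf(real));
            System.out.println(method + " returns " + returnTypeOf(real, method));
        }
    }
}
```

Run without arguments, it prints `class java.lang.Integer` for the Avro 1.8.2 style and `int` for the Avro 1.9.0+ style, which correspond to the `()Ljava/lang/Integer;` and `()I` descriptors the JVM matches against.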
   
   Although this can be worked around by changing the classpath loading order or by building hudi-aws-bundle on your own, would it be possible to publish a hudi-aws-spark3.3-bundle.jar to Maven Central? Alternatively, could the hudi-aws-bundle jar be built with the spark3 profile by default, given that most AWS customers now use Spark 3.x for their runtime?
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-6951
   - Type: Improvement

