Akira Ajisaka created HUDI-6951:
-----------------------------------

             Summary: Use spark3 profile to build hudi-aws-bundle jars for 
release artifacts
                 Key: HUDI-6951
                 URL: https://issues.apache.org/jira/browse/HUDI-6951
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Akira Ajisaka


When hudi-aws-bundle.jar and hudi-spark3.3-bundle_2.12.jar are used at the same 
time and hudi-aws-bundle.jar is loaded first in the Spark runtime, the job can 
fail with a NoSuchMethodError:
{noformat}
py4j.protocol.Py4JJavaError: An error occurred while calling ***.
: java.lang.NoSuchMethodError: org.apache.hudi.avro.model.HoodieCleanMetadata.getTotalFilesDeleted()I
at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:557)
{noformat}
The cause is that the hudi-aws-bundle jar currently published to the Maven 
Central repository is built with the spark2 profile, so Avro 1.8.2 is used to 
generate source code from the Avro schema files. The generated getter looks 
like this:
{noformat}
    public Integer getTotalFilesDeleted() {
        return this.totalFilesDeleted;
    }
{noformat}
By contrast, hudi-spark3.3-bundle_2.12.jar is built with Avro 1.11.1, and the 
generated getter looks like this:
{noformat}
    public int getTotalFilesDeleted() {
        return this.totalFilesDeleted;
    }
{noformat}
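Since Avro 1.9.0, generated getters and setters use primitive types 
(AVRO-2069). Code compiled against the Avro 1.11.1 classes therefore calls 
getTotalFilesDeleted() expecting an int return type (the trailing "I" in the 
error message). If hudi-aws-bundle is loaded first in the runtime, the JVM 
finds only the Integer-returning method and fails with the above 
NoSuchMethodError.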
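
To illustrate why the two getters are not interchangeable at link time, here is 
a minimal sketch. BoxedGetter and PrimitiveGetter are hypothetical stand-ins 
for the two generated variants of HoodieCleanMetadata, not the real classes. 
The sketch prints the JVM method descriptor of each getter; the return type is 
part of the descriptor, so a caller compiled against one variant cannot resolve 
the other.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Hypothetical stand-in for the getter shape generated by Avro 1.8.2.
    static class BoxedGetter {
        private Integer totalFilesDeleted = 0;
        public Integer getTotalFilesDeleted() { return this.totalFilesDeleted; }
    }

    // Hypothetical stand-in for the getter shape generated by Avro 1.9.0+.
    static class PrimitiveGetter {
        private int totalFilesDeleted = 0;
        public int getTotalFilesDeleted() { return this.totalFilesDeleted; }
    }

    // Returns the JVM descriptor of a no-argument method, e.g. "()I".
    static String descriptorOf(Class<?> cls, String method) {
        try {
            Class<?> returnType = cls.getDeclaredMethod(method).getReturnType();
            return MethodType.methodType(returnType).toMethodDescriptorString();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints ()Ljava/lang/Integer;
        System.out.println(descriptorOf(BoxedGetter.class, "getTotalFilesDeleted"));
        // prints ()I
        System.out.println(descriptorOf(PrimitiveGetter.class, "getTotalFilesDeleted"));
    }
}
```

The second descriptor, ()I, matches the one in the NoSuchMethodError message.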

Although this can be worked around by changing the classpath loading order or 
by building hudi-aws-bundle yourself, would it be possible to publish 
hudi-aws-spark3.3-bundle.jar to Maven Central? Alternatively, could the 
hudi-aws-bundle jar be built with the spark3 profile by default, given that 
most AWS customers now use Spark 3.x for their runtime?
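
For anyone checking which bundle wins on their classpath, the following is a 
rough sketch of a generic JVM diagnostic, not something Hudi provides. WhichJar 
and locationOf are hypothetical names. It reports the location a class was 
loaded from, using the standard ProtectionDomain/CodeSource API. Run it on the 
same classpath as the job and pass org.apache.hudi.avro.model.HoodieCleanMetadata 
as the argument.

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a class was loaded from, if known.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            // Classes from the bootstrap loader have no code source.
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to a JDK class so the sketch runs without Hudi present.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + locationOf(Class.forName(name)));
    }
}
```

If the reported location is the hudi-aws-bundle jar, the Avro 1.8.2-generated 
class is the one being used.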



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
