Akira Ajisaka created HUDI-6951:
-----------------------------------
Summary: Use spark3 profile to build hudi-aws-bundle jars for
release artifacts
Key: HUDI-6951
URL: https://issues.apache.org/jira/browse/HUDI-6951
Project: Apache Hudi
Issue Type: Improvement
Reporter: Akira Ajisaka
When hudi-aws-bundle.jar and hudi-spark3.3-bundle_2.12.jar are used at the same
time, and hudi-aws-bundle.jar is loaded first in the Spark runtime, the job can
fail with a NoSuchMethodError:
{noformat}
py4j.protocol.Py4JJavaError: An error occurred while calling ***.
: java.lang.NoSuchMethodError: org.apache.hudi.avro.model.HoodieCleanMetadata.getTotalFilesDeleted()I
	at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:557)
{noformat}
The cause is that the hudi-aws-bundle jar currently published to Maven Central
is built with the spark2 profile, so Avro 1.8.2 is used to generate source code
from the Avro schema files. The generated getter looks like this:
{noformat}
public Integer getTotalFilesDeleted() {
  return this.totalFilesDeleted;
}
{noformat}
On the other hand, hudi-spark3.3-bundle_2.12.jar is built with Avro 1.11.1, and
the generated getter looks like this:
{noformat}
public int getTotalFilesDeleted() {
  return this.totalFilesDeleted;
}
{noformat}
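Since Avro 1.9.0, generated getters and setters use primitive types
(AVRO-2069). The JVM resolves a method by its name and full descriptor,
including the return type, so code compiled against the Avro 1.11.1 class calls
getTotalFilesDeleted()I. If hudi-aws-bundle is loaded first in the runtime, the
HoodieCleanMetadata class that wins only has the Integer-returning getter, and
the call fails with the NoSuchMethodError above.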
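To illustrate why the return type matters, here is a minimal, self-contained sketch. The two nested classes are hypothetical stand-ins for the HoodieCleanMetadata class as generated by Avro 1.8.2 and by Avro 1.9.0 or later; they are not the real Hudi classes. The lookup uses java.lang.invoke, which matches on name and exact method type, similar to how the JVM links a compiled call site.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DescriptorMismatchDemo {

    // Hypothetical stand-in for the class generated by Avro 1.8.2 (boxed getter).
    public static class BoxedCleanMetadata {
        private Integer totalFilesDeleted = 3;
        public Integer getTotalFilesDeleted() { return this.totalFilesDeleted; }
    }

    // Hypothetical stand-in for the class generated by Avro >= 1.9.0 (primitive getter).
    public static class PrimitiveCleanMetadata {
        private int totalFilesDeleted = 3;
        public int getTotalFilesDeleted() { return this.totalFilesDeleted; }
    }

    // True if `owner` has getTotalFilesDeleted() with exactly this return type.
    public static boolean resolves(Class<?> owner, Class<?> returnType) {
        try {
            MethodHandles.lookup().findVirtual(
                owner, "getTotalFilesDeleted", MethodType.methodType(returnType));
            return true;
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return false;
        }
    }

    // JVM descriptor of a no-argument method with the given return type.
    public static String descriptor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor(int.class));      // ()I
        System.out.println(descriptor(Integer.class));  // ()Ljava/lang/Integer;

        // A caller compiled against the primitive getter asks for ()I.
        System.out.println(resolves(PrimitiveCleanMetadata.class, int.class)); // true
        System.out.println(resolves(BoxedCleanMetadata.class, int.class));     // false
    }
}
```

The last line is the situation in this issue: the caller expects ()I, but the class that was loaded first only offers the Integer-returning getter.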
Although this can be worked around by changing the classpath loading order or
by building hudi-aws-bundle yourself, would it be possible to publish a
hudi-aws-spark3.3-bundle.jar to Maven Central? Alternatively, could the
hudi-aws-bundle jar be built with the spark3 profile by default, given that
most AWS customers now use Spark 3.x for their runtime?
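For anyone checking whether they are affected by the loading order, the following is a rough diagnostic sketch. It reports which jar a class was loaded from and what return type a no-argument getter has. The helper class and method names are hypothetical; only standard JDK reflection calls are used. Running it inside the Spark driver or an executor with the class name from the stack trace is an assumption about how you would apply it.

```java
import java.security.CodeSource;

public class ClassOriginCheck {

    // Location (usually a jar URL) that the named class was loaded from.
    public static String originOf(String className) {
        try {
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "<bootstrap or unknown>" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    // Return type name of a public no-argument method, e.g. "int" or "java.lang.Integer".
    public static String getterReturnType(String className, String methodName) {
        try {
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            Class<?> c = Class.forName(className, false, loader);
            return c.getMethod(methodName).getReturnType().getName();
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.hudi.avro.model.HoodieCleanMetadata";
        System.out.println("loaded from: " + originOf(cls));
        System.out.println("getter returns: " + getterReturnType(cls, "getTotalFilesDeleted"));
    }
}
```

If the reported location is the hudi-aws-bundle jar and the getter returns java.lang.Integer, the class from the Avro 1.8.2 build is the one in use.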
--
This message was sent by Atlassian Jira
(v8.20.10#820010)