hudi-bot opened a new issue, #16265:
URL: https://github.com/apache/hudi/issues/16265
When hudi-aws-bundle.jar and hudi-spark3.3-bundle_2.12.jar are used together and
hudi-aws-bundle.jar is loaded first in the Spark runtime, the job can fail with
a NoSuchMethodError:
{noformat}
py4j.protocol.Py4JJavaError: An error occurred while calling ***.
: java.lang.NoSuchMethodError: org.apache.hudi.avro.model.HoodieCleanMetadata.getTotalFilesDeleted()I
at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:557)
{noformat}
The cause is that the hudi-aws-bundle jar currently published to Maven Central is
built with the spark2 profile, so Avro 1.8.2 is used to generate source code from
the Avro schema files. The generated getter looks like this:
{noformat}
public Integer getTotalFilesDeleted() {
  return this.totalFilesDeleted;
}
{noformat}
By contrast, hudi-spark3.3-bundle_2.12.jar is built with Avro 1.11.1, and the
generated getter looks like this:
{noformat}
public int getTotalFilesDeleted() {
  return this.totalFilesDeleted;
}
{noformat}
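Since Avro 1.9.0, generated getters and setters use primitive types (AVRO-2069).
The return type is part of a JVM method descriptor, so code compiled against the
primitive getter looks for getTotalFilesDeleted()I, which the class generated
with Avro 1.8.2 does not define. Therefore, if the HoodieCleanMetadata class from
hudi-aws-bundle is loaded first at runtime, the call fails with the
NoSuchMethodError shown above.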
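The mismatch can be reproduced without Hudi or Avro. The following is a minimal sketch, not Hudi code: `BoxedGetter` and `PrimitiveGetter` are hypothetical stand-ins for the two generated variants of HoodieCleanMetadata, and `java.lang.invoke` is used only to illustrate that a method is resolved by its name plus its full descriptor, return type included.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class DescriptorMismatchDemo {

    // Hypothetical stand-in for the class generated with Avro 1.8.2 (boxed getter).
    public static class BoxedGetter {
        private Integer totalFilesDeleted = 3;
        public Integer getTotalFilesDeleted() { return this.totalFilesDeleted; }
    }

    // Hypothetical stand-in for the class generated with Avro 1.9.0+ (primitive getter).
    public static class PrimitiveGetter {
        private int totalFilesDeleted = 3;
        public int getTotalFilesDeleted() { return this.totalFilesDeleted; }
    }

    // JVM descriptor of a no-argument method with the given return type.
    public static String descriptorOf(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    // True if `owner` has getTotalFilesDeleted with exactly this return type.
    public static boolean resolves(Class<?> owner, Class<?> returnType) {
        try {
            MethodHandles.lookup().findVirtual(
                owner, "getTotalFilesDeleted", MethodType.methodType(returnType));
            return true;
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(descriptorOf(int.class));      // ()I
        System.out.println(descriptorOf(Integer.class));  // ()Ljava/lang/Integer;
        // A caller compiled against the primitive getter asks for ()I.
        System.out.println(resolves(PrimitiveGetter.class, int.class)); // true
        System.out.println(resolves(BoxedGetter.class, int.class));     // false
    }
}
```

The last line corresponds to the situation in this issue: the caller expects `()I`, but the class that was loaded first only offers the `Integer`-returning method.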
Although this can be worked around by changing the classpath loading order or by
building hudi-aws-bundle yourself, would it be possible to publish a
hudi-aws-spark3.3-bundle.jar to Maven Central? Alternatively, could the
hudi-aws-bundle jar be built with the spark3 profile by default, given that most
AWS customers now run Spark 3.x?
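Before reordering the classpath, it may help to confirm which jar actually supplied the class. The sketch below is a hypothetical diagnostic helper (`LoadedClassInspector` is not part of Hudi) that relies only on standard reflection. It assumes it is run on the same classpath as the failing job, and it reports where a class was loaded from and the return type of one of its no-argument getters.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

class LoadedClassInspector {

    // Location (typically a jar URL) the class was loaded from,
    // or a placeholder when the JVM does not report one (e.g. bootstrap classes).
    public static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap/unknown)";
        }
        return src.getLocation().toString();
    }

    // Name of the return type of a public no-argument method, e.g. "int" or "java.lang.Integer".
    public static String getterReturnType(Class<?> cls, String getter)
            throws NoSuchMethodException {
        Method m = cls.getMethod(getter);
        return m.getReturnType().getName();
    }

    public static void main(String[] args) {
        // Class and getter names are taken from the stack trace in this issue.
        String className = args.length > 0
            ? args[0] : "org.apache.hudi.avro.model.HoodieCleanMetadata";
        try {
            Class<?> cls = Class.forName(className);
            System.out.println("loaded from: " + locationOf(cls));
            System.out.println("return type: " + getterReturnType(cls, "getTotalFilesDeleted"));
        } catch (ReflectiveOperationException e) {
            System.out.println("could not inspect " + className + ": " + e);
        }
    }
}
```

If the reported location is the hudi-aws-bundle jar and the return type is `java.lang.Integer`, the class generated with Avro 1.8.2 is the one shadowing the Spark bundle's copy.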
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-6951
- Type: Improvement