[
https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886237#comment-16886237
]
Michael Heuer edited comment on SPARK-27781 at 7/16/19 4:19 PM:
----------------------------------------------------------------
I believe I saw a fix for this specific issue, where the avro jars are now
added to the Spark binary distribution without Hadoop. Will look for the pull
request.

I cannot let Spark off the hook as easily as you suggest though – Spark is the
project that brings these dependencies together, as compile-time dependencies
and on the runtime classpath. Spark needs to ensure those dependencies are
compatible with each other.
> Tried to access method org.apache.avro.specific.SpecificData.<init>()V
> ----------------------------------------------------------------------
>
> Key: SPARK-27781
> URL: https://issues.apache.org/jira/browse/SPARK-27781
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.3
> Reporter: Michael Heuer
> Priority: Major
> Attachments: reproduce.sh
>
>
> It appears that there is a conflict in avro dependency versions at runtime
> when using Spark 2.4.3 and Scala 2.12
> (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop
> 2.7.7.
>
> Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes
> avro-1.8.2.jar:
> {{$ find spark-2.4.3-bin-hadoop2.7 -name '*.jar' | grep avro}}
> {{jars/avro-1.8.2.jar}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
> {{jars/avro-ipc-1.8.2.jar}}
>
> Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop
> does not:
> {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 -name '*.jar' | grep avro}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
>
> Adding Hadoop 2.7.7 to the classpath brings in avro-1.7.4.jar, which
> conflicts at runtime:
> {{$ find hadoop-2.7.7 -name '*.jar' | grep avro}}
> {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}}
> {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}}
> {{share/hadoop/tools/lib/avro-1.7.4.jar}}
> {{share/hadoop/common/lib/avro-1.7.4.jar}}
> {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}}
>
> Issue filed downstream in
> [https://github.com/bigdatagenomics/adam/issues/2151]
>
> Attached a smaller reproducing test case.
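
The version mismatch described above (avro-1.8.2.jar in one distribution,
avro-1.7.4.jar in another) can be checked for mechanically. What follows is
only a minimal sketch, assuming a POSIX shell with find, sed, sort, and awk
available; the function names avro_versions and avro_conflict are hypothetical
helpers for illustration and are not part of Spark, Hadoop, or the attached
reproduce.sh.

```shell
# Hypothetical helper: print the distinct core Avro versions found under the
# given directories, one per line. Only avro-<version>.jar is matched, so
# avro-mapred-* and avro-ipc-* jars are ignored.
avro_versions() {
  find "$@" -type f -name 'avro-[0-9]*.jar' 2>/dev/null \
    | sed -e 's|.*/avro-||' -e 's|\.jar$||' \
    | sort -u
}

# Hypothetical helper: succeed (exit status 0) when more than one distinct
# core Avro version is present under the given directories.
avro_conflict() {
  [ "$(avro_versions "$@" | awk 'END { print NR }')" -gt 1 ]
}
```

Calling avro_conflict with the unpacked Spark and Hadoop distribution
directories as arguments would be one way to flag a mixed-version classpath
before launching a job. It only compares jar file names, so it cannot tell
which jar the JVM actually loads first.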