[ 
https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886237#comment-16886237
 ] 

Michael Heuer edited comment on SPARK-27781 at 7/16/19 4:19 PM:
----------------------------------------------------------------

I believe I saw a fix for this specific issue, where the Avro jars are now 
added to the Spark binary distribution without Hadoop.  I will look for the 
pull request.

I cannot let Spark off the hook as easily as you suggest, though. Spark is the 
project that brings these dependencies together, both as compile-time 
dependencies and on the runtime classpath.  Spark needs to ensure those 
dependencies are compatible with each other.



> Tried to access method org.apache.avro.specific.SpecificData.<init>()V
> ----------------------------------------------------------------------
>
>                 Key: SPARK-27781
>                 URL: https://issues.apache.org/jira/browse/SPARK-27781
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: Michael Heuer
>            Priority: Major
>         Attachments: reproduce.sh
>
>
> There appears to be a conflict between Avro dependency versions at runtime 
> when using Spark 2.4.3 with Scala 2.12 
> (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop 
> 2.7.7.
>  
> Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes 
> avro-1.8.2.jar:
> {{$ find spark-2.4.3-bin-hadoop2.7 -name '*.jar' | grep avro}}
> {{jars/avro-1.8.2.jar}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
> {{jars/avro-ipc-1.8.2.jar}}
>  
> The Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop, by 
> contrast, does not:
> {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 -name '*.jar' | grep avro}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
>  
> Adding Hadoop 2.7.7 to the classpath brings in avro-1.7.4.jar, which 
> conflicts at runtime:
> {{$ find hadoop-2.7.7 -name '*.jar' | grep avro}}
> {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}}
> {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}}
> {{share/hadoop/tools/lib/avro-1.7.4.jar}}
> {{share/hadoop/common/lib/avro-1.7.4.jar}}
> {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}}
>  
> Issue filed downstream in
> [https://github.com/bigdatagenomics/adam/issues/2151]
>  
> Attached a smaller reproducing test case.

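The three jar listings in the quoted description can be gathered in one pass. 
What follows is only a rough sketch in Python, not anything shipped with Spark 
or Hadoop: the function names (family_versions, has_mixed_versions) and the 
jar-name pattern are assumptions made for illustration. It walks the given 
directories, picks out jars in the avro family, and groups the artifact names 
by version. Finding more than one version is a hint that the classpath may be 
mixed; it does not by itself prove a binary incompatibility.

```python
import re
from collections import defaultdict
from pathlib import Path


def family_versions(roots, family="avro"):
    """Map each version found under `roots` to the family artifacts carrying it.

    Assumes jar names of the form <artifact>-<version>[-<classifier>].jar,
    e.g. avro-1.7.4.jar or avro-mapred-1.8.2-hadoop2.jar. This naming
    pattern is an illustrative assumption, not a guarantee.
    """
    pattern = re.compile(
        r"^(" + re.escape(family) + r"(?:-[a-z]+)*)"  # artifact, e.g. avro-mapred
        r"-(\d+(?:\.\d+)*)"                           # version, e.g. 1.8.2
        r"(?:-[\w.]+)?\.jar$"                         # optional classifier
    )
    versions = defaultdict(set)
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # skip directories that do not exist
        for jar in root.rglob("*.jar"):
            match = pattern.match(jar.name)
            if match:
                versions[match.group(2)].add(match.group(1))
    return {version: sorted(names) for version, names in versions.items()}


def has_mixed_versions(versions):
    """True when more than one version of the family was found."""
    return len(versions) > 1
```

To use it, pass the unpacked distribution directories, for example 
family_versions(["spark-2.4.3-bin-without-hadoop-scala-2.12", "hadoop-2.7.7"]), 
and check the result with has_mixed_versions. Matching on file names keeps the 
sketch short; reading each jar's manifest would be more robust but is left out 
here.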

