[ 
https://issues.apache.org/jira/browse/HADOOP-18670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kiran Nagasubramanian updated HADOOP-18670:
-------------------------------------------
    Description: 
The issue I'm going to describe happens with the distribution: Spark 3.3.2 (git 
revision 5103e00c4c) built for Hadoop 3.3.2

Based on [this|https://issues.apache.org/jira/browse/HADOOP-11804], my 
understanding is that from Hadoop 3 onward there should be no conflict between 
Hadoop's dependencies and a Spark application's dependencies. However, I see a 
runtime failure in my Spark app because of such a conflict. Pasting the stack 
trace below:

{{Caused by: java.lang.NoSuchMethodError: 
com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;}}
{{    at org.apache.cassandra.config.Config.<init>(Config.java:102)}}
{{    at 
org.apache.cassandra.config.DatabaseDescriptor.clientInitialization(DatabaseDescriptor.java:288)}}
{{    at 
org.apache.cassandra.io.sstable.CQLSSTableWriter.<clinit>(CQLSSTableWriter.java:109)}}
{{    at 
com.<redacted>.spark.cassandra.bulkload.GameRecommendationsSSTWriter.init(GameRecommendationsSSTWriter.java:60)}}
{{    at 
com.<redacted>.spark.cassandra.bulkload.GameRecommendationsSSTWriter.<init>(GameRecommendationsSSTWriter.java:23)}}
{{    at 
com.<redacted>.spark.cassandra.bulkload.CassandraBulkLoad.execute(CassandraBulkLoad.java:93)}}
{{    at 
com.<redacted>.spark.cassandra.bulkload.CassandraBulkLoad.main(CassandraBulkLoad.java:60)}}
{{    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
{{    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
{{    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
{{    at java.lang.reflect.Method.invoke(Method.java:498)}}
{{    at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:740)}}
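For context, {{Sets.newConcurrentHashSet()}} returns a thread-safe Set backed by a ConcurrentHashMap. A JDK-only sketch of the equivalent behavior (an illustration, not Guava's actual source) shows what the failing call site expects:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo {
    // JDK equivalent of Guava's Sets.newConcurrentHashSet():
    // a concurrent Set view backed by a ConcurrentHashMap (Java 8+).
    static <E> Set<E> newConcurrentHashSet() {
        return ConcurrentHashMap.newKeySet();
    }

    public static void main(String[] args) {
        Set<String> jars = newConcurrentHashSet();
        jars.add("guava-14.0.1.jar");
        jars.add("guava-31.1-jre.jar");
        jars.add("guava-14.0.1.jar"); // duplicate add is ignored
        System.out.println(jars.size());
    }
}
```

When an old Guava jar earlier on the classpath lacks this method, the JVM raises the NoSuchMethodError seen above at the first class that calls it.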

My Spark app has a transitive dependency on the Guava library (it depends on 
cassandra-all, which in turn depends on Guava). The guava-14.0.1 jar that ships 
in spark-3.3.2-bin-hadoop3/jars is a decade old and doesn't have 
{{Sets.newConcurrentHashSet()}}. I'm able to run the Spark app successfully by 
deleting that old Guava jar from the jars directory and including a recent 
version in my project's pom.xml.
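The workaround described above (declaring a recent Guava in the application's pom.xml) can be sketched as follows; the version shown is illustrative, not necessarily the one the reporter used:

```xml
<!-- Hypothetical coordinates/version, for illustration only -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>31.1-jre</version>
</dependency>
```

Note this only helps if the old guava-14.0.1 jar is also removed from the Spark distribution's jars directory (or otherwise kept off the runtime classpath), since the distribution's copy is otherwise loaded first.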

  was:
The issue I'm going to describe happens with the distribution: Spark 3.3.2 (git 
revision 5103e00c4c) built for Hadoop 3.3.2

Based on [this|https://issues.apache.org/jira/browse/HADOOP-11804], as per my 
understanding, from Hadoop v3, there shouldn't be any conflict between the 
Hadoop's and Spark app's dependencies.

But my app has a transitive dependency (my spark app depends on cassandra-all 
lib, which does on guava) on Guava library. The jar guava-14.0.1 that comes in 
spark-3.3.2-bin-hadoop3/jars is a decade old and doesn't have a specific 
method. This results in runtime failure of my spark app. Pasting the stack 
trace below:

 

{{Caused by: java.lang.NoSuchMethodError: 
com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;}}
{{    at org.apache.cassandra.config.Config.<init>(Config.java:102)}}
{{    at 
org.apache.cassandra.config.DatabaseDescriptor.clientInitialization(DatabaseDescriptor.java:288)}}
{{    at 
org.apache.cassandra.io.sstable.CQLSSTableWriter.<clinit>(CQLSSTableWriter.java:109)}}
{{    at 
com.<redacted>.spark.cassandra.bulkload.GameRecommendationsSSTWriter.init(GameRecommendationsSSTWriter.java:60)}}
{{    at 
com.<redacted>.spark.cassandra.bulkload.GameRecommendationsSSTWriter.<init>(GameRecommendationsSSTWriter.java:23)}}
{{    at 
com.<redacted>.spark.cassandra.bulkload.CassandraBulkLoad.execute(CassandraBulkLoad.java:93)}}
{{    at 
com.<redacted>.spark.cassandra.bulkload.CassandraBulkLoad.main(CassandraBulkLoad.java:60)}}
{{    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
{{    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
{{    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
{{    at java.lang.reflect.Method.invoke(Method.java:498)}}
{{    at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:740)}}


> Spark application's dependency conflicts with Hadoop's dependency
> -----------------------------------------------------------------
>
>                 Key: HADOOP-18670
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18670
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 3.3.2
>            Reporter: Kiran Nagasubramanian
>            Priority: Blocker
>
> The issue I'm going to describe happens with the distribution: Spark 3.3.2 
> (git revision 5103e00c4c) built for Hadoop 3.3.2
> Based on [this|https://issues.apache.org/jira/browse/HADOOP-11804], my 
> understanding is that from Hadoop 3 onward there should be no conflict between 
> Hadoop's dependencies and a Spark application's dependencies. However, I see a 
> runtime failure in my Spark app because of such a conflict. Pasting the stack 
> trace below:
> {{Caused by: java.lang.NoSuchMethodError: 
> com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;}}
> {{    at org.apache.cassandra.config.Config.<init>(Config.java:102)}}
> {{    at 
> org.apache.cassandra.config.DatabaseDescriptor.clientInitialization(DatabaseDescriptor.java:288)}}
> {{    at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter.<clinit>(CQLSSTableWriter.java:109)}}
> {{    at 
> com.<redacted>.spark.cassandra.bulkload.GameRecommendationsSSTWriter.init(GameRecommendationsSSTWriter.java:60)}}
> {{    at 
> com.<redacted>.spark.cassandra.bulkload.GameRecommendationsSSTWriter.<init>(GameRecommendationsSSTWriter.java:23)}}
> {{    at 
> com.<redacted>.spark.cassandra.bulkload.CassandraBulkLoad.execute(CassandraBulkLoad.java:93)}}
> {{    at 
> com.<redacted>.spark.cassandra.bulkload.CassandraBulkLoad.main(CassandraBulkLoad.java:60)}}
> {{    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{    at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{    at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{    at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{    at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:740)}}
> My Spark app has a transitive dependency on the Guava library (it depends on 
> cassandra-all, which in turn depends on Guava). The guava-14.0.1 jar that 
> ships in spark-3.3.2-bin-hadoop3/jars is a decade old and doesn't have 
> {{Sets.newConcurrentHashSet()}}. I'm able to run the Spark app successfully 
> by deleting that old Guava jar from the jars directory and including a recent 
> version in my project's pom.xml.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
