[
https://issues.apache.org/jira/browse/SPARK-34624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun resolved SPARK-34624.
-----------------------------------
Fix Version/s: 3.2.0
Resolution: Fixed
Issue resolved by pull request 31741
[https://github.com/apache/spark/pull/31741]
> Filter non-jar dependencies from ivy/maven coordinates
> ------------------------------------------------------
>
> Key: SPARK-34624
> URL: https://issues.apache.org/jira/browse/SPARK-34624
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.1
> Reporter: Shardul Mahadik
> Assignee: Shardul Mahadik
> Priority: Major
> Fix For: 3.2.0
>
>
> Some Maven artifacts define non-jar dependencies. One such example is
> {{hive-exec}}'s dependency on the {{pom}} of {{apache-curator}}:
> https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.8/hive-exec-2.3.8.pom
> Today, depending on such an artifact using {{--packages}} prints an
> error but continues without including the non-jar dependency:
> {code}
> 1/03/04 09:46:49 ERROR SparkContext: Failed to add file:/Users/smahadik/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar to Spark environment
> java.io.FileNotFoundException: Jar /Users/shardul/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar not found
> at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1935)
> at org.apache.spark.SparkContext.addJar(SparkContext.scala:1990)
> at org.apache.spark.SparkContext.$anonfun$new$12(SparkContext.scala:501)
> at org.apache.spark.SparkContext.$anonfun$new$12$adapted(SparkContext.scala:501)
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> {code}
> Doing the same using
> {{spark.sql("ADD JAR ivy://org.apache.hive:hive-exec:2.3.8?exclude=org.pentaho:pentaho-aggdesigner-algorithm")}}
> fails outright:
> {code}
> ADD JAR /Users/smahadik/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar
> /Users/smahadik/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar does not exist
> ======================
> END HIVE FAILURE OUTPUT
> ======================
> org.apache.spark.sql.execution.QueryExecutionException: /Users/smahadik/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar does not exist
> at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$runHive$1(HiveClientImpl.scala:841)
> at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:291)
> at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
> at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
> at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
> at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:800)
> at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:787)
> at org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:947)
> at org.apache.spark.sql.hive.HiveSessionResourceLoader.$anonfun$addJar$1(HiveSessionStateBuilder.scala:130)
> at org.apache.spark.sql.hive.HiveSessionResourceLoader.$anonfun$addJar$1$adapted(HiveSessionStateBuilder.scala:129)
> at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75)
> at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:129)
> at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
> at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3705)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3703)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
> ... 47 elided
> {code}
> We should exclude these non-jar artifacts because our current dependency
> resolution code assumes all artifacts are jars, e.g.
> https://github.com/apache/spark/blob/17601e014c6ccb48958d35ffb04bedeac8cfc66a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1215
> and
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L318
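
For illustration, the filtering proposed above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the code from the pull request: ResolvedArtifact, jar_path and resolve_jar_paths are hypothetical names, the ext field stands in for whatever the resolver reports as the artifact's file extension, and the path layout simply mirrors the file names visible in the logs above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResolvedArtifact:
    """Hypothetical stand-in for one entry of a dependency resolution report."""
    organisation: str
    name: str
    revision: str
    ext: str  # file extension reported for the artifact, e.g. "jar" or "pom"


def jar_path(cache_dir: str, artifact: ResolvedArtifact) -> str:
    # Assumed layout, taken from the log lines in the description:
    # <cache_dir>/<organisation>_<name>-<revision>.jar
    return f"{cache_dir}/{artifact.organisation}_{artifact.name}-{artifact.revision}.jar"


def resolve_jar_paths(cache_dir: str, artifacts: List[ResolvedArtifact]) -> List[str]:
    # Build paths only for artifacts that really are jars. Without this filter,
    # a pom-only dependency would yield a ".jar" path that never exists on disk.
    return [jar_path(cache_dir, a) for a in artifacts if a.ext == "jar"]


# Example input modelled on the report: a jar plus a pom-only dependency.
artifacts = [
    ResolvedArtifact("org.apache.hive", "hive-exec", "2.3.8", "jar"),
    ResolvedArtifact("org.apache.curator", "apache-curator", "2.7.1", "pom"),
]
print(resolve_jar_paths("/tmp/ivy-jars", artifacts))
```

With the filter in place, the pom-only apache-curator entry never reaches the code that adds jars, so neither of the "not found" / "does not exist" errors quoted above would be triggered by it.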