[ https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nirav patel updated SPARK-46762:
--------------------------------
    Description: 
We are seeing the following `java.lang.ClassCastException` in Spark executors 
when using spark-connect 3.5 with an external Spark SQL catalog jar, 
iceberg-spark-runtime-3.5_2.12-1.4.3.jar.

We also set "spark.executor.userClassPathFirst=true"; without it, the child class 
gets loaded by MutableURLClassLoader while the parent class gets loaded by 
ChildFirstURLClassLoader, which causes a ClassCastException as well.

 
{code:java}
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
3) (spark35-m.c.mycomp-dev-test.internal executor 2): 
java.lang.ClassCastException: class 
org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to 
class org.apache.iceberg.Table 
(org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module 
of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; 
org.apache.iceberg.Table is in unnamed module of loader 
org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
    at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
    at org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50)
    at org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apach...{code}
 

`org.apache.iceberg.spark.source.SerializableTableWithSize` is a subtype of 
`org.apache.iceberg.Table`, and both classes ship in a single jar, 
`iceberg-spark-runtime-3.5_2.12-1.4.3.jar`.

We verified that only one copy of `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` is 
loaded when the spark-connect server is started.

Looking further into the error, it appears the classloader itself is being 
instantiated multiple times somewhere. Two instances are visible in the message: 
org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and 
org.apache.spark.util.ChildFirstURLClassLoader @4b18b943.
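
As an illustration of why two loader instances matter, here is a minimal, self-contained Java sketch of the suspected mechanism (not the actual Spark code path; the jar path is a hypothetical placeholder, only the Iceberg class name comes from the report): the same class defined by two separate loaders yields two distinct runtime types, so casts between them fail.
{code:java}
import java.net.URL;
import java.net.URLClassLoader;

// Sketch only: loading the same jar through two separate classloaders produces two
// distinct runtime types with the same name, so casts between them fail.
public class DuplicateLoaderDemo {
    public static void main(String[] args) throws Exception {
        URL[] jars = { new URL("file:///tmp/iceberg-spark-runtime.jar") }; // placeholder path

        // parent = null so each loader defines the classes itself instead of
        // delegating, roughly what a second ChildFirstURLClassLoader instance does.
        try (URLClassLoader a = new URLClassLoader(jars, null);
             URLClassLoader b = new URLClassLoader(jars, null)) {

            Class<?> fromA = Class.forName("org.apache.iceberg.Table", false, a);
            Class<?> fromB = Class.forName("org.apache.iceberg.Table", false, b);

            // Same binary name, different defining loaders -> distinct classes, so
            // any cast between objects typed against A and B throws the same kind of
            // ClassCastException seen in the executor stack trace above.
            System.out.println(fromA == fromB);                // false
            System.out.println(fromA.isAssignableFrom(fromB)); // false
        }
    }
}
{code}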

 

*Affected version:*

Spark 3.5 with spark-connect_2.12:3.5.0 (the ClassCastException occurs with this combination)

 

*Not affected version and variation:*

Spark 3.4 with spark-connect_2.12:3.4.0 works fine with the external jar.

It also works with Spark 3.5 when using the spark-submit script directly (i.e. 
without spark-connect 3.5).

 

The issue has also been opened with Iceberg: 
[https://github.com/apache/iceberg/issues/8978]

and has been discussed on the Iceberg dev list (dev@iceberg.apache.org): 
[https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1]

 

 

Steps to reproduce:

 

1) To see that Spark loads the same class twice using different classloaders:

 

Start the spark-connect server with the required jars and configuration for the 
Iceberg Hive catalog.
{code:java}
sudo /usr/lib/spark/sbin/start-connect-server.sh \
 --packages org.apache.spark:spark-connect_2.12:3.4.0 \
 --jars gs://strivr-dev-test-dataproc-libs/iceberg-libs/iceberg-spark-runtime-3.4_2.12-1.3.1.jar \
 --conf "spark.executor.extraJavaOptions=-verbose:class" \
 --conf "spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog" \
 --conf "spark.sql.catalog.iceberg_catalog.type=hive" \
 --conf "spark.sql.catalog.iceberg_catalog.uri=thrift://metastore-host:port"
{code}
reference: [https://iceberg.apache.org/docs/1.4.2/spark-configuration/#catalogs]

Since `spark.executor.extraJavaOptions=-verbose:class` is set, the executor logs 
should show the same Iceberg classes being loaded more than once, by different 
classloaders.
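
To double-check programmatically, here is a minimal sketch (not from the report; the harness is illustrative, only the two Iceberg class names are taken from the error message) of something that could be run inside the executor JVM, e.g. from a task, to print which loader and which jar define each class:
{code:java}
// Sketch only: report the defining classloader and code source of each class.
public class LoaderCheck {
    public static void main(String[] args) throws Exception {
        Class<?> table = Class.forName("org.apache.iceberg.Table");
        Class<?> impl =
            Class.forName("org.apache.iceberg.spark.source.SerializableTableWithSize");

        // Defining classloaders; if these differ, the two types are incompatible
        // even though both classes ship in the same jar.
        System.out.println("Table loader: " + table.getClassLoader());
        System.out.println("Impl  loader: " + impl.getClassLoader());

        // Which jar each class was actually read from.
        System.out.println(table.getProtectionDomain().getCodeSource().getLocation());
        System.out.println(impl.getProtectionDomain().getCodeSource().getLocation());

        // true only when both classes were defined by the same (or delegating) loaders.
        System.out.println("assignable: " + table.isAssignableFrom(impl));
    }
}
{code}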
 

2) To actually reproduce ClassCastException


> Spark Connect 3.5 Classloading issue with external jar
> ------------------------------------------------------
>
>                 Key: SPARK-46762
>                 URL: https://issues.apache.org/jira/browse/SPARK-46762
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: nirav patel
>            Priority: Major
>         Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot 
> 2024-02-22 at 2.04.49 PM.png
>

