Parth Chandra created SPARK-47106:
-------------------------------------

             Summary: Plan canonicalization test serializes/deserializes class that is not serializable
                 Key: SPARK-47106
                 URL: https://issues.apache.org/jira/browse/SPARK-47106
             Project: Spark
          Issue Type: Test
          Components: SQL
    Affects Versions: 3.4.1, 3.4.0
            Reporter: Parth Chandra


The test
{code:java}
test("SPARK-23731 plans should be canonicalizable after being (de)serialized"){code}
serializes and deserializes
{code:java}
FileSourceScanExec{code}
which is not actually serializable. In particular, FileSourceScanExec.relation is not serializable.
The test still passes, though.

The test below, derived from the one above, demonstrates the issue:
{code:java}
test("verify FileSourceScanExec (de)serialize") {
  withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") {
    withTempPath { path =>
      spark.range(1).write.parquet(path.getAbsolutePath)
      val df = spark.read.parquet(path.getAbsolutePath)
      val fileSourceScanExec =
        df.queryExecution.sparkPlan.collectFirst { case p: FileSourceScanExec => p }.get
      val serializer = SparkEnv.get.serializer.newInstance()
      val relation = serializer.serialize(fileSourceScanExec.relation)
      assert(relation != null)
      val deserialized =
        serializer.deserialize[FileSourceScanExec](serializer.serialize(fileSourceScanExec))
      assert(deserialized.relation != null)
    }
  }
}{code}
 
The test fails with:
{code:java}
(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1))
    - field (class: org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, type: interface org.apache.spark.sql.execution.datasources.FileIndex)
    - object (class org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet)
java.io.NotSerializableException: org.apache.spark.sql.execution.datasources.InMemoryFileIndex
Serialization stack:
    - object not serializable (class: org.apache.spark.sql.execution.datasources.InMemoryFileIndex, value: org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1))
    - field (class: org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, type: interface org.apache.spark.sql.execution.datasources.FileIndex)
    - object (class org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115)
    at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11(SparkPlanSuite.scala:54)
    at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11$adapted(SparkPlanSuite.scala:48)
    at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69)
    at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66)
    at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:33)
    at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$10(SparkPlanSuite.scala:48)
    at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
    at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
    at org.apache.spark.sql.execution.SparkPlanSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SparkPlanSuite.scala:32)
    at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:266)
    at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:264)
    at org.apache.spark.sql.execution.SparkPlanSuite.withSQLConf(SparkPlanSuite.scala:32)
    at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$9(SparkPlanSuite.scala:48)
 {code}
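
For illustration only (this is not Spark code), the failure mode in the trace above can be reproduced with plain java.io serialization, which is the mechanism the JavaSerializer frames in the trace go through. The sketch below assumes two hypothetical stand-in classes: RelationLike plays the role of HadoopFsRelation (serializable itself) and FileIndexLike plays the role of InMemoryFileIndex (not serializable). The helper name firstNonSerializableClass is also made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {
    // Hypothetical stand-in for a field type that does not implement Serializable.
    static class FileIndexLike {
        final String root;
        FileIndexLike(String root) { this.root = root; }
    }

    // Hypothetical stand-in for a Serializable class that holds a non-serializable field.
    static class RelationLike implements Serializable {
        private static final long serialVersionUID = 1L;
        final FileIndexLike location;
        RelationLike(FileIndexLike location) { this.location = location; }
    }

    // Attempts Java serialization of `value`. Returns the message of the
    // NotSerializableException (the offending class name) if one is thrown,
    // or null if serialization succeeds.
    static String firstNonSerializableClass(Object value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Object relation = new RelationLike(new FileIndexLike("/tmp/example"));
        // Reports the nested FileIndexLike class, analogous to InMemoryFileIndex in the trace.
        System.out.println(firstNonSerializableClass(relation));
        // A String is serializable, so nothing is reported.
        System.out.println(firstNonSerializableClass("a plain string"));
    }
}
```

As in the trace, the exception names the class of the non-serializable field rather than the enclosing serializable class, which is why the error points at InMemoryFileIndex even though the object being serialized is the relation.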
