Parth Chandra created SPARK-47106:
-------------------------------------
Summary: Plan canonicalization test serializes/deserializes class
that is not serializable
Key: SPARK-47106
URL: https://issues.apache.org/jira/browse/SPARK-47106
Project: Spark
Issue Type: Test
Components: SQL
Affects Versions: 3.4.1, 3.4.0
Reporter: Parth Chandra
The test
{code:java}
test("SPARK-23731 plans should be canonicalizable after being
(de)serialized"){code}
serializes and deserializes
{code:java}
FileSourceScanExec{code}
which is not actually serializable. In particular, FileSourceScanExec.relation
is not serializable.
The test still passes though.
The test below derived from the above shows the issue -
{code:java}
test("verify FileSourceScanExec (de)serialize") {
withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") {
withTempPath { path =>
spark.range(1).write.parquet(path.getAbsolutePath)
val df = spark.read.parquet(path.getAbsolutePath)
val fileSourceScanExec =
df.queryExecution.sparkPlan.collectFirst { case p:
FileSourceScanExec => p }.get
val serializer = SparkEnv.get.serializer.newInstance()
val relation = serializer.serialize(fileSourceScanExec.relation)
assert(relation != null)
val deserialized =
serializer.deserialize[FileSourceScanExec(serializer.serialize(fileSourceScanExec))
assert(deserialized.relation != null)
}
}
}{code}
The test fails with -
{code:java}
(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1))
- field (class:
org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location,
type: interface org.apache.spark.sql.execution.datasources.FileIndex)
- object (class
org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet)
java.io.NotSerializableException:
org.apache.spark.sql.execution.datasources.InMemoryFileIndex
Serialization stack:
- object not serializable (class:
org.apache.spark.sql.execution.datasources.InMemoryFileIndex, value:
org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1m0000gn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1))
- field (class:
org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location,
type: interface org.apache.spark.sql.execution.datasources.FileIndex)
- object (class
org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet)
at
org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
at
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49)
at
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115)
at
org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11(SparkPlanSuite.scala:54)
at
org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11$adapted(SparkPlanSuite.scala:48)
at
org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69)
at
org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66)
at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:33)
at
org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$10(SparkPlanSuite.scala:48)
at
org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
at
org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
at
org.apache.spark.sql.execution.SparkPlanSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SparkPlanSuite.scala:32)
at
org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:266)
at
org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:264)
at
org.apache.spark.sql.execution.SparkPlanSuite.withSQLConf(SparkPlanSuite.scala:32)
at
org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$9(SparkPlanSuite.scala:48)
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]