This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 7546b4405d5 [SPARK-42164][CORE] Register partitioned-table-related classes to KryoSerializer
7546b4405d5 is described below
commit 7546b4405d5b35626e98b28bfc8031d2100172d1
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon Jan 23 21:39:30 2023 -0800
[SPARK-42164][CORE] Register partitioned-table-related classes to KryoSerializer
### What changes were proposed in this pull request?
This PR aims to register partitioned-table-related classes to
`KryoSerializer`.
Specifically, `CREATE TABLE` and `MSCK REPAIR TABLE` use these classes.
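For context, Kryo itself enforces the registration requirement: when `spark.kryo.registrationRequired=true`, serializing an instance of any unregistered class fails, and an array class must be registered separately from its element class. A minimal sketch against the underlying Kryo API (illustrative only, not code from this patch):
```
import com.esotericsoftware.kryo.Kryo

// Illustrative sketch, not patch code. With registration required,
// serializing an unregistered class throws
//   java.lang.IllegalArgumentException: Class is not registered: ...
case class Demo(x: Int)

val kryo = new Kryo()
kryo.setRegistrationRequired(true)
kryo.register(classOf[Demo])
// An array class needs its own registration, which is why this patch also
// registers the "[L...;" array form of SerializableBlockLocation.
kryo.register(classOf[Array[Demo]])
```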
### Why are the changes needed?
To support partitioned tables more easily with `KryoSerializer`.
Previously, with `spark.kryo.registrationRequired=true`, these commands failed as follows.
```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation
```
```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.util.HadoopFSUtils$SerializableFileStatus
```
```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.sql.execution.command.PartitionStatistics
```
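Before this fix, a possible user-side workaround (a sketch, not from this PR) was to list the classes manually via `spark.kryo.classesToRegister`; the `[L...;` array forms may need to be listed as well:
```
$ bin/spark-shell -c spark.serializer=org.apache.spark.serializer.KryoSerializer \
    -c spark.kryo.registrationRequired=true \
    -c 'spark.kryo.classesToRegister=org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation,org.apache.spark.util.HadoopFSUtils$SerializableFileStatus,org.apache.spark.sql.execution.command.PartitionStatistics'
```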
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs and manual tests.
**TEST TABLE**
```
$ tree /tmp/t
/tmp/t
├── p=1
│   └── users.orc
├── p=10
│   └── users.orc
├── p=11
│   └── users.orc
├── p=2
│   └── users.orc
├── p=3
│   └── users.orc
├── p=4
│   └── users.orc
├── p=5
│   └── users.orc
├── p=6
│   └── users.orc
├── p=7
│   └── users.orc
├── p=8
│   └── users.orc
└── p=9
    └── users.orc
```
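A layout like this can be produced with a short snippet (a sketch, not from this PR; Spark writes `part-*.orc` data file names rather than `users.orc`):
```
scala> spark.range(0, 110).
         selectExpr("id AS user_id", "CAST(id % 11 + 1 AS INT) AS p").
         repartition($"p").
         write.partitionBy("p").orc("/tmp/t")
```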
**CREATE PARTITIONED TABLES AND RECOVER PARTITIONS**
```
$ bin/spark-shell -c spark.kryo.registrationRequired=true -c spark.serializer=org.apache.spark.serializer.KryoSerializer -c spark.sql.sources.parallelPartitionDiscovery.threshold=1
scala> sql("CREATE TABLE t USING ORC LOCATION '/tmp/t'").show()
++
||
++
++
scala> sql("MSCK REPAIR TABLE t").show()
++
||
++
++
```
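A follow-up check (not part of the patch's tests) to confirm that the partitions were recovered:
```
scala> sql("SHOW PARTITIONS t").show()
scala> sql("SELECT p, count(*) FROM t GROUP BY p ORDER BY p").show()
```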
Closes #39713 from dongjoon-hyun/SPARK-42164.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
index b7035feba84..5499732660b 100644
--- a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
@@ -510,6 +510,10 @@ private[serializer] object KryoSerializer {
   // SQL / ML / MLlib classes once and then re-use that filtered list in newInstance() calls.
   private lazy val loadableSparkClasses: Seq[Class[_]] = {
     Seq(
+      "org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation",
+      "[Lorg.apache.spark.util.HadoopFSUtils$SerializableBlockLocation;",
+      "org.apache.spark.util.HadoopFSUtils$SerializableFileStatus",
+
       "org.apache.spark.sql.catalyst.expressions.BoundReference",
       "org.apache.spark.sql.catalyst.expressions.SortOrder",
       "[Lorg.apache.spark.sql.catalyst.expressions.SortOrder;",
@@ -536,6 +540,7 @@ private[serializer] object KryoSerializer {
       "org.apache.spark.sql.types.DecimalType",
       "org.apache.spark.sql.types.Decimal$DecimalAsIfIntegral$",
       "org.apache.spark.sql.types.Decimal$DecimalIsFractional$",
+      "org.apache.spark.sql.execution.command.PartitionStatistics",
       "org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTaskResult",
       "org.apache.spark.sql.execution.joins.EmptyHashedRelation$",
       "org.apache.spark.sql.execution.joins.LongHashedRelation",
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]