This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 0b443f4a5041 [SPARK-50755][SQL] Pretty plan display for
InsertIntoHiveTable
0b443f4a5041 is described below
commit 0b443f4a5041943b25438822d3c1a76b3236e5cb
Author: Cheng Pan <[email protected]>
AuthorDate: Tue Jan 7 16:30:09 2025 -0800
[SPARK-50755][SQL] Pretty plan display for InsertIntoHiveTable
### What changes were proposed in this pull request?
Add `toString` for `HiveFileFormat` and `HiveTempPath` to make the display
of `InsertIntoHiveTable` plan pretty.
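
As a rough illustration of why the override matters, here is a minimal sketch written as Scala 3 top-level definitions (or a Scala script). `PlainFormat`, `NamedFormat`, `NamedTempPath`, and the path `file:/tmp/t6` are hypothetical stand-ins, not the Spark classes. A class without its own `toString` falls back to the JVM default, which appends an identity hash that changes between runs.

```scala
// Hypothetical stand-ins for illustration; these are not the Spark classes.
class PlainFormat

class NamedFormat {
  override def toString: String = "Hive"
}

class NamedTempPath(path: String) {
  override def toString: String = s"HiveTempPath($path)"
}

// Default rendering: class name, '@', then the identity hash in hex (varies per run).
val plain: String = new PlainFormat().toString

// Overridden rendering: stable text that does not change between runs.
val named: String = new NamedFormat().toString
val temp: String = new NamedTempPath("file:/tmp/t6").toString
```

The overridden strings are stable, so they can appear in plan output without any hash normalization.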
### Why are the changes needed?
I found that the current plan replacing rule does not handle a trailing object
hash properly:
https://github.com/apache/spark/blob/36d23eff4b4c3a2b8fd301672e532132c96fdd68/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala#L62
Instead of fixing the replacing rule (see #49396, and please let me know if
any reviewer thinks we should fix that too), it seems we can override the
`toString` of those classes to make the plan display pretty.
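
To make the trailing-hash gap concrete, here is a minimal sketch (Scala 3 top-level definitions or a Scala script). The regex is an assumption made for illustration: a rule that removes `@<hash>` only when a comma follows. The actual rule at the linked line in `SQLQueryTestHelper.scala` may differ, and the class names in the sample line are hypothetical.

```scala
// Assumed shape of a normalization rule: drop "@<hash>" only when a comma follows.
val commaOnlyRule: String = "@[0-9a-f]+,"

// Hypothetical plan line with one hash in the middle and one at the very end.
val line: String = "Cmd a.b.Format@1a2b3c4d, a.b.TempPath@69beda67"

// The middle hash is removed; the trailing hash has no comma after it and survives.
val normalized: String = line.replaceAll(commaOnlyRule, ",")
```

Under this assumed rule, the last argument of a plan node keeps its run-dependent hash, which is the situation the `toString` overrides avoid.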
This is a minor improvement to the plan display for `InsertIntoHiveTable`,
making it consistent with `DataSource` plans such as
`InsertIntoHadoopFsRelationCommand`.
`InsertIntoHadoopFsRelationCommand`:
```
-- !query
insert into t6 values (97)
-- !query analysis
InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t6, false, Parquet, [path=file:[not included in comparison]/{warehouse_dir}/t6], Append, `spark_catalog`.`default`.`t6`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t6), [ascii]
+- Project [cast(col1#x as bigint) AS ascii#xL]
   +- LocalRelation [col1#x]
```
`InsertIntoHiveTable`:
```patch
-- !query
insert into table spark_test_json_2021_07_16_01 values(1, 'a')
-- !query analysis
-InsertIntoHiveTable `spark_catalog`.`default`.`spark_test_json_2021_07_16_01`, false, false, [c1, c2], org.apache.spark.sql.hive.execution.HiveFileFormatxxxxxxxx, org.apache.spark.sql.hive.execution.HiveTempPath69beda67
+InsertIntoHiveTable `spark_catalog`.`default`.`spark_test_json_2021_07_16_01`, false, false, [c1, c2], Hive, HiveTempPath(file:[not included in comparison]/{warehouse_dir}/spark_test_json_2021_07_16_01)
 +- Project [cast(col1#x as int) AS c1#x, cast(col2#x as string) AS c2#x]
    +- LocalRelation [col1#x, col2#x]
```
### Does this PR introduce _any_ user-facing change?
It affects `EXPLAIN` output and the plan display in the Spark UI
`SQL/DataFrame` tab.
### How was this patch tested?
See the above examples.
Spark does not have SQL tests for the `hive` module; I identified this issue
when porting internal test cases to 4.0. Since all existing SQL tests live in
the `sql` module, adding hive-related tests is not possible.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #49400 from pan3793/SPARK-50755.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala | 2 ++
.../main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala | 2 ++
2 files changed, 4 insertions(+)
diff --git
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
index cabdddd4c475..0d4efd9e7774 100644
---
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
+++
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
@@ -55,6 +55,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
override def shortName(): String = "hive"
+ override def toString: String = "Hive"
+
override def inferSchema(
sparkSession: SparkSession,
options: Map[String, String],
diff --git
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala
index 16edfea67e38..d97d3cd6dd4a 100644
---
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala
+++
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala
@@ -165,4 +165,6 @@ class HiveTempPath(session: SparkSession, val hadoopConf:
Configuration, path: P
def deleteIfNotStagingDir(path: Path, fs: FileSystem): Unit = {
if (Option(path) != stagingDirForCreating) fs.delete(path, true)
}
+
+ override def toString: String = s"HiveTempPath($path)"
}