(spark) branch master updated: [SPARK-50755][SQL] Pretty plan display for InsertIntoHiveTable

dongjoon Tue, 07 Jan 2025 16:30:47 -0800

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 0b443f4a5041 [SPARK-50755][SQL] Pretty plan display for InsertIntoHiveTable
0b443f4a5041 is described below

commit 0b443f4a5041943b25438822d3c1a76b3236e5cb
Author: Cheng Pan <[email protected]>
AuthorDate: Tue Jan 7 16:30:09 2025 -0800

    [SPARK-50755][SQL] Pretty plan display for InsertIntoHiveTable
    
    ### What changes were proposed in this pull request?
    
    Override `toString` in `HiveFileFormat` and `HiveTempPath` so that the `InsertIntoHiveTable` plan displays cleanly.
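    
    A minimal standalone sketch of the effect follows. It uses hypothetical stand-in classes (`OpaqueFormat`, `PrettyFormat`, `PrettyTempPath`), because the real `HiveFileFormat` and `HiveTempPath` need Hive and a `SparkSession` to construct. It also uses a hypothetical `renderArgs` helper that only approximates how a plan node joins simple arguments. Without an override, a class inherits `Object.toString`, which is the class name plus `@` and a hex hash code. That hash is not guaranteed to be stable across runs.
    
    ```scala
    // Hypothetical stand-in with no override: inherits Object.toString,
    // i.e. getClass.getName + "@" + Integer.toHexString(hashCode).
    class OpaqueFormat
    
    // Hypothetical stand-in using the same override the commit adds to HiveFileFormat.
    class PrettyFormat {
      override def toString: String = "Hive"
    }
    
    // Hypothetical stand-in shaped like the HiveTempPath override
    // (the real class holds a Hadoop Path, not a String).
    class PrettyTempPath(path: String) {
      override def toString: String = s"HiveTempPath($path)"
    }
    
    object ToStringDemo {
      // Hypothetical helper: roughly how simple plan arguments end up in the
      // plan string, by calling toString on each and joining with ", ".
      def renderArgs(args: Seq[Any]): String = args.map(_.toString).mkString(", ")
    
      def main(args: Array[String]): Unit = {
        // Prints something like OpaqueFormat@<hex hash>; the hash may vary per run.
        println(new OpaqueFormat)
        // Prints: [c1, c2], Hive, HiveTempPath(/tmp/t)
        println(renderArgs(Seq("[c1, c2]", new PrettyFormat, new PrettyTempPath("/tmp/t"))))
      }
    }
    ```
    
    Fixing the string in `toString` keeps the plan line deterministic. This appears to follow the same convention that makes `ParquetFileFormat` display as `Parquet` in the `InsertIntoHadoopFsRelationCommand` line.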
    
    ### Why are the changes needed?
    
    I found that the current plan-replacement rules do not properly handle a trailing object hash:
https://github.com/apache/spark/blob/36d23eff4b4c3a2b8fd301672e532132c96fdd68/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala#L62
    
    Instead of fixing the replacement rule (see #49396; please let me know if any reviewer thinks we should fix that too), it seems simpler to override `toString` in those classes so that they display cleanly.
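    
    The following sketch illustrates one way a trailing-hash gap can arise. It uses a hypothetical masking rule that only masks `@<hex>` when a comma follows it. This is not the actual rule in `SQLQueryTestHelper`, and the sketch does not reproduce that rule. In the output before this change, the mid-line `HiveFileFormat` hash appears masked while the `HiveTempPath` hash at the end of the line does not, which is consistent with a gap of this kind. The names `maskMidHash` and `maskAnyHash` are made up for this example.
    
    ```scala
    object HashMaskDemo {
      // Hypothetical rule (NOT Spark's actual rule): masks an object hash only
      // when a comma follows it, i.e. in the middle of an argument list.
      def maskMidHash(s: String): String =
        s.replaceAll("@[0-9a-f]+,", "@xxxxxxxx,")
    
      // Hypothetical broader rule: the lookahead also accepts end of input,
      // so a hash at the end of the line is masked as well.
      def maskAnyHash(s: String): String =
        s.replaceAll("@[0-9a-f]+(?=,|$)", "@xxxxxxxx")
    
      def main(args: Array[String]): Unit = {
        val line = "Fmt@1a2b3c4d, TempPath@69beda67"
        println(maskMidHash(line)) // Fmt@xxxxxxxx, TempPath@69beda67
        println(maskAnyHash(line)) // Fmt@xxxxxxxx, TempPath@xxxxxxxx
      }
    }
    ```
    
    Widening such a regex is one option. This change instead overrides `toString`, so no hash is printed and no masking is needed.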
    
    This is a minor improvement to the plan display for `InsertIntoHiveTable`. It also makes the display consistent with `DataSource` plans such as `InsertIntoHadoopFsRelationCommand`.
    
    `InsertIntoHadoopFsRelationCommand`:
    ```
    -- !query
    insert into t6 values (97)
    -- !query analysis
    InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t6, false, Parquet, [path=file:[not included in comparison]/{warehouse_dir}/t6], Append, `spark_catalog`.`default`.`t6`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t6), [ascii]
    +- Project [cast(col1#x as bigint) AS ascii#xL]
       +- LocalRelation [col1#x]
    ```
    
    `InsertIntoHiveTable`:
    ```patch
     -- !query
     insert into table spark_test_json_2021_07_16_01 values(1, 'a')
     -- !query analysis
    -InsertIntoHiveTable `spark_catalog`.`default`.`spark_test_json_2021_07_16_01`, false, false, [c1, c2], org.apache.spark.sql.hive.execution.HiveFileFormatxxxxxxxx, org.apache.spark.sql.hive.execution.HiveTempPath69beda67
    +InsertIntoHiveTable `spark_catalog`.`default`.`spark_test_json_2021_07_16_01`, false, false, [c1, c2], Hive, HiveTempPath(file:[not included in comparison]/{warehouse_dir}/spark_test_json_2021_07_16_01)
     +- Project [cast(col1#x as int) AS c1#x, cast(col2#x as string) AS c2#x]
        +- LocalRelation [col1#x, col2#x]
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    It changes the `EXPLAIN` output and the plan display in the Spark UI `SQL/DataFrame` tab.
    
    ### How was this patch tested?
    
    See the examples above.
    
    Spark does not have SQL tests for the `hive` module. I identified this issue while porting internal test cases to 4.0. Since all existing SQL tests live in the `sql` module, adding Hive-related tests there is not possible.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #49400 from pan3793/SPARK-50755.
    
    Authored-by: Cheng Pan <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala | 2 ++
 .../main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
index cabdddd4c475..0d4efd9e7774 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
@@ -55,6 +55,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
 
   override def shortName(): String = "hive"
 
+  override def toString: String = "Hive"
+
   override def inferSchema(
       sparkSession: SparkSession,
       options: Map[String, String],
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala
index 16edfea67e38..d97d3cd6dd4a 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala
@@ -165,4 +165,6 @@ class HiveTempPath(session: SparkSession, val hadoopConf: Configuration, path: P
   def deleteIfNotStagingDir(path: Path, fs: FileSystem): Unit = {
     if (Option(path) != stagingDirForCreating) fs.delete(path, true)
   }
+
+  override def toString: String = s"HiveTempPath($path)"
 }


