[jira] [Created] (SPARK-45657) Caching SQL UNION of different column data types does not work inside Dataset.union

John Zhuge (Jira) Tue, 24 Oct 2023 17:21:03 -0700

John Zhuge created SPARK-45657:
----------------------------------

             Summary: Caching SQL UNION of different column data types does not work inside Dataset.union
                 Key: SPARK-45657
                 URL: https://issues.apache.org/jira/browse/SPARK-45657
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.1
            Reporter: John Zhuge



 

Cache a SQL UNION whose two sides have different column data types:
{code:java}
scala> spark.sql("select 1 id union select 's2' id").cache()  {code}
A subsequent Dataset.union on the same query does not use the cache; the optimized plan contains no `InMemoryRelation`:
{code:java}
scala> spark.sql("select 1 id union select 's2' id").union(spark.sql("select 's3'")).queryExecution.optimizedPlan
res15: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Union false, false
:- Aggregate [id#109], [id#109]
:  +- Union false, false
:     :- Project [1 AS id#109]
:     :  +- OneRowRelation
:     +- Project [s2 AS id#108]
:        +- OneRowRelation
+- Project [s3 AS s3#111]
   +- OneRowRelation {code}
A SQL UNION over the same cached query does use the cache. Note the `InMemoryRelation` in the optimized plan:
{code:java}
scala> spark.sql("(select 1 id union select 's2' id) union select 's3'").queryExecution.optimizedPlan
res16: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Aggregate [id#117], [id#117]
+- Union false, false
   :- InMemoryRelation [id#117], StorageLevel(disk, memory, deserialized, 1 replicas)
   :     +- *(4) HashAggregate(keys=[id#100], functions=[], output=[id#100])
   :        +- Exchange hashpartitioning(id#100, 500), ENSURE_REQUIREMENTS, [plan_id=241]
   :           +- *(3) HashAggregate(keys=[id#100], functions=[], output=[id#100])
   :              +- Union
   :                 :- *(1) Project [1 AS id#100]
   :                 :  +- *(1) Scan OneRowRelation[]
   :                 +- *(2) Project [s2 AS id#99]
   :                    +- *(2) Scan OneRowRelation[]
   +- Project [s3 AS s3#116]
      +- OneRowRelation {code}
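The two cases can also be compared programmatically instead of by reading the plan output. The sketch below is not part of the original report. `usesCache` is a hypothetical helper name. It assumes a spark-shell session with `spark` in scope, and that `QueryExecution.withCachedData` is accessible in the Spark version in use. It only checks whether an `InMemoryRelation` node appears in the plan after cache substitution.
{code:scala}
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.execution.columnar.InMemoryRelation

// Hypothetical helper (not from the report): true if the logical plan,
// after cached-data substitution, contains at least one InMemoryRelation.
def usesCache(ds: Dataset[_]): Boolean =
  ds.queryExecution.withCachedData.collect { case r: InMemoryRelation => r }.nonEmpty

// Cache the mixed-type UNION, as in the report.
spark.sql("select 1 id union select 's2' id").cache()

// Dataset.union over the cached query (the case reported as not using the cache).
val viaDataset = spark.sql("select 1 id union select 's2' id").union(spark.sql("select 's3'"))

// SQL UNION over the cached query (the case reported as using the cache).
val viaSql = spark.sql("(select 1 id union select 's2' id) union select 's3'")

println(s"Dataset.union uses cache: ${usesCache(viaDataset)}")
println(s"SQL UNION uses cache:     ${usesCache(viaSql)}") {code}
If the behavior described above reproduces, the two printed values should differ. The actual result depends on the Spark version.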
 



