[ https://issues.apache.org/jira/browse/SPARK-45657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779326#comment-17779326 ]
John Zhuge commented on SPARK-45657:
------------------------------------

It is fixed in main branch
{code:java}
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.7)
Type in expressions to have them evaluated.
Type :help for more information.
23/10/24 21:30:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://192.168.86.29:4040
Spark context available as 'sc' (master = local[*], app id = local-1698208231783).
Spark session available as 'spark'.

scala> spark.sql("select 1 id union select 's2' id").cache()
val res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: string]

scala> spark.sql("select 1 id union select 's2' id").union(spark.sql("select 's3'")).queryExecution.optimizedPlan
val res1: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Union false, false
:- InMemoryRelation [id#11], StorageLevel(disk, memory, deserialized, 1 replicas)
:     +- AdaptiveSparkPlan isFinalPlan=false
:        +- HashAggregate(keys=[id#2], functions=[], output=[id#2])
:           +- Exchange hashpartitioning(id#2, 200), ENSURE_REQUIREMENTS, [plan_id=30]
:              +- HashAggregate(keys=[id#2], functions=[], output=[id#2])
:                 +- Union
:                    :- Project [1 AS id#2]
:                    :  +- Scan OneRowRelation[]
:                    +- Project [s2 AS id#1]
:                       +- Scan OneRowRelation[]
+- Project [s3 AS s3#13]
   +- OneRowRelation
{code}

> Caching SQL UNION of different column data types does not work inside Dataset.union
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-45657
>                 URL: https://issues.apache.org/jira/browse/SPARK-45657
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.2
>            Reporter: John Zhuge
>            Priority: Major
>
>
> Cache SQL UNION of 2 sides with different column data types
> {code:java}
> scala> spark.sql("select 1 id union select 's2' id").cache()
> {code}
> Dataset.union does not leverage the cache
> {code:java}
> scala> spark.sql("select 1 id union select 's2' id").union(spark.sql("select 's3'")).queryExecution.optimizedPlan
> res15: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
> Union false, false
> :- Aggregate [id#109], [id#109]
> :  +- Union false, false
> :     :- Project [1 AS id#109]
> :     :  +- OneRowRelation
> :     +- Project [s2 AS id#108]
> :        +- OneRowRelation
> +- Project [s3 AS s3#111]
>    +- OneRowRelation
> {code}
> SQL UNION of the cached SQL UNION does use the cache! Please note `InMemoryRelation` used.
> {code:java}
> scala> spark.sql("(select 1 id union select 's2' id) union select 's3'").queryExecution.optimizedPlan
> res16: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
> Aggregate [id#117], [id#117]
> +- Union false, false
>    :- InMemoryRelation [id#117], StorageLevel(disk, memory, deserialized, 1 replicas)
>    :     +- *(4) HashAggregate(keys=[id#100], functions=[], output=[id#100])
>    :        +- Exchange hashpartitioning(id#100, 500), ENSURE_REQUIREMENTS, [plan_id=241]
>    :           +- *(3) HashAggregate(keys=[id#100], functions=[], output=[id#100])
>    :              +- Union
>    :                 :- *(1) Project [1 AS id#100]
>    :                 :  +- *(1) Scan OneRowRelation[]
>    :                 +- *(2) Project [s2 AS id#99]
>    :                    +- *(2) Scan OneRowRelation[]
>    +- Project [s3 AS s3#116]
>       +- OneRowRelation
> {code}