[
https://issues.apache.org/jira/browse/SPARK-41557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651026#comment-17651026
]
Maxime Thébault edited comment on SPARK-41557 at 12/22/22 12:31 AM:
--------------------------------------------------------------------
Might be related to (and fixed by) SPARK-41660?
SPARK-41498 is also related to metadata columns + union
was (Author: JIRAUSER279874):
Might be related to (and fixed by) SPARK-41660?
> Union of tables with and without metadata column fails when used in join
> ------------------------------------------------------------------------
>
> Key: SPARK-41557
> URL: https://issues.apache.org/jira/browse/SPARK-41557
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.2, 3.4.0
> Reporter: Shardul Mahadik
> Priority: Major
>
> Here is a test case that can be added to {{MetadataColumnSuite}} to
> demonstrate the issue:
> {code:scala}
> test("SPARK-41557: Union of tables with and without metadata column should work") {
>   withTable(tbl) {
>     sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)")
>     checkAnswer(
>       spark.sql(
>         s"""
>           SELECT b.*
>           FROM RANGE(1)
>           LEFT JOIN (
>             SELECT id FROM $tbl
>             UNION ALL
>             SELECT id FROM RANGE(10)
>           ) b USING(id)
>         """),
>       Seq(Row(0))
>     )
>   }
> }
> {code}
> Here a table with metadata columns ({{$tbl}}) is unioned with a relation
> without metadata columns ({{RANGE(10)}}). If the result is later used in a
> join, query analysis fails, reporting a mismatch in the number of columns of
> the union that is caused by the metadata columns. However, each side of the
> union explicitly projects only one column, so the union should be valid.
> {code}
> org.apache.spark.sql.AnalysisException: [NUM_COLUMNS_MISMATCH] UNION can only
> be performed on inputs with the same number of columns, but the first input
> has 3 columns and the second input has 1 columns.; line 5 pos 16;
> 'Project [id#26L]
> +- 'Project [id#26L, id#26L]
>    +- 'Project [id#28L, id#26L]
>       +- 'Join LeftOuter, (id#28L = id#26L)
>          :- Range (0, 1, step=1, splits=None)
>          +- 'SubqueryAlias b
>             +- 'Union false, false
>                :- Project [id#26L, index#30, _partition#31]
>                :  +- SubqueryAlias testcat.t
>                :     +- RelationV2[id#26L, data#27, index#30, _partition#31] testcat.t testcat.t
>                +- Project [id#29L]
>                   +- Range (0, 10, step=1, splits=None)
> {code}