[
https://issues.apache.org/jira/browse/SPARK-49679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-49679:
-----------------------------------
Labels: pull-request-available (was: )
> validateSchemaOutput should refer to case sensitivity flag
> ----------------------------------------------------------
>
> Key: SPARK-49679
> URL: https://issues.apache.org/jira/browse/SPARK-49679
> Project: Spark
> Issue Type: Task
> Components: Optimizer
> Affects Versions: 4.0.0
> Reporter: Avery Qi
> Priority: Major
> Labels: pull-request-available
>
> If we're using `spark.sql.caseSensitive` set to false, we should accept
> queries like this:
> |SELECT * FROM (
> | Select a.ppmonth,
> | a.ppweek,
> | case when a.retsubcategoryderived <= 1 then 'XXXXXXXXXXXXX'
> | else
> | 'XXXXXX'
> | end as mappedflag,
> | b.name as subcategory_name,
> | sum(a.totalvalue) as RDOLLARS
> | from a, b
> | where a.retsubcategoryderived = b.retsubcategoryderived
> | group by a.Ppmonth,a.ppweek,a.retsubcategoryderived,b.name,
> mappedflag)
> However, validateSchemaOutput in optimizer's checks about plan schema changes
> does not use this flag, which leads to a situation that some queries will
> fail this check even if the optimization is correct. Take this query as an
> example:
> After AggregatePushdownThroughJoins, the plan changes from
> {{Aggregate [Ppmonth#3L, ppweek#4L, retsubcategoryderived#7L, name#13,
> _groupingexpression#29|#3L, ppweek#4L, retsubcategoryderived#7L, name#13,
> _groupingexpression#29], [ppmonth#3L, ppweek#4L, _groupingexpression#29 AS
> mappedflag#0, name#13 AS subcategory_name#1, sum(totalvalue#9L) AS
> RDOLLARS#2L|#3L, ppweek#4L, _groupingexpression#29 AS mappedflag#0, name#13
> AS subcategory_name#1, sum(totalvalue#9L) AS RDOLLARS#2L]}}
> {{+- Project [ppmonth#3L, ppweek#4L, retsubcategoryderived#7L, totalvalue#9L,
> name#13, CASE WHEN (retsubcategoryderived#7L <= 1) THEN XXXXXXXXXXXXX ELSE
> XXXXXX END AS _groupingexpression#29|#3L, ppweek#4L,
> retsubcategoryderived#7L, totalvalue#9L, name#13, CASE WHEN
> (retsubcategoryderived#7L <= 1) THEN XXXXXXXXXXXXX ELSE XXXXXX END AS
> _groupingexpression#29]}}
> {\{ +- Join Inner, (retsubcategoryderived#7L = retsubcategoryderived#10L)}}
> {{ :- Project [ppmonth#3L, ppweek#4L, retsubcategoryderived#7L,
> totalvalue#9L|#3L, ppweek#4L, retsubcategoryderived#7L, totalvalue#9L]}}
> {\{ : +- Filter isnotnull(retsubcategoryderived#7L)}}
> {{ : +- Relation
> spark_catalog.default.a[ppmonth#3L,ppweek#4L,retcategorygroupderived#5L,rethidsubcategoryderived#6L,retsubcategoryderived#7L,retsupercategoryderived#8L,totalvalue#9L|#3L,ppweek#4L,retcategorygroupderived#5L,rethidsubcategoryderived#6L,retsubcategoryderived#7L,retsupercategoryderived#8L,totalvalue#9L]
> parquet}}
> {{ +- Project [retsubcategoryderived#10L, name#13|#10L, name#13]}}
> {\{ +- Filter isnotnull(retsubcategoryderived#10L)}}
> {{ +- Relation
> spark_catalog.default.b[retsubcategoryderived#10L,description#11,displayorder#12L,name#13,shortname#14,startrange#15,endrange#16,retcategoryderived#17L,retcategorygroupderived#18L,retsupercategoryderived#19L,altbusiness#20L|#10L,description#11,displayorder#12L,name#13,shortname#14,startrange#15,endrange#16,retcategoryderived#17L,retcategorygroupderived#18L,retsupercategoryderived#19L,altbusiness#20L]
> parquet}}
> To:
> {{Project [Ppmonth#3L, ppweek#4L, _groupingexpression#29 AS mappedflag#0,
> name#13 AS subcategory_name#1, sum(totalvalue#9L)#23L AS RDOLLARS#2L|#3L,
> ppweek#4L, _groupingexpression#29 AS mappedflag#0, name#13 AS
> subcategory_name#1, sum(totalvalue#9L)#23L AS RDOLLARS#2L]}}
> {{+- AggregatePart [Ppmonth#3L, ppweek#4L, retsubcategoryderived#7L, name#13,
> _groupingexpression#29|#3L, ppweek#4L, retsubcategoryderived#7L, name#13,
> _groupingexpression#29], [finalmerge_sum(merge sum#31L) AS
> sum(totalvalue#9L)#23L|#31L) AS sum(totalvalue#9L)#23L], true}}
> {{ +- AggregatePart [Ppmonth#3L, ppweek#4L, retsubcategoryderived#7L,
> name#13, _groupingexpression#29|#3L, ppweek#4L, retsubcategoryderived#7L,
> name#13, _groupingexpression#29], [merge_sum(merge sum#31L) AS sum#31L|#31L)
> AS sum#31L], false}}
> {{ +- Project [Ppmonth#3L, ppweek#4L, retsubcategoryderived#7L, name#13,
> _groupingexpression#29, sum#31L|#3L, ppweek#4L, retsubcategoryderived#7L,
> name#13, _groupingexpression#29, sum#31L]}}
> {\{ +- Join Inner, (retsubcategoryderived#7L = retsubcategoryderived#10L)}}
> {{ :- AggregatePart [Ppmonth#3L, ppweek#4L, retsubcategoryderived#7L, CASE
> WHEN (retsubcategoryderived#7L <= 1) THEN XXXXXXXXXXXXX ELSE XXXXXX END AS
> _groupingexpression#29|#3L, ppweek#4L, retsubcategoryderived#7L, CASE WHEN
> (retsubcategoryderived#7L <= 1) THEN XXXXXXXXXXXXX ELSE XXXXXX END AS
> _groupingexpression#29], [partial_sum(totalvalue#9L) AS sum#31L|#9L) AS
> sum#31L], false}}
> {{ : +- Project [ppmonth#3L, ppweek#4L, retsubcategoryderived#7L,
> totalvalue#9L|#3L, ppweek#4L, retsubcategoryderived#7L, totalvalue#9L]}}
> {\{ : +- Filter isnotnull(retsubcategoryderived#7L)}}
> {{ : +- Relation
> spark_catalog.default.a[ppmonth#3L,ppweek#4L,retcategorygroupderived#5L,rethidsubcategoryderived#6L,retsubcategoryderived#7L,retsupercategoryderived#8L,totalvalue#9L|#3L,ppweek#4L,retcategorygroupderived#5L,rethidsubcategoryderived#6L,retsubcategoryderived#7L,retsupercategoryderived#8L,totalvalue#9L]
> parquet}}
> {{ +- Project [retsubcategoryderived#10L, name#13|#10L, name#13]}}
> {\{ +- Filter isnotnull(retsubcategoryderived#10L)}}
> {{ +- Relation
> spark_catalog.default.b[retsubcategoryderived#10L,description#11,displayorder#12L,name#13,shortname#14,startrange#15,endrange#16,retcategoryderived#17L,retcategorygroupderived#18L,retsupercategoryderived#19L,altbusiness#20L|#10L,description#11,displayorder#12L,name#13,shortname#14,startrange#15,endrange#16,retcategoryderived#17L,retcategorygroupderived#18L,retsupercategoryderived#19L,altbusiness#20L]
> parquet}}
> where the schema Ppmonth does not match with schema ppmonth.
> We need to use this flag in validateSchemaOutput.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]