Avery Qi created SPARK-49679:
--------------------------------
Summary: validateSchemaOutput should refer to case sensitivity flag
Key: SPARK-49679
URL: https://issues.apache.org/jira/browse/SPARK-49679
Project: Spark
Issue Type: Task
Components: Optimizer
Affects Versions: 4.0.0
Reporter: Avery Qi
If we're using `spark.sql.caseSensitive` set to false, we should accept queries
like this:
{{"""|SELECT * FROM (
| Select a.ppmonth,
| a.ppweek,
| case when a.retsubcategoryderived <= 1 then 'XXXXXXXXXXXXX'
| else
| 'XXXXXX'
| end as mappedflag,
| b.name as subcategory_name,
| sum(a.totalvalue) as RDOLLARS
| from a, b
| where a.retsubcategoryderived = b.retsubcategoryderived
| group by a.Ppmonth,a.ppweek,a.retsubcategoryderived,b.name, mappedflag)
|""".stripMargin}}
However, validateSchemaOutput in optimizer's checks about plan schema changes
does not use this flag, which leads to a situation that some queries will fail
this check even if the optimization is correct. Take this query as an example:
After AggregatePushdownThroughJoins, the plan changes from
{{Aggregate [Ppmonth#3L, ppweek#4L, retsubcategoryderived#7L, name#13,
_groupingexpression#29], [ppmonth#3L, ppweek#4L, _groupingexpression#29 AS
mappedflag#0, name#13 AS subcategory_name#1, sum(totalvalue#9L) AS
RDOLLARS#2L]}}
{{+- Project [ppmonth#3L, ppweek#4L, retsubcategoryderived#7L, totalvalue#9L,
name#13, CASE WHEN (retsubcategoryderived#7L <= 1) THEN XXXXXXXXXXXXX ELSE
XXXXXX END AS _groupingexpression#29]}}
{{ +- Join Inner, (retsubcategoryderived#7L = retsubcategoryderived#10L)}}
{{ :- Project [ppmonth#3L, ppweek#4L, retsubcategoryderived#7L,
totalvalue#9L]}}
{{ : +- Filter isnotnull(retsubcategoryderived#7L)}}
{{ : +- Relation
spark_catalog.default.a[ppmonth#3L,ppweek#4L,retcategorygroupderived#5L,rethidsubcategoryderived#6L,retsubcategoryderived#7L,retsupercategoryderived#8L,totalvalue#9L]
parquet}}
{{ +- Project [retsubcategoryderived#10L, name#13]}}
{{ +- Filter isnotnull(retsubcategoryderived#10L)}}
{{ +- Relation
spark_catalog.default.b[retsubcategoryderived#10L,description#11,displayorder#12L,name#13,shortname#14,startrange#15,endrange#16,retcategoryderived#17L,retcategorygroupderived#18L,retsupercategoryderived#19L,altbusiness#20L]
parquet}}
To:
{{Project [Ppmonth#3L, ppweek#4L, _groupingexpression#29 AS mappedflag#0,
name#13 AS subcategory_name#1, sum(totalvalue#9L)#23L AS RDOLLARS#2L]}}
{{+- AggregatePart [Ppmonth#3L, ppweek#4L, retsubcategoryderived#7L, name#13,
_groupingexpression#29], [finalmerge_sum(merge sum#31L) AS
sum(totalvalue#9L)#23L], true}}
{{ +- AggregatePart [Ppmonth#3L, ppweek#4L, retsubcategoryderived#7L,
name#13, _groupingexpression#29], [merge_sum(merge sum#31L) AS sum#31L], false}}
{{ +- Project [Ppmonth#3L, ppweek#4L, retsubcategoryderived#7L, name#13,
_groupingexpression#29, sum#31L]}}
{{ +- Join Inner, (retsubcategoryderived#7L =
retsubcategoryderived#10L)}}
{{ :- AggregatePart [Ppmonth#3L, ppweek#4L,
retsubcategoryderived#7L, CASE WHEN (retsubcategoryderived#7L <= 1) THEN
XXXXXXXXXXXXX ELSE XXXXXX END AS _groupingexpression#29],
[partial_sum(totalvalue#9L) AS sum#31L], false}}
{{ : +- Project [ppmonth#3L, ppweek#4L, retsubcategoryderived#7L,
totalvalue#9L]}}
{{ : +- Filter isnotnull(retsubcategoryderived#7L)}}
{{ : +- Relation
spark_catalog.default.a[ppmonth#3L,ppweek#4L,retcategorygroupderived#5L,rethidsubcategoryderived#6L,retsubcategoryderived#7L,retsupercategoryderived#8L,totalvalue#9L]
parquet}}
{{ +- Project [retsubcategoryderived#10L, name#13]}}
{{ +- Filter isnotnull(retsubcategoryderived#10L)}}
{{ +- Relation
spark_catalog.default.b[retsubcategoryderived#10L,description#11,displayorder#12L,name#13,shortname#14,startrange#15,endrange#16,retcategoryderived#17L,retcategorygroupderived#18L,retsupercategoryderived#19L,altbusiness#20L]
parquet}}
where the schema Ppmonth does not match with schema ppmonth.
We need to use this flag in validateSchemaOutput.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]