[
https://issues.apache.org/jira/browse/SPARK-32638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guojian Li updated SPARK-32638:
-------------------------------
Description:
I am migrating sql from mysql to spark sql, meet a very strange case. Below is
code to reproduce the exception:
{code:java}
val spark = SparkSession.builder()
.master("local")
.appName("Word Count")
.getOrCreate()
spark.sparkContext.setLogLevel("TRACE")
val DecimalType = DataTypes.createDecimalType(20, 2)
val schema = StructType(List(
StructField("a", DecimalType, true)
))
val dataList = new util.ArrayList[Row]()
val df=spark.createDataFrame(dataList,schema)
df.printSchema()
df.createTempView("test")
val sql=
"""
|SELECT t.kpi_04 FROM
|(
| SELECT a as `kpi_04` FROM test
| UNION ALL
| SELECT a+a as `kpi_04` FROM test
|) t
|
""".stripMargin
spark.sql(sql)
{code}
Exception Message:
{code:java}
Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved
attribute(s) kpi_04#2 missing from kpi_04#4 in operator !Project [kpi_04#2].
Attribute(s) with the same name appear in the operation: kpi_04. Please check
if the right attribute(s) are used.;;
!Project [kpi_04#2]
+- SubqueryAlias t
+- Union
:- Project [cast(kpi_04#2 as decimal(21,2)) AS kpi_04#4]
: +- Project [a#0 AS kpi_04#2]
: +- SubqueryAlias test
: +- LocalRelation <empty>, [a#0]
+- Project [kpi_04#3]
+- Project [CheckOverflow((promote_precision(cast(a#0 as decimal(21,2))) +
promote_precision(cast(a#0 as decimal(21,2)))), DecimalType(21,2)) AS kpi_04#3]
+- SubqueryAlias test
+- LocalRelation <empty>, [a#0]{code}
Base the trace log ,seemly the WidenSetOperationTypes add new outer project
layer. It caused the parent query lose the reference to subquery.
{code:java}
=== Applying Rule
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes ===
!'Project [kpi_04#2] !Project [kpi_04#2]
!+- 'SubqueryAlias t +- SubqueryAlias t
! +- 'Union +- Union
! :- Project [a#0 AS kpi_04#2] :- Project [cast(kpi_04#2 as decimal(21,2)) AS
kpi_04#4]
! : +- SubqueryAlias test : +- Project [a#0 AS kpi_04#2]
! : +- LocalRelation <empty>, [a#0] : +- SubqueryAlias test
! +- Project [CheckOverflow((promote_precision(cast(a#0 as decimal(21,2))) +
promote_precision(cast(a#0 as decimal(21,2)))), DecimalType(21,2)) AS kpi_04#3]
: +- LocalRelation <empty>, [a#0]
! +- SubqueryAlias test +- Project [kpi_04#3]
! +- LocalRelation <empty>, [a#0] +- Project
[CheckOverflow((promote_precision(cast(a#0 as decimal(21,2))) +
promote_precision(cast(a#0 as decimal(21,2)))), DecimalType(21,2)) AS kpi_04#3]
! +- SubqueryAlias test
! +- LocalRelation <empty>, [a#0]
{code}
in the source code ,WidenSetOperationTypes.scala. it is a intent behavior,
but possibly miss this edge case.
I hope someone can help me out to fix it .
{code:java}
if (targetTypes.nonEmpty) {
// Add an extra Project if the targetTypes are different from the original
types.
children.map(widenTypes(_, targetTypes))
} else {
// Unable to find a target type to widen, then just return the original set.
children
}{code}
was:
I am migrating sql from mysql to spark sql, meet a very strange case. Below is
code to reproduce the exception:
{code:java}
val spark = SparkSession.builder()
.master("local")
.appName("Word Count")
.getOrCreate()
spark.sparkContext.setLogLevel("TRACE")
val DecimalType = DataTypes.createDecimalType(20, 2)
val schema = StructType(List(
StructField("a", DecimalType, true)
))
val dataList = new util.ArrayList[Row]()
val df=spark.createDataFrame(dataList,schema)
df.printSchema()
df.createTempView("test")
val sql=
"""
|SELECT t.kpi_04 FROM
|(
| SELECT a as `kpi_04` FROM test
| UNION ALL
| SELECT a+a as `kpi_04` FROM test
|) t
|
""".stripMargin
spark.sql(sql)
{code}
Exception Message:
{code:java}
Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved
attribute(s) kpi_04#2 missing from kpi_04#4 in operator !Project [kpi_04#2].
Attribute(s) with the same name appear in the operation: kpi_04. Please check
if the right attribute(s) are used.;;
!Project [kpi_04#2]
+- SubqueryAlias t
+- Union
:- Project [cast(kpi_04#2 as decimal(21,2)) AS kpi_04#4]
: +- Project [a#0 AS kpi_04#2]
: +- SubqueryAlias test
: +- LocalRelation <empty>, [a#0]
+- Project [kpi_04#3]
+- Project [CheckOverflow((promote_precision(cast(a#0 as decimal(21,2))) +
promote_precision(cast(a#0 as decimal(21,2)))), DecimalType(21,2)) AS kpi_04#3]
+- SubqueryAlias test
+- LocalRelation <empty>, [a#0]{code}
Base the trace log ,seemly the WidenSetOperationTypes add new outer project
layer. It caused the parent query lose the reference to subquery.
{code:java}
=== Applying Rule
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes ===
!'Project [kpi_04#2] !Project [kpi_04#2]
!+- 'SubqueryAlias t +- SubqueryAlias t
! +- 'Union +- Union
! :- Project [a#0 AS kpi_04#2] :- Project [cast(kpi_04#2 as decimal(21,2)) AS
kpi_04#4]
! : +- SubqueryAlias test : +- Project [a#0 AS kpi_04#2]
! : +- LocalRelation <empty>, [a#0] : +- SubqueryAlias test
! +- Project [CheckOverflow((promote_precision(cast(a#0 as decimal(21,2))) +
promote_precision(cast(a#0 as decimal(21,2)))), DecimalType(21,2)) AS kpi_04#3]
: +- LocalRelation <empty>, [a#0]
! +- SubqueryAlias test +- Project [kpi_04#3]
! +- LocalRelation <empty>, [a#0] +- Project
[CheckOverflow((promote_precision(cast(a#0 as decimal(21,2))) +
promote_precision(cast(a#0 as decimal(21,2)))), DecimalType(21,2)) AS kpi_04#3]
! +- SubqueryAlias test
! +- LocalRelation <empty>, [a#0]
{code}
in the source code ,WidenSetOperationTypes.scala. it is a intent behavior,
but miss some edge case.
I hope someone can help me out to fix it .
{code:java}
if (targetTypes.nonEmpty) {
// Add an extra Project if the targetTypes are different from the original
types.
children.map(widenTypes(_, targetTypes))
} else {
// Unable to find a target type to widen, then just return the original set.
children
}{code}
> TypeCoercion$WidenSetOperationTypes causes subquery attribute unresolved
> exception
> ------------------------------------------------------------------------------------
>
> Key: SPARK-32638
> URL: https://issues.apache.org/jira/browse/SPARK-32638
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.4
> Reporter: Guojian Li
> Priority: Major
>
> I am migrating sql from mysql to spark sql, meet a very strange case. Below
> is code to reproduce the exception:
>
> {code:java}
> val spark = SparkSession.builder()
> .master("local")
> .appName("Word Count")
> .getOrCreate()
> spark.sparkContext.setLogLevel("TRACE")
> val DecimalType = DataTypes.createDecimalType(20, 2)
> val schema = StructType(List(
> StructField("a", DecimalType, true)
> ))
> val dataList = new util.ArrayList[Row]()
> val df=spark.createDataFrame(dataList,schema)
> df.printSchema()
> df.createTempView("test")
> val sql=
> """
> |SELECT t.kpi_04 FROM
> |(
> | SELECT a as `kpi_04` FROM test
> | UNION ALL
> | SELECT a+a as `kpi_04` FROM test
> |) t
> |
> """.stripMargin
> spark.sql(sql)
> {code}
>
> Exception Message:
>
> {code:java}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved
> attribute(s) kpi_04#2 missing from kpi_04#4 in operator !Project [kpi_04#2].
> Attribute(s) with the same name appear in the operation: kpi_04. Please check
> if the right attribute(s) are used.;;
> !Project [kpi_04#2]
> +- SubqueryAlias t
> +- Union
> :- Project [cast(kpi_04#2 as decimal(21,2)) AS kpi_04#4]
> : +- Project [a#0 AS kpi_04#2]
> : +- SubqueryAlias test
> : +- LocalRelation <empty>, [a#0]
> +- Project [kpi_04#3]
> +- Project [CheckOverflow((promote_precision(cast(a#0 as decimal(21,2))) +
> promote_precision(cast(a#0 as decimal(21,2)))), DecimalType(21,2)) AS
> kpi_04#3]
> +- SubqueryAlias test
> +- LocalRelation <empty>, [a#0]{code}
>
>
> Base the trace log ,seemly the WidenSetOperationTypes add new outer project
> layer. It caused the parent query lose the reference to subquery.
>
>
> {code:java}
>
> === Applying Rule
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes ===
> !'Project [kpi_04#2] !Project [kpi_04#2]
> !+- 'SubqueryAlias t +- SubqueryAlias t
> ! +- 'Union +- Union
> ! :- Project [a#0 AS kpi_04#2] :- Project [cast(kpi_04#2 as decimal(21,2)) AS
> kpi_04#4]
> ! : +- SubqueryAlias test : +- Project [a#0 AS kpi_04#2]
> ! : +- LocalRelation <empty>, [a#0] : +- SubqueryAlias test
> ! +- Project [CheckOverflow((promote_precision(cast(a#0 as decimal(21,2))) +
> promote_precision(cast(a#0 as decimal(21,2)))), DecimalType(21,2)) AS
> kpi_04#3] : +- LocalRelation <empty>, [a#0]
> ! +- SubqueryAlias test +- Project [kpi_04#3]
> ! +- LocalRelation <empty>, [a#0] +- Project
> [CheckOverflow((promote_precision(cast(a#0 as decimal(21,2))) +
> promote_precision(cast(a#0 as decimal(21,2)))), DecimalType(21,2)) AS
> kpi_04#3]
> ! +- SubqueryAlias test
> ! +- LocalRelation <empty>, [a#0]
> {code}
>
> in the source code ,WidenSetOperationTypes.scala. it is a intent behavior,
> but possibly miss this edge case.
> I hope someone can help me out to fix it .
>
>
> {code:java}
> if (targetTypes.nonEmpty) {
> // Add an extra Project if the targetTypes are different from the original
> types.
> children.map(widenTypes(_, targetTypes))
> } else {
> // Unable to find a target type to widen, then just return the original set.
> children
> }{code}
>
>
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]