Aaron Staple created SPARK-2781:
-----------------------------------
Summary: Analyzer should check resolution of LogicalPlans
Key: SPARK-2781
URL: https://issues.apache.org/jira/browse/SPARK-2781
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Aaron Staple
Currently the Analyzer's CheckResolution rule checks that all attributes are
resolved by searching for unresolved Expressions. But some LogicalPlans,
including Union, contain custom implementations of the resolved attribute that
validate other criteria in addition to checking for attribute resolution of
their descendants. The resolved attribute of these LogicalPlans is never
consulted by the CheckResolution implementation, so those extra criteria go
unchecked.
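To illustrate the gap, here is a minimal, self-contained sketch. It is a toy model and not Spark's Catalyst code: Plan, Relation, Attr, allExpressionsResolved and allPlansResolved are hypothetical names, and the toy Union only assumes that the real one compares the data types of its children's outputs.

```scala
// Toy model of the problem. None of these classes are Spark's; all names are hypothetical.
object ResolutionSketch {
  sealed trait DataType
  case object IntType extends DataType
  case object ArrayOfIntType extends DataType

  // An expression counts as resolved once its data type is known.
  final case class Attr(name: String, dataType: Option[DataType]) {
    def resolved: Boolean = dataType.isDefined
  }

  sealed trait Plan {
    def output: Seq[Attr]
    def children: Seq[Plan]
    // Default rule: resolved when own expressions and all children are resolved.
    def resolved: Boolean =
      output.forall(_.resolved) && children.forall(_.resolved)
  }

  final case class Relation(output: Seq[Attr]) extends Plan {
    val children: Seq[Plan] = Nil
  }

  final case class Union(left: Plan, right: Plan) extends Plan {
    val children: Seq[Plan] = Seq(left, right)
    def output: Seq[Attr] = left.output
    // Custom criterion (assumed for this sketch): both sides must have
    // matching column types.
    override def resolved: Boolean =
      children.forall(_.resolved) &&
        left.output.map(_.dataType) == right.output.map(_.dataType)
  }

  // Check that only looks for unresolved expressions.
  def allExpressionsResolved(plan: Plan): Boolean =
    plan.output.forall(_.resolved) &&
      plan.children.forall(allExpressionsResolved)

  // Check that also consults each plan node's own resolved flag.
  def allPlansResolved(plan: Plan): Boolean =
    plan.resolved && plan.children.forall(allPlansResolved)

  // Every attribute is resolved, yet the two sides disagree on column type.
  val mismatched: Plan = Union(
    Relation(Seq(Attr("value", Some(ArrayOfIntType)))),
    Relation(Seq(Attr("c0", Some(IntType)))))
}
```

In this sketch, allExpressionsResolved accepts the mismatched plan because every attribute has a type, while allPlansResolved rejects it because the toy Union reports itself as unresolved.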
As a result, it is currently possible to execute a query generated from
unresolved LogicalPlans. One example is a UNION query that produces rows with
different data types in the same column:
{noformat}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
case class T1(value:Seq[Int])
val t1 = sc.parallelize(Seq(T1(Seq(0,1))))
t1.registerAsTable("t1")
sqlContext.sql("SELECT value FROM t1 UNION SELECT 2 FROM t1").collect()
{noformat}
In this example, the type coercion implementation cannot unify array and
integer types. One row contains an array in the returned column and the other
row contains an integer. The result is:
{noformat}
res3: Array[org.apache.spark.sql.Row] = Array([List(0, 1)], [2])
{noformat}
I believe fixing this is a first step toward improving validation for Union
(and similar) plans. (For instance, Union does not currently validate that its
children contain the same number of columns.)
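As a rough sketch of the kind of validation this could lead to (not an existing Spark API; Column and validateUnion are hypothetical names, and data types are plain strings for simplicity), a Union check might compare column counts first and then column types:

```scala
// Hypothetical validation helper. This is a sketch, not part of Spark.
object UnionValidationSketch {
  final case class Column(name: String, dataType: String)

  // Returns Left(message) for the first problem found, Right(()) otherwise.
  def validateUnion(left: Seq[Column], right: Seq[Column]): Either[String, Unit] =
    if (left.length != right.length)
      Left(s"UNION children have different column counts: ${left.length} vs ${right.length}")
    else
      left.zip(right).zipWithIndex.collectFirst {
        case ((l, r), i) if l.dataType != r.dataType =>
          s"UNION column $i has incompatible types: ${l.dataType} vs ${r.dataType}"
      }.toLeft(())
}
```

With this helper, the query from the example above would be rejected with a type mismatch on column 0 instead of returning rows of mixed types.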