[
https://issues.apache.org/jira/browse/SPARK-29186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
angerszhu updated SPARK-29186:
------------------------------
Comment: was deleted
(was: In spark-2.4 & and master branch.
RUN SQL
{code:java}
create table customerTempView(CustomerID int);
create table orderTempView(CustomerID int, OrderID int);
explain extended SELECT OID, MAX(MID + CID) AS MID_new, ROW_NUMBER() OVER
(ORDER BY CID) AS rn
FROM (
SELECT OID_1 AS OID, CID_1 AS CID, OID_1 + CID_1 AS MID
FROM (
SELECT MIN(ot.OrderID) AS OID_1, ct.CustomerID AS CID_1
FROM orderTempView ot
INNER JOIN customerTempView ct ON ot.CustomerID =
ct.CustomerID
GROUP BY CID_1
)
)
GROUP BY OID, CID
{code}
Got analyzed plan :
{code}
== Analyzed Logical Plan ==
OID: int, MID_new: int, rn: int
Project [OID#2, MID_new#5, rn#6]
+- Project [OID#2, MID_new#5, CID#3, rn#6, rn#6]
+- Window [row_number() windowspecdefinition(CID#3 ASC NULLS FIRST,
specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rn#6],
[CID#3 ASC NULLS FIRST]
+- Aggregate [OID#2, CID#3], [OID#2, max((MID#4 + CID#3)) AS MID_new#5,
CID#3]
+- SubqueryAlias `__auto_generated_subquery_name`
+- Project [OID_1#0 AS OID#2, CID_1#1 AS CID#3, (OID_1#0 + CID_1#1)
AS MID#4]
+- SubqueryAlias `__auto_generated_subquery_name`
+- Aggregate [CustomerID#12], [min(OrderID#11) AS OID_1#0,
CustomerID#12 AS CID_1#1]
+- Join Inner, (CustomerID#10 = CustomerID#12)
:- SubqueryAlias `ot`
: +- SubqueryAlias `default`.`ordertempview`
: +- HiveTableRelation `default`.`ordertempview`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [CustomerID#10, OrderID#11]
+- SubqueryAlias `ct`
+- SubqueryAlias `default`.`customertempview`
+- HiveTableRelation
`default`.`customertempview`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [CustomerID#12]
{code})
> SubqueryAlias name value is null in Spark 2.4.3 Logical plan.
> -------------------------------------------------------------
>
> Key: SPARK-29186
> URL: https://issues.apache.org/jira/browse/SPARK-29186
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.3
> Environment: I have tried this on AWS Glue with Spark 2.4.3
> and on windows 10 with 2.4.4
> at both of them facing same issue
> Reporter: Tarun Khaneja
> Assignee: L. C. Hsieh
> Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
> Attachments: image-2019-09-25-12-17-53-552.png,
> image-2019-09-25-12-21-52-136.png
>
>
> I am writing a program to analyze sql query. So I am using Spark logical
> plan.I am writing a program to analyze sql query. So I am using Spark logical
> plan.
> Below is the code which I am using
>
> {code:java}
> object QueryAnalyzer
> {
> val LOG = LoggerFactory.getLogger(this.getClass) //Spark Conf
>
> val conf = new
> SparkConf().setMaster("local[2]").setAppName("LocalEdlExecutor")
> //Spark Context
> val sc = new SparkContext(conf)
> //sql Context
> val sqlContext = new SQLContext(sc)
>
> //Spark Session
> val sparkSession = SparkSession
> .builder()
> .appName("Spark User Data") .config("spark.app.name", "LocalEdl")
> .getOrCreate()
> def main(args: Array[String])
> {
> var inputDfColumns = Map[String,List[String]]()
> val dfSession = sparkSession.read.format("csv"). option("header",
> "true"). option("inferschema", "true"). option("delimiter",
> ",").option("decoding", "utf8").option("multiline", true)
>
> var oDF = dfSession. load("C:\\Users\\tarun.khaneja\\data\\order.csv")
>
> println("smaple data in oDF====>")
>
> oDF.show()
> var cusDF = dfSession.
> load("C:\\Users\\tarun.khaneja\\data\\customer.csv")
> println("smaple data in cusDF====>") cusDF.show()
> oDF.createOrReplaceTempView("orderTempView")
> cusDF.createOrReplaceTempView("customerTempView")
>
> //get input columns from all dataframe
> inputDfColumns +=
> ("orderTempView"->oDF.columns.toList)
>
> inputDfColumns +=
> ("customerTempView"->cusDF.columns.toList)
>
> val res = sqlContext.sql("""select OID, max(MID+CID) as MID_new,ROW_NUMBER()
> OVER (
> ORDER BY CID) as rn from (select OID_1 as OID,
> CID_1 as CID, OID_1+CID_1 as MID from (select min(ot.OrderID) as OID_1,
> ct.CustomerID as CID_1 from orderTempView as ot inner join customerTempView
> as ct on ot.CustomerID = ct.CustomerID group by
> CID_1)) group by OID,CID""")
> println(res.show(false))
> val analyzedPlan = res.queryExecution.analyzed
> println(analyzedPlan.prettyJson)
> }
> {code}
>
> Now problem is, with *Spark 2.2.1*, I am getting below json. where I have
> SubqueryAlias which provide important information of alias name for table
> which we used in query, as shown below.
> !image-2019-09-25-12-17-53-552.png!
>
> But with Spark 2.4, I am getting SubqueryAlias name as null. As shown below
> in json screenshot
>
> !image-2019-09-25-12-21-52-136.png!
>
> So, I am not sure if it is bug in Spark 2.4 because of which I am getting
> name as null in SubquerAlias.
> Or if it is not bug then how can I get relation between alias name and real
> table name.
> Any idea on this?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]