Michael Armbrust created SPARK-6851:
---------------------------------------
Summary: Wrong answers for self joins of converted parquet relations
Key: SPARK-6851
URL: https://issues.apache.org/jira/browse/SPARK-6851
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.1
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Blocker
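From the user list (/cc [~chinnitv]): When the same relation exists twice in a
query plan, our new caching logic replaces both instances with identical
replacements. Both sides of the self join then carry the same attribute IDs, so
the join condition compares each column with itself (for example,
state#105 = state#105) and the query returns wrong answers. The bug can be seen
in the following transformation: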
{code}
=== Applying Rule org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions ===
!Project [state#59,month#60]                                          'Project [state#105,month#106]
! Join Inner, Some(((state#69 = state#59) && (month#70 = month#60)))   'Join Inner, Some(((state#105 = state#105) && (month#106 = month#106)))
!  MetastoreRelation default, orders, None                              Subquery orders
!  Subquery ao                                                           Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
!   Distinct                                                            Subquery ao
!    Project [state#69,month#70]                                         Distinct
!     Join Inner, Some((id#81 = id#71))                                   Project [state#105,month#106]
!      MetastoreRelation default, orders, None                             Join Inner, Some((id#115 = id#97))
!      MetastoreRelation default, orderupdates, None                        Subquery orders
!                                                                            Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
!                                                                           Subquery orderupdates
!                                                                            Relation[id#115,category#116,make#117,type#118,price#119,pdate#120,customer#121,city#122,state#123,month#124] org.apache.spark.sql.parquet.ParquetRelation2
{code}
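The failure mode in the transformation above can be reproduced with a toy model. The following is only a sketch in Python, not Spark code: `Attr`, `Relation`, `convert_cached`, and `convert_fresh` are hypothetical names invented for illustration. It assumes that a cached conversion hands back the same relation object (and therefore the same attribute IDs) for every occurrence of a table. Minting fresh attribute IDs per occurrence is shown as one possible remedy, not necessarily the fix adopted in Spark.

```python
from dataclasses import dataclass
from itertools import count

# Global counter standing in for unique expression IDs (hypothetical model).
_next_id = count(1)


@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int


@dataclass(frozen=True)
class Relation:
    table: str
    output: tuple


def new_relation(table, columns):
    """Build a relation whose attributes all get fresh IDs."""
    return Relation(table, tuple(Attr(c, next(_next_id)) for c in columns))


def new_instance(rel):
    """Copy a relation, giving every attribute a new ID."""
    return Relation(rel.table, tuple(Attr(a.name, next(_next_id)) for a in rel.output))


_cache = {}


def convert_cached(table, columns):
    """Buggy behaviour: every occurrence of a table gets the identical cached relation."""
    if table not in _cache:
        _cache[table] = new_relation(table, columns)
    return _cache[table]


def convert_fresh(table, columns):
    """Possible remedy: reuse the cached relation but mint new IDs per occurrence."""
    return new_instance(convert_cached(table, columns))


def join_condition(left, right, column):
    """Return the pair of attributes compared by an equi-join on `column`."""
    l = next(a for a in left.output if a.name == column)
    r = next(a for a in right.output if a.name == column)
    return (l, r)


def is_trivial(cond):
    """A condition is trivial when both sides are the same attribute ID."""
    l, r = cond
    return l.expr_id == r.expr_id


cols = ["id", "state", "month"]
buggy = join_condition(convert_cached("orders", cols), convert_cached("orders", cols), "state")
fixed = join_condition(convert_fresh("orders", cols), convert_fresh("orders", cols), "state")

print(is_trivial(buggy))  # True: both sides resolve to the same attribute
print(is_trivial(fixed))  # False: each occurrence has its own attribute IDs
```

In the buggy case the self join on `state` compares an attribute with itself, which mirrors `state#105 = state#105` in the plan above.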