Michael Armbrust created SPARK-6851:
---------------------------------------

             Summary: Wrong answers for self joins of converted parquet relations
                 Key: SPARK-6851
                 URL: https://issues.apache.org/jira/browse/SPARK-6851
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.1
            Reporter: Michael Armbrust
            Assignee: Michael Armbrust
            Priority: Blocker


From the user list (/cc [~chinnitv]): When the same relation appears twice in a
query plan, our new caching logic replaces both instances with identical
replacements, so both sides of a self join end up with the same attribute IDs
(e.g. state#105 = state#105).  The bug can be seen in the following
transformation:

{code}
=== Applying Rule org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions ===
!Project [state#59,month#60]                                           'Project [state#105,month#106]
! Join Inner, Some(((state#69 = state#59) && (month#70 = month#60)))    'Join Inner, Some(((state#105 = state#105) && (month#106 = month#106)))
!  MetastoreRelation default, orders, None                               Subquery orders
!  Subquery ao                                                            Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
!   Distinct                                                             Subquery ao
!    Project [state#69,month#70]                                          Distinct
!     Join Inner, Some((id#81 = id#71))                                    Project [state#105,month#106]
!      MetastoreRelation default, orders, None                              Join Inner, Some((id#115 = id#97))
!      MetastoreRelation default, orderupdates, None                         Subquery orders
!                                                                             Relation[id#97,category#98,make#99,type#100,price#101,pdate#102,customer#103,city#104,state#105,month#106] org.apache.spark.sql.parquet.ParquetRelation2
!                                                                            Subquery orderupdates
!                                                                             Relation[id#115,category#116,make#117,type#118,price#119,pdate#120,customer#121,city#122,state#123,month#124] org.apache.spark.sql.parquet.ParquetRelation2
{code}
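
To make the failure mode in the plan above concrete, here is a minimal toy
model in Python.  It is not Spark code: Attr, new_relation and CachingConverter
are hypothetical names invented for this sketch, and the fresh_instances switch
only illustrates one possible remedy, not necessarily the fix adopted in Spark.
It shows how a conversion cache keyed by table name hands the same attribute
IDs to both sides of a self join, so the join condition compares an attribute
with itself.

```python
import itertools
from dataclasses import dataclass

# Global counter standing in for the unique expression IDs (the "#105" part).
_next_id = itertools.count(1)


@dataclass(frozen=True)
class Attr:
    """A named output attribute with an expression ID (hypothetical type)."""
    name: str
    expr_id: int

    def __str__(self):
        return f"{self.name}#{self.expr_id}"


def new_relation(table, columns):
    """Build a (table, attributes) pair whose attributes get fresh IDs."""
    return (table, tuple(Attr(c, next(_next_id)) for c in columns))


class CachingConverter:
    """Toy stand-in for a conversion rule that caches converted relations."""

    def __init__(self, fresh_instances):
        # fresh_instances=False reproduces the bug described in this issue.
        self.fresh_instances = fresh_instances
        self._cache = {}

    def convert(self, table, columns):
        if table not in self._cache:
            self._cache[table] = new_relation(table, columns)
        cached = self._cache[table]
        if self.fresh_instances:
            # Assumed remedy: reuse the cached relation's schema but issue
            # new attribute IDs for every occurrence in the plan.
            return new_relation(table, [a.name for a in cached[1]])
        return cached


def self_join_keys(converter, table, columns, key):
    """Convert the same table twice and return the two join-key attributes."""
    left = converter.convert(table, columns)
    right = converter.convert(table, columns)
    left_key = next(a for a in left[1] if a.name == key)
    right_key = next(a for a in right[1] if a.name == key)
    return left_key, right_key


cols = ["id", "state", "month"]

# Buggy behaviour: both sides share IDs, so the condition is trivially true.
buggy_left, buggy_right = self_join_keys(CachingConverter(False), "orders", cols, "state")
print(f"shared cache entry: {buggy_left} = {buggy_right}")

# With fresh instances the two sides stay distinguishable.
fixed_left, fixed_right = self_join_keys(CachingConverter(True), "orders", cols, "state")
print(f"fresh instances:    {fixed_left} = {fixed_right}")
```

In the first case the two keys carry the same ID, which mirrors the degenerate
condition (state#105 = state#105) in the converted plan; in the second case the
IDs differ, so the join still relates two distinct instances of the table.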



