[jira] [Commented] (DRILL-882) Join between hive table and parquet file fail

Ramana Inukonda Nagaraj (JIRA) Fri, 30 May 2014 17:16:25 -0700

    [ https://issues.apache.org/jira/browse/DRILL-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014410#comment-14014410 ]


Ramana Inukonda Nagaraj commented on DRILL-882:
-----------------------------------------------

Complete error:

Root: rel#2666:Subset#26.PHYSICAL.SINGLETON([]).[]
Original rel:
AbstractConverter(subset=[rel#2666:Subset#26.PHYSICAL.SINGLETON([]).[]], convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): rowcount = 1800.0, cumulative cost = {inf}, id = 2668
  DrillScreenRel(subset=[rel#2665:Subset#26.LOGICAL.ANY([]).[]]): rowcount = 1800.0, cumulative cost = {180.0 rows, 180.0 cpu, 0.0 io, 0.0 network}, id = 2664
    DrillProjectRel(subset=[rel#2663:Subset#25.LOGICAL.ANY([]).[]], p_partkey=[$0]): rowcount = 1800.0, cumulative cost = {1800.0 rows, 4.0 cpu, 0.0 io, 0.0 network}, id = 2662
      DrillFilterRel(subset=[rel#2661:Subset#24.LOGICAL.ANY([]).[]], condition=[AND(=(CAST($0):ANY NOT NULL, $10), =($5, 41))]): rowcount = 1800.0, cumulative cost = {80000.0 rows, 640000.0 cpu, 0.0 io, 0.0 network}, id = 2660
        DrillJoinRel(subset=[rel#2659:Subset#23.LOGICAL.ANY([]).[]], condition=[true], joinType=[inner]): rowcount = 80000.0, cumulative cost = {80000.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 2658
          DrillScanRel(subset=[rel#2656:Subset#21.LOGICAL.ANY([]).[]], table=[[hive, part]]): rowcount = 10.0, cumulative cost = {10.0 rows, 90.0 cpu, 0.0 io, 0.0 network}, id = 2533
          DrillScanRel(subset=[rel#2657:Subset#22.LOGICAL.ANY([]).[]], table=[[dfs, drillTestDir, tpch-multi/partsupp]]): rowcount = 8000.0, cumulative cost = {8000.0 rows, 16000.0 cpu, 0.0 io, 0.0 network}, id = 2446

Sets:
Set#21, type: RecordType(INTEGER p_partkey, VARCHAR(1) p_name, VARCHAR(1) p_mfgr, VARCHAR(1) p_brand, VARCHAR(1) p_type, INTEGER p_size, VARCHAR(1) p_container, FLOAT p_retailprice, VARCHAR(1) p_comment)
        rel#2656:Subset#21.LOGICAL.ANY([]).[], best=rel#2533, importance=0.5904900000000001
                rel#2533:DrillScanRel.LOGICAL.ANY([]).[](table=[hive, part]), rowcount=10.0, cumulative cost={10.0 rows, 90.0 cpu, 0.0 io, 0.0 network}
                rel#2682:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2681:Subset#21.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=10.0, cumulative cost={inf}
        rel#2681:Subset#21.PHYSICAL.SINGLETON([]).[], best=rel#2680, importance=0.531441
                rel#2683:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#2656:Subset#21.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=10.0, cumulative cost={inf}
                rel#2680:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=HiveScan [table=Table(tableName:part, dbName:default, owner:root, createTime:1401494656, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:p_partkey, type:int, comment:null), FieldSchema(name:p_name, type:string, comment:null), FieldSchema(name:p_mfgr, type:string, comment:null), FieldSchema(name:p_brand, type:string, comment:null), FieldSchema(name:p_type, type:string, comment:null), FieldSchema(name:p_size, type:int, comment:null), FieldSchema(name:p_container, type:string, comment:null), FieldSchema(name:p_retailprice, type:float, comment:null), FieldSchema(name:p_comment, type:string, comment:null)], location:maprfs:/drill/testdata/hive_storage/part, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=|, field.delim=|}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1401494656}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE), inputSplits=[maprfs:/drill/testdata/hive_storage/part/part.tbl:0+236074], columns=null]), rowcount=10.0, cumulative cost={10.0 rows, 90.0 cpu, 0.0 io, 0.0 network}
Set#22, type: (DrillRecordRow[*, ps_partkey])
        rel#2657:Subset#22.LOGICAL.ANY([]).[], best=rel#2446, importance=0.5904900000000001
                rel#2446:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, drillTestDir, tpch-multi/partsupp]), rowcount=8000.0, cumulative cost={8000.0 rows, 16000.0 cpu, 0.0 io, 0.0 network}
                rel#2686:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2685:Subset#22.PHYSICAL.RANDOM_DISTRIBUTED([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=8000.0, cumulative cost={inf}
        rel#2685:Subset#22.PHYSICAL.RANDOM_DISTRIBUTED([]).[], best=rel#2684, importance=0.531441
                rel#2687:AbstractConverter.PHYSICAL.RANDOM_DISTRIBUTED([]).[](child=rel#2657:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=RANDOM_DISTRIBUTED([]),sort=[]), rowcount=8000.0, cumulative cost={inf}
                rel#2684:ScanPrel.PHYSICAL.RANDOM_DISTRIBUTED([]).[](groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/partsupp]], selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath [`ps_partkey`]]]), rowcount=8000.0, cumulative cost={8000.0 rows, 16000.0 cpu, 0.0 io, 0.0 network}
Set#23, type: RecordType(INTEGER p_partkey, VARCHAR(1) p_name, VARCHAR(1) p_mfgr, VARCHAR(1) p_brand, VARCHAR(1) p_type, INTEGER p_size, VARCHAR(1) p_container, FLOAT p_retailprice, VARCHAR(1) p_comment, ANY *, ANY ps_partkey)
        rel#2659:Subset#23.LOGICAL.ANY([]).[], best=rel#2658, importance=0.6561
                rel#2658:DrillJoinRel.LOGICAL.ANY([]).[](left=rel#2656:Subset#21.LOGICAL.ANY([]).[],right=rel#2657:Subset#22.LOGICAL.ANY([]).[],condition=true,joinType=inner), rowcount=80000.0, cumulative cost={8011.0 rows, 16091.0 cpu, 0.0 io, 0.0 network}
                rel#2678:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2677:Subset#23.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
        rel#2677:Subset#23.PHYSICAL.ANY([]).[], best=null, importance=0.5904900000000001
                rel#2679:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#2659:Subset#23.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=80000.0, cumulative cost={inf}
Set#24, type: RecordType(INTEGER p_partkey, VARCHAR(1) p_name, VARCHAR(1) p_mfgr, VARCHAR(1) p_brand, VARCHAR(1) p_type, INTEGER p_size, VARCHAR(1) p_container, FLOAT p_retailprice, VARCHAR(1) p_comment, ANY *, ANY ps_partkey)
        rel#2661:Subset#24.LOGICAL.ANY([]).[], best=rel#2660, importance=0.7290000000000001
                rel#2660:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#2659:Subset#23.LOGICAL.ANY([]).[],condition=AND(=(CAST($0):ANY NOT NULL, $10), =($5, 41))), rowcount=1800.0, cumulative cost={88011.0 rows, 656091.0 cpu, 0.0 io, 0.0 network}
                rel#2675:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2674:Subset#24.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
        rel#2674:Subset#24.PHYSICAL.ANY([]).[], best=null, importance=0.6561
                rel#2676:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#2661:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1800.0, cumulative cost={inf}
Set#25, type: RecordType(INTEGER p_partkey)
        rel#2663:Subset#25.LOGICAL.ANY([]).[], best=rel#2662, importance=0.81
                rel#2662:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#2661:Subset#24.LOGICAL.ANY([]).[],p_partkey=$0), rowcount=1800.0, cumulative cost={89811.0 rows, 656095.0 cpu, 0.0 io, 0.0 network}
                rel#2670:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2669:Subset#25.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
        rel#2669:Subset#25.PHYSICAL.SINGLETON([]).[], best=null, importance=0.7290000000000001
                rel#2671:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#2663:Subset#25.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1800.0, cumulative cost={inf}
Set#26, type: RecordType(INTEGER p_partkey)
        rel#2665:Subset#26.LOGICAL.ANY([]).[], best=rel#2664, importance=0.9
                rel#2664:DrillScreenRel.LOGICAL.ANY([]).[](child=rel#2663:Subset#25.LOGICAL.ANY([]).[]), rowcount=1800.0, cumulative cost={89991.0 rows, 656275.0 cpu, 0.0 io, 0.0 network}
                rel#2667:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2666:Subset#26.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
        rel#2666:Subset#26.PHYSICAL.SINGLETON([]).[], best=null, importance=1.0
                rel#2668:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#2665:Subset#26.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1800.0, cumulative cost={inf}
                rel#2672:ScreenPrel.PHYSICAL.SINGLETON([]).[](child=rel#2669:Subset#25.PHYSICAL.SINGLETON([]).[]), rowcount=1.7976931348623157E308, cumulative cost={inf}

 ]"
]
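A minimal Python sketch for reading the planner state above. It is not part of Drill or Optiq, and the names SUBSET_RE, unimplemented_subsets and SAMPLE are hypothetical. The sketch lists the subsets whose best rel is null, meaning the planner found no implementation for them, and then re-derives two row-count estimates. The lowest such set is Set#23, the DrillJoinRel with condition=true. This suggests that the failure starts at the join and propagates up through the filter, project and screen. The join's 80000-row estimate is exactly 10 * 8000, the size of a full cross product of the two scans. The 0.15 selectivity per equality predicate used for the filter is an assumption (Optiq/Calcite's default guess) and is not stated in the log.

```python
import re

# Hypothetical helper, not part of Drill or Optiq. It matches the subset
# header lines in the "Sets:" section of the planner dump, e.g.
#   rel#2677:Subset#23.PHYSICAL.ANY([]).[], best=null, importance=...
SUBSET_RE = re.compile(
    r"rel#\d+:Subset#(\d+)\.(\w+)\.(\w+)\(\[\]\)\.\[\], best=([^,]+),"
)


def unimplemented_subsets(dump):
    """Return sorted (set number, convention, distribution) tuples for every
    subset whose best rel is null, i.e. that has no implementation."""
    found = []
    for match in SUBSET_RE.finditer(dump):
        set_no, convention, distribution, best = match.groups()
        if best == "null":
            found.append((int(set_no), convention, distribution))
    return sorted(found)


# Subset lines copied (unwrapped) from the planner state above.
SAMPLE = """\
        rel#2681:Subset#21.PHYSICAL.SINGLETON([]).[], best=rel#2680, importance=0.531441
        rel#2685:Subset#22.PHYSICAL.RANDOM_DISTRIBUTED([]).[], best=rel#2684, importance=0.531441
        rel#2677:Subset#23.PHYSICAL.ANY([]).[], best=null, importance=0.5904900000000001
        rel#2674:Subset#24.PHYSICAL.ANY([]).[], best=null, importance=0.6561
        rel#2669:Subset#25.PHYSICAL.SINGLETON([]).[], best=null, importance=0.7290000000000001
        rel#2666:Subset#26.PHYSICAL.SINGLETON([]).[], best=null, importance=1.0
"""

missing = unimplemented_subsets(SAMPLE)
lowest_failing_set = missing[0][0]  # Set#23, the DrillJoinRel with condition=true

# Row-count estimates from the dump: a join with condition=true over the
# hive scan (10 rows) and the parquet scan (8000 rows) is a cross product.
hive_rows, parquet_rows = 10.0, 8000.0
join_rows = hive_rows * parquet_rows  # 80000.0, matches DrillJoinRel
# Assumption: each of the two equality predicates in the filter gets the
# default selectivity guess of 0.15 used by Optiq/Calcite.
filter_rows = join_rows * 0.15 * 0.15  # approximately 1800.0, matches DrillFilterRel

print("subsets with best=null:", missing)
print("lowest failing set:", lowest_failing_set)
print("join rows:", join_rows, "filter rows:", round(filter_rows, 1))
```

If you pass the whole Sets: section in as the dump, the function should return the same four entries. Every other subset in that section has a non-null best.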


> Join between hive table and parquet file fail
> ---------------------------------------------
>
>                 Key: DRILL-882
>                 URL: https://issues.apache.org/jira/browse/DRILL-882
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Ramana Inukonda Nagaraj
>
> The following query fails with a "cannot plan" error:
> select p.p_partkey 
>    from hive.part p, `tpch-multi/partsupp` ps 
>    where p.p_partkey = ps.ps_partkey 
>               and p.p_size = 41  
> order by p.p_partkey
> limit 20;
> The queries below work fine, implying nothing is wrong with either source:
> select p.p_partkey 
>    from hive.part p;
>    
> select ps.ps_partkey from `tpch-multi/partsupp` ps;
> The same query also works when both sides of the join are from Parquet or both are from Hive.
> It's only when they are different that I get the cannot plan error below:
> message: "Failure while parsing sql. < CannotPlanException:[ Node [rel#2666:Subset#26.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner state:



--
This message was sent by Atlassian JIRA
(v6.2#6252)
