[
https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590832#comment-16590832
]
Timothy Farkas commented on DRILL-6706:
---------------------------------------
I fixed a similar issue in https://issues.apache.org/jira/browse/DRILL-6644
while working on it. Waiting to see if that fix resolves this issue.
> Query with 10-way hash join fails with NullPointerException
> -----------------------------------------------------------
>
> Key: DRILL-6706
> URL: https://issues.apache.org/jira/browse/DRILL-6706
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators, Query Planning &
> Optimization
> Affects Versions: 1.15.0
> Reporter: Abhishek Girish
> Assignee: Timothy Farkas
> Priority: Critical
> Attachments: drillbit.log.zip
>
>
> {code}
> SELECT C.C_CUSTKEY AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
> si.tpch_sf1_parquet.orders O,
> si.tpch_sf1_parquet.lineitem L,
> si.tpch_sf1_parquet.part P,
> si.tpch_sf1_parquet.supplier S,
> si.tpch_sf1_parquet.partsupp PS,
> si.tpch_sf1_parquet.nation S_N,
> si.tpch_sf1_parquet.region S_R,
> si.tpch_sf1_parquet.nation C_N,
> si.tpch_sf1_parquet.region C_R
> WHERE C.C_CUSTKEY = O.O_CUSTKEY
> AND O.O_ORDERKEY = L.L_ORDERKEY
> AND L.L_PARTKEY = P.P_PARTKEY
> AND L.L_SUPPKEY = S.S_SUPPKEY
> AND P.P_PARTKEY = PS.PS_PARTKEY
> AND P.P_SUPPKEY = PS.PS_SUPPKEY
> AND S.S_NATIONKEY = S_N.N_NATIONKEY
> AND S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND C.C_NATIONKEY = C_N.N_NATIONKEY
> AND C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00 Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0,
> cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io,
> 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY):
> rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu,
> 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02 UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount =
> 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7
> io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01 Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY):
> rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu,
> 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02 Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12],
> O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2],
> P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4],
> PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6],
> R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18])
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY
> O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY
> P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY,
> ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY
> N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost =
> {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network,
> 4.8577056E7 memory}, id = 515364
> 01-03 HashJoin(condition=[=($13, $0)], joinType=[inner]) :
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY,
> ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY
> N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0,
> cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io,
> 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05 HashJoin(condition=[=($1, $10)], joinType=[inner]) :
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY
> R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY):
> rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 1.995334E8 cpu,
> 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353
> 01-08 HashJoin(condition=[=($2, $3)], joinType=[inner]) :
> rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY
> R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows,
> 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348
> 01-10 Scan(table=[[si, tpch_sf1_parquet, lineitem]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/lineitem]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/lineitem, numFiles=1,
> numRowGroups=3, usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_PARTKEY`,
> `L_SUPPKEY`]]]) : rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY
> L_SUPPKEY): rowcount = 6001215.0, cumulative cost = {6001215.0 rows,
> 1.8003645E7 cpu, 1.8003645E7 io, 0.0 network, 0.0 memory}, id = 515341
> 01-09 BroadcastExchange : rowType = RecordType(ANY
> S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY
> R_REGIONKEY): rowcount = 10000.0, cumulative cost = {30085.0 rows, 220595.0
> cpu, 20055.0 io, 6.3488E8 network, 528.0 memory}, id = 515347
> 02-01 HashJoin(condition=[=($1, $2)], joinType=[inner])
> : rowType = RecordType(ANY S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY
> N_REGIONKEY, ANY R_REGIONKEY): rowcount = 10000.0, cumulative cost = {20085.0
> rows, 140595.0 cpu, 20055.0 io, 0.0 network, 528.0 memory}, id = 515346
> 02-03 Scan(table=[[si, tpch_sf1_parquet, supplier]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/supplier]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/supplier, numFiles=1,
> numRowGroups=1, usedMetadataFile=false, columns=[`S_SUPPKEY`,
> `S_NATIONKEY`]]]) : rowType = RecordType(ANY S_SUPPKEY, ANY S_NATIONKEY):
> rowcount = 10000.0, cumulative cost = {10000.0 rows, 20000.0 cpu, 20000.0 io,
> 0.0 network, 0.0 memory}, id = 515342
> 02-02 HashJoin(condition=[=($1, $2)],
> joinType=[inner]) : rowType = RecordType(ANY N_NATIONKEY, ANY N_REGIONKEY,
> ANY R_REGIONKEY): rowcount = 25.0, cumulative cost = {60.0 rows, 395.0 cpu,
> 55.0 io, 0.0 network, 88.0 memory}, id = 515345
> 02-05 Scan(table=[[si, tpch_sf1_parquet, nation]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/nation]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/nation, numFiles=1,
> numRowGroups=1, usedMetadataFile=false, columns=[`N_NATIONKEY`,
> `N_REGIONKEY`]]]) : rowType = RecordType(ANY N_NATIONKEY, ANY N_REGIONKEY):
> rowcount = 25.0, cumulative cost = {25.0 rows, 50.0 cpu, 50.0 io, 0.0
> network, 0.0 memory}, id = 515343
> 02-04 Scan(table=[[si, tpch_sf1_parquet, region]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/region]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/region, numFiles=1,
> numRowGroups=1, usedMetadataFile=false, columns=[`R_REGIONKEY`]]]) : rowType
> = RecordType(ANY R_REGIONKEY): rowcount = 5.0, cumulative cost = {5.0 rows,
> 5.0 cpu, 5.0 io, 0.0 network, 0.0 memory}, id = 515344
> 01-07 BroadcastExchange : rowType = RecordType(ANY
> PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): rowcount =
> 800000.0, cumulative cost = {2800000.0 rows, 3.08E7 cpu, 2000000.0 io,
> 4.063232E10 network, 5280000.0 memory}, id = 515352
> 03-01 HashJoin(condition=[AND(=($2, $0), =($3, $1))],
> joinType=[inner]) : rowType = RecordType(ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY
> P_PARTKEY, ANY P_SUPPKEY): rowcount = 800000.0, cumulative cost = {2000000.0
> rows, 2.44E7 cpu, 2000000.0 io, 0.0 network, 5280000.0 memory}, id = 515351
> 03-03 Scan(table=[[si, tpch_sf1_parquet, partsupp]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/partsupp]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/partsupp, numFiles=1,
> numRowGroups=1, usedMetadataFile=false, columns=[`PS_PARTKEY`,
> `PS_SUPPKEY`]]]) : rowType = RecordType(ANY PS_PARTKEY, ANY PS_SUPPKEY):
> rowcount = 800000.0, cumulative cost = {800000.0 rows, 1600000.0 cpu,
> 1600000.0 io, 0.0 network, 0.0 memory}, id = 515349
> 03-02 Scan(table=[[si, tpch_sf1_parquet, part]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/part]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/part, numFiles=1,
> numRowGroups=1, usedMetadataFile=false, columns=[`P_PARTKEY`, `P_SUPPKEY`]]])
> : rowType = RecordType(ANY P_PARTKEY, ANY P_SUPPKEY): rowcount = 200000.0,
> cumulative cost = {200000.0 rows, 400000.0 cpu, 400000.0 io, 0.0 network, 0.0
> memory}, id = 515350
> 01-04 Project(O_CUSTKEY=[$0], O_ORDERKEY=[$1], C_CUSTKEY=[$2],
> C_NATIONKEY=[$3], N_NATIONKEY0=[$4], N_REGIONKEY0=[$5], R_REGIONKEY0=[$6]) :
> rowType = RecordType(ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY
> C_NATIONKEY, ANY N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount
> = 1500000.0, cumulative cost = {6450085.0 rows, 4.6800595E7 cpu, 3300055.0
> io, 1.333248E11 network, 2640528.0 memory}, id = 515362
> 01-06 BroadcastExchange : rowType = RecordType(ANY
> O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY N_NATIONKEY,
> ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount = 1500000.0, cumulative cost =
> {4950085.0 rows, 3.6300595E7 cpu, 3300055.0 io, 1.333248E11 network,
> 2640528.0 memory}, id = 515361
> 04-01 HashJoin(condition=[=($2, $0)], joinType=[inner]) :
> rowType = RecordType(ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY
> C_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount =
> 1500000.0, cumulative cost = {3450085.0 rows, 2.4300595E7 cpu, 3300055.0 io,
> 0.0 network, 2640528.0 memory}, id = 515360
> 04-03 Scan(table=[[si, tpch_sf1_parquet, orders]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/orders]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/orders, numFiles=1,
> numRowGroups=1, usedMetadataFile=false, columns=[`O_CUSTKEY`,
> `O_ORDERKEY`]]]) : rowType = RecordType(ANY O_CUSTKEY, ANY O_ORDERKEY):
> rowcount = 1500000.0, cumulative cost = {1500000.0 rows, 3000000.0 cpu,
> 3000000.0 io, 0.0 network, 0.0 memory}, id = 515354
> 04-02 HashJoin(condition=[=($1, $2)], joinType=[inner])
> : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY N_NATIONKEY, ANY
> N_REGIONKEY, ANY R_REGIONKEY): rowcount = 150000.0, cumulative cost =
> {300085.0 rows, 2100595.0 cpu, 300055.0 io, 0.0 network, 528.0 memory}, id =
> 515359
> 04-05 Scan(table=[[si, tpch_sf1_parquet, customer]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/customer]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/customer, numFiles=1,
> numRowGroups=1, usedMetadataFile=false, columns=[`C_CUSTKEY`,
> `C_NATIONKEY`]]]) : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY):
> rowcount = 150000.0, cumulative cost = {150000.0 rows, 300000.0 cpu, 300000.0
> io, 0.0 network, 0.0 memory}, id = 515355
> 04-04 HashJoin(condition=[=($1, $2)],
> joinType=[inner]) : rowType = RecordType(ANY N_NATIONKEY, ANY N_REGIONKEY,
> ANY R_REGIONKEY): rowcount = 25.0, cumulative cost = {60.0 rows, 395.0 cpu,
> 55.0 io, 0.0 network, 88.0 memory}, id = 515358
> 04-07 Scan(table=[[si, tpch_sf1_parquet, nation]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/nation]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/nation, numFiles=1,
> numRowGroups=1, usedMetadataFile=false, columns=[`N_NATIONKEY`,
> `N_REGIONKEY`]]]) : rowType = RecordType(ANY N_NATIONKEY, ANY N_REGIONKEY):
> rowcount = 25.0, cumulative cost = {25.0 rows, 50.0 cpu, 50.0 io, 0.0
> network, 0.0 memory}, id = 515356
> 04-06 Scan(table=[[si, tpch_sf1_parquet, region]],
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpch/sf1/parquet/region]],
> selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/region, numFiles=1,
> numRowGroups=1, usedMetadataFile=false, columns=[`R_REGIONKEY`]]]) : rowType
> = RecordType(ANY R_REGIONKEY): rowcount = 5.0, cumulative cost = {5.0 rows,
> 5.0 cpu, 5.0 io, 0.0 network, 0.0 memory}, id = 515357
> {code}
> Error
> {code}
> Error: SYSTEM ERROR: NullPointerException
> Fragment 3:0
> [Error Id: 69c42333-5654-4d57-a14b-b1164db7acbd on sidrill3:31010]
> (java.lang.NullPointerException) null
>
> org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl.initialize():298
>
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():738
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():431
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>
> org.apache.drill.exec.physical.impl.broadcastsender.BroadcastSenderRootExec.innerNext():95
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1633
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748 (state=,code=0)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)