[ https://issues.apache.org/jira/browse/DRILL-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593023#comment-16593023 ]

ASF GitHub Bot commented on DRILL-6706:
---------------------------------------

sachouche commented on issue #1445: DRILL-6706: fixed null pointer exception in HashJoin
URL: https://github.com/apache/drill/pull/1445#issuecomment-416070314
 
 
   Modified the fix to avoid a regression related to DRILL-4264:
   - This should be a temporary fix until @vvysotskyi clarifies and improves his DRILL-4264 fix
   - Instead of quoting columns that are not found, we should improve the metadata so that operators can handle such use cases explicitly
   - Note also that this behavior is currently implemented per reader; it should apply to all readers (not just Parquet)
   
   Summary: the current fix keeps the Parquet reader behavior while fixing the NullPointerException.
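   The comment does not include the patch, so the following is only a hypothetical Java sketch of one plausible way a quoted column name can surface as a NullPointerException in a name-keyed lookup, and of handling a missing column explicitly instead of dereferencing null. The class and method names (ColumnLookupSketch, unquote, widthUnsafe, widthSafe), the column widths, and the default value are invented for illustration. This is not Drill's HashJoinMemoryCalculatorImpl code, not the metadata-based approach the comment proposes, and not necessarily what PR #1445 does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Drill code: a map of per-column widths keyed by
// column name, where callers may pass the name with or without backticks.
class ColumnLookupSketch {

  // Strips one pair of surrounding backticks: "`N_NATIONKEY`" -> "N_NATIONKEY".
  static String unquote(String name) {
    if (name.length() >= 2 && name.startsWith("`") && name.endsWith("`")) {
      return name.substring(1, name.length() - 1);
    }
    return name;
  }

  // Unsafe: Map.get returns null for an unknown key, and auto-unboxing
  // that null Integer to int throws NullPointerException.
  static int widthUnsafe(Map<String, Integer> widths, String column) {
    return widths.get(column);
  }

  // Defensive: normalize the key, then treat a genuinely missing column
  // as an explicit case by returning a caller-supplied default.
  static int widthSafe(Map<String, Integer> widths, String column, int defaultWidth) {
    Integer w = widths.get(unquote(column));
    return (w != null) ? w : defaultWidth;
  }

  public static void main(String[] args) {
    Map<String, Integer> widths = new HashMap<>();
    widths.put(unquote("`N_NATIONKEY`"), 4); // stored under the unquoted name

    System.out.println(widthSafe(widths, "N_NATIONKEY", 8));   // 4
    System.out.println(widthSafe(widths, "`N_NATIONKEY`", 8)); // 4
    System.out.println(widthSafe(widths, "R_COMMENT", 8));     // 8 (column not found)

    try {
      widthUnsafe(widths, "R_COMMENT");
    } catch (NullPointerException e) {
      System.out.println("unsafe lookup threw NullPointerException");
    }
  }
}
```

   Normalizing names at lookup time is only a local workaround; the comment's point is that operators should receive enough metadata to know a column was not found, rather than inferring it from how the name is quoted.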

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Query with 10-way hash join fails with NullPointerException
> -----------------------------------------------------------
>
>                 Key: DRILL-6706
>                 URL: https://issues.apache.org/jira/browse/DRILL-6706
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators, Query Planning & Optimization
>    Affects Versions: 1.15.0
>            Reporter: Abhishek Girish
>            Assignee: salim achouche
>            Priority: Critical
>         Attachments: drillbit.log.zip
>
>
> {code}
> SELECT   C.C_CUSTKEY  AS C_CUSTKEY
> FROM si.tpch_sf1_parquet.customer C,
>          si.tpch_sf1_parquet.orders O,
>          si.tpch_sf1_parquet.lineitem L,
>          si.tpch_sf1_parquet.part P,
>          si.tpch_sf1_parquet.supplier S,
>          si.tpch_sf1_parquet.partsupp PS,
>          si.tpch_sf1_parquet.nation S_N,
>          si.tpch_sf1_parquet.region S_R,
>          si.tpch_sf1_parquet.nation C_N,
>          si.tpch_sf1_parquet.region C_R
> WHERE    C.C_CUSTKEY = O.O_CUSTKEY 
> AND      O.O_ORDERKEY = L.L_ORDERKEY
> AND      L.L_PARTKEY = P.P_PARTKEY
> AND      L.L_SUPPKEY = S.S_SUPPKEY
> AND      P.P_PARTKEY = PS.PS_PARTKEY
> AND      P.P_SUPPKEY = PS.PS_SUPPKEY
> AND      S.S_NATIONKEY = S_N.N_NATIONKEY
> AND      S_N.N_REGIONKEY = S_R.R_REGIONKEY
> AND      C.C_NATIONKEY = C_N.N_NATIONKEY
> AND      C_N.N_REGIONKEY = C_R.R_REGIONKEY
> {code}
> Plan
> {code}
> 00-00    Screen : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, cumulative cost = {6.02000115E7 rows, 5.049839315E8 cpu, 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515368
> 00-01      Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, cumulative cost = {5.959989E7 rows, 5.0438381E8 cpu, 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515367
> 00-02        UnionExchange : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, cumulative cost = {5.3598675E7 rows, 4.98382595E8 cpu, 2.3323755E7 io, 1.9917297664E11 network, 4.8577056E7 memory}, id = 515366
> 01-01          Project(C_CUSTKEY=[$0]) : rowType = RecordType(ANY C_CUSTKEY): rowcount = 6001215.0, cumulative cost = {4.759746E7 rows, 4.50372875E8 cpu, 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515365
> 01-02            Project(C_CUSTKEY=[$14], C_NATIONKEY=[$15], O_CUSTKEY=[$12], O_ORDERKEY=[$13], L_ORDERKEY=[$0], L_PARTKEY=[$1], L_SUPPKEY=[$2], P_PARTKEY=[$10], P_SUPPKEY=[$11], S_SUPPKEY=[$3], S_NATIONKEY=[$4], PS_PARTKEY=[$8], PS_SUPPKEY=[$9], N_NATIONKEY=[$5], N_REGIONKEY=[$6], R_REGIONKEY=[$7], N_NATIONKEY0=[$16], N_REGIONKEY0=[$17], R_REGIONKEY0=[$18]) : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY O_CUSTKEY, ANY O_ORDERKEY, ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = {4.1596245E7 rows, 4.4437166E8 cpu, 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515364
> 01-03              HashJoin(condition=[=($13, $0)], joinType=[inner]) : rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY, ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 6001215.0, cumulative cost = {3.559503E7 rows, 3.30348575E8 cpu, 2.3323755E7 io, 1.74592E11 network, 4.8577056E7 memory}, id = 515363
> 01-05                HashJoin(condition=[=($1, $10)], joinType=[inner]) : rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY, ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): rowcount = 6001215.0, cumulative cost = {2.164373E7 rows, 1.995334E8 cpu, 2.00237E7 io, 4.12672E10 network, 1.9536528E7 memory}, id = 515353
> 01-08                  HashJoin(condition=[=($2, $3)], joinType=[inner]) : rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY, ANY S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount = 6001215.0, cumulative cost = {1.2042515E7 rows, 9.031882E7 cpu, 1.80237E7 io, 6.3488E8 network, 176528.0 memory}, id = 515348
> 01-10                    Scan(table=[[si, tpch_sf1_parquet, lineitem]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/lineitem]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/lineitem, numFiles=1, numRowGroups=3, usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_PARTKEY`, `L_SUPPKEY`]]]) : rowType = RecordType(ANY L_ORDERKEY, ANY L_PARTKEY, ANY L_SUPPKEY): rowcount = 6001215.0, cumulative cost = {6001215.0 rows, 1.8003645E7 cpu, 1.8003645E7 io, 0.0 network, 0.0 memory}, id = 515341
> 01-09                    BroadcastExchange : rowType = RecordType(ANY S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount = 10000.0, cumulative cost = {30085.0 rows, 220595.0 cpu, 20055.0 io, 6.3488E8 network, 528.0 memory}, id = 515347
> 02-01                      HashJoin(condition=[=($1, $2)], joinType=[inner]) : rowType = RecordType(ANY S_SUPPKEY, ANY S_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount = 10000.0, cumulative cost = {20085.0 rows, 140595.0 cpu, 20055.0 io, 0.0 network, 528.0 memory}, id = 515346
> 02-03                        Scan(table=[[si, tpch_sf1_parquet, supplier]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/supplier]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/supplier, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`S_SUPPKEY`, `S_NATIONKEY`]]]) : rowType = RecordType(ANY S_SUPPKEY, ANY S_NATIONKEY): rowcount = 10000.0, cumulative cost = {10000.0 rows, 20000.0 cpu, 20000.0 io, 0.0 network, 0.0 memory}, id = 515342
> 02-02                        HashJoin(condition=[=($1, $2)], joinType=[inner]) : rowType = RecordType(ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount = 25.0, cumulative cost = {60.0 rows, 395.0 cpu, 55.0 io, 0.0 network, 88.0 memory}, id = 515345
> 02-05                          Scan(table=[[si, tpch_sf1_parquet, nation]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/nation]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/nation, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`N_NATIONKEY`, `N_REGIONKEY`]]]) : rowType = RecordType(ANY N_NATIONKEY, ANY N_REGIONKEY): rowcount = 25.0, cumulative cost = {25.0 rows, 50.0 cpu, 50.0 io, 0.0 network, 0.0 memory}, id = 515343
> 02-04                          Scan(table=[[si, tpch_sf1_parquet, region]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/region]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/region, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`R_REGIONKEY`]]]) : rowType = RecordType(ANY R_REGIONKEY): rowcount = 5.0, cumulative cost = {5.0 rows, 5.0 cpu, 5.0 io, 0.0 network, 0.0 memory}, id = 515344
> 01-07                  BroadcastExchange : rowType = RecordType(ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): rowcount = 800000.0, cumulative cost = {2800000.0 rows, 3.08E7 cpu, 2000000.0 io, 4.063232E10 network, 5280000.0 memory}, id = 515352
> 03-01                    HashJoin(condition=[AND(=($2, $0), =($3, $1))], joinType=[inner]) : rowType = RecordType(ANY PS_PARTKEY, ANY PS_SUPPKEY, ANY P_PARTKEY, ANY P_SUPPKEY): rowcount = 800000.0, cumulative cost = {2000000.0 rows, 2.44E7 cpu, 2000000.0 io, 0.0 network, 5280000.0 memory}, id = 515351
> 03-03                      Scan(table=[[si, tpch_sf1_parquet, partsupp]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/partsupp]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/partsupp, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`PS_PARTKEY`, `PS_SUPPKEY`]]]) : rowType = RecordType(ANY PS_PARTKEY, ANY PS_SUPPKEY): rowcount = 800000.0, cumulative cost = {800000.0 rows, 1600000.0 cpu, 1600000.0 io, 0.0 network, 0.0 memory}, id = 515349
> 03-02                      Scan(table=[[si, tpch_sf1_parquet, part]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/part]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/part, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`P_PARTKEY`, `P_SUPPKEY`]]]) : rowType = RecordType(ANY P_PARTKEY, ANY P_SUPPKEY): rowcount = 200000.0, cumulative cost = {200000.0 rows, 400000.0 cpu, 400000.0 io, 0.0 network, 0.0 memory}, id = 515350
> 01-04                Project(O_CUSTKEY=[$0], O_ORDERKEY=[$1], C_CUSTKEY=[$2], C_NATIONKEY=[$3], N_NATIONKEY0=[$4], N_REGIONKEY0=[$5], R_REGIONKEY0=[$6]) : rowType = RecordType(ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY N_NATIONKEY0, ANY N_REGIONKEY0, ANY R_REGIONKEY0): rowcount = 1500000.0, cumulative cost = {6450085.0 rows, 4.6800595E7 cpu, 3300055.0 io, 1.333248E11 network, 2640528.0 memory}, id = 515362
> 01-06                  BroadcastExchange : rowType = RecordType(ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount = 1500000.0, cumulative cost = {4950085.0 rows, 3.6300595E7 cpu, 3300055.0 io, 1.333248E11 network, 2640528.0 memory}, id = 515361
> 04-01                    HashJoin(condition=[=($2, $0)], joinType=[inner]) : rowType = RecordType(ANY O_CUSTKEY, ANY O_ORDERKEY, ANY C_CUSTKEY, ANY C_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount = 1500000.0, cumulative cost = {3450085.0 rows, 2.4300595E7 cpu, 3300055.0 io, 0.0 network, 2640528.0 memory}, id = 515360
> 04-03                      Scan(table=[[si, tpch_sf1_parquet, orders]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/orders]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/orders, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`O_CUSTKEY`, `O_ORDERKEY`]]]) : rowType = RecordType(ANY O_CUSTKEY, ANY O_ORDERKEY): rowcount = 1500000.0, cumulative cost = {1500000.0 rows, 3000000.0 cpu, 3000000.0 io, 0.0 network, 0.0 memory}, id = 515354
> 04-02                      HashJoin(condition=[=($1, $2)], joinType=[inner]) : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY, ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount = 150000.0, cumulative cost = {300085.0 rows, 2100595.0 cpu, 300055.0 io, 0.0 network, 528.0 memory}, id = 515359
> 04-05                        Scan(table=[[si, tpch_sf1_parquet, customer]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/customer]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/customer, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`C_CUSTKEY`, `C_NATIONKEY`]]]) : rowType = RecordType(ANY C_CUSTKEY, ANY C_NATIONKEY): rowcount = 150000.0, cumulative cost = {150000.0 rows, 300000.0 cpu, 300000.0 io, 0.0 network, 0.0 memory}, id = 515355
> 04-04                        HashJoin(condition=[=($1, $2)], joinType=[inner]) : rowType = RecordType(ANY N_NATIONKEY, ANY N_REGIONKEY, ANY R_REGIONKEY): rowcount = 25.0, cumulative cost = {60.0 rows, 395.0 cpu, 55.0 io, 0.0 network, 88.0 memory}, id = 515358
> 04-07                          Scan(table=[[si, tpch_sf1_parquet, nation]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/nation]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/nation, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`N_NATIONKEY`, `N_REGIONKEY`]]]) : rowType = RecordType(ANY N_NATIONKEY, ANY N_REGIONKEY): rowcount = 25.0, cumulative cost = {25.0 rows, 50.0 cpu, 50.0 io, 0.0 network, 0.0 memory}, id = 515356
> 04-06                          Scan(table=[[si, tpch_sf1_parquet, region]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpch/sf1/parquet/region]], selectionRoot=maprfs:/drill/testdata/tpch/sf1/parquet/region, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`R_REGIONKEY`]]]) : rowType = RecordType(ANY R_REGIONKEY): rowcount = 5.0, cumulative cost = {5.0 rows, 5.0 cpu, 5.0 io, 0.0 network, 0.0 memory}, id = 515357
> {code}
> Error
> {code}
> Error: SYSTEM ERROR: NullPointerException
> Fragment 3:0
> [Error Id: 69c42333-5654-4d57-a14b-b1164db7acbd on sidrill3:31010]
>   (java.lang.NullPointerException) null
>     org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$BuildSidePartitioningImpl.initialize():298
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():738
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():431
>     org.apache.drill.exec.record.AbstractRecordBatch.next():172
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>     org.apache.drill.exec.physical.impl.broadcastsender.BroadcastSenderRootExec.innerNext():95
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():93
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1633
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748 (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
