[
https://issues.apache.org/jira/browse/DRILL-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551497#comment-16551497
]
salim achouche commented on DRILL-6622:
---------------------------------------
This looks like a serious bug:
* The batch memory managers somehow conclude that most incoming batches are
empty
* The aggregator used to create outgoing batches with exactly 2**16 max
capacity
* The memory manager's erroneous stats cause the Aggregator to get a max
capacity of 1
* This means that every unique group is stored in its own outgoing batch
* The Aggregator limits the max number of outgoing batches to 64k (since
previously a batch could contain 64k entries); a 32-bit indexing scheme
subdivides this space into a pair (out-batch-idx, idx-within-batch)
* A NullPointerException happens when this indexing scheme fails because of
the large number of outgoing batches (overflow)
* The bug has been there for a while (since the Aggregator was modified to
support batch sizing), but it manifested itself only with a large number of
unique groups
I now have to reverse engineer the reason for the erroneous batch sizer
stats.
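To make the overflow concrete, here is a minimal sketch of a 32-bit index
split into (out-batch-idx, idx-within-batch). It is not the actual Drill
code: the class name, the method names, and the even 16/16 bit split are
assumptions made for illustration only.

```java
public class BatchIndexSketch {
    // Assumed layout: upper 16 bits select the outgoing batch,
    // lower 16 bits select the entry within that batch.
    static final int BITS_PER_HALF = 16;
    static final int MAX_BATCHES = 1 << BITS_PER_HALF; // 65536 (64k)
    static final int HALF_MASK = MAX_BATCHES - 1;      // 0xFFFF

    // Pack (out-batch-idx, idx-within-batch) into one 32-bit int.
    static int encode(int batchIdx, int idxWithinBatch) {
        return (batchIdx << BITS_PER_HALF) | (idxWithinBatch & HALF_MASK);
    }

    // Recover the batch index from the packed value.
    static int batchOf(int composite) {
        return composite >>> BITS_PER_HALF;
    }

    // Recover the position within the batch from the packed value.
    static int offsetOf(int composite) {
        return composite & HALF_MASK;
    }

    // Batches needed to hold the given number of unique groups.
    static long batchesNeeded(long uniqueGroups, int batchCapacity) {
        return (uniqueGroups + batchCapacity - 1) / batchCapacity;
    }

    public static void main(String[] args) {
        long groups = 100_000;
        // With a capacity of 2**16, very few batches are needed.
        System.out.println(batchesNeeded(groups, 1 << 16)); // prints 2
        // With a capacity of 1, every group needs its own batch,
        // which exceeds the 64k batch limit.
        System.out.println(batchesNeeded(groups, 1));       // prints 100000
        // Batch index 65536 does not fit in 16 bits: it wraps to batch 0.
        System.out.println(batchOf(encode(MAX_BATCHES, 0))); // prints 0
    }
}
```

Under these assumptions, any batch index of 64k or more silently aliases a
lower batch, so a lookup can land on a slot that was never populated. That
would be consistent with the NullPointerException in the key comparison.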
> UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
> -------------------------------------------------------------------
>
> Key: DRILL-6622
> URL: https://issues.apache.org/jira/browse/DRILL-6622
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Codegen
> Affects Versions: 1.14.0
> Reporter: Vitalii Diravka
> Assignee: salim achouche
> Priority: Blocker
> Fix For: 1.14.0
>
> Attachments:
> MD4208_id_05_1_id_24b2a6f9-ed66-b97e-594d-f116cd3fdd23.json,
> MD4208_id_05_3_id_24b2ad9c-4568-a476-bbf6-2e17441078b1.json
>
>
> {code}
> SELECT c_customer_id FROM customer
> UNION
> SELECT ca_address_id FROM customer_address
> UNION
> SELECT cd_credit_rating FROM customer_demographics
> UNION
> SELECT hd_buy_potential FROM household_demographics
> UNION
> SELECT i_item_id FROM item
> UNION
> SELECT p_promo_id FROM promotion
> UNION
> SELECT t_time_id FROM time_dim
> UNION
> SELECT d_date_id FROM date_dim
> UNION
> SELECT s_store_id FROM store
> UNION
> SELECT w_warehouse_id FROM warehouse
> UNION
> SELECT sm_ship_mode_id FROM ship_mode
> UNION
> SELECT r_reason_id FROM reason
> UNION
> SELECT cc_call_center_id FROM call_center
> UNION
> SELECT web_site_id FROM web_site
> UNION
> SELECT wp_web_page_id FROM web_page
> UNION
> SELECT cp_catalog_page_id FROM catalog_page;
> {code}
> hit the following error:
> {code}
> Caused by: java.lang.NullPointerException: null
> at org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:96) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at org.apache.drill.exec.test.generated.HashTableGen3$BatchHolder.isKeyMatchInternalBuild(BatchHolder.java:171) ~[na:na]
> at org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.isKeyMatch(HashTableTemplate.java:218) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1000(HashTableTemplate.java:120) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.common.HashTableTemplate.put(HashTableTemplate.java:650) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at org.apache.drill.exec.test.generated.HashAggregatorGen0.checkGroupAndAggrValues(HashAggTemplate.java:1372) ~[na:na]
> at org.apache.drill.exec.test.generated.HashAggregatorGen0.doWork(HashAggTemplate.java:599) ~[na:na]
> at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:268) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.union.UnionAllRecordBatch$UnionInputIterator.next(UnionAllRecordBatch.java:381) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
> {code}
> [~dechanggu] found that the issue is absent in Drill 1.13.