[ 
https://issues.apache.org/jira/browse/DRILL-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165444#comment-14165444
 ] 

Aman Sinha commented on DRILL-1507:
-----------------------------------

I am looking into this, but I want to note that the message you see in the log 
about retrying the hash table insertion is an informational message, not a real 
error.  However, I do want to determine whether we are retrying excessively. 

For comparison purposes, could you please run a query with plain aggregation (no 
group-by) against both the json and parquet data and post the timings:  
   select min(ss_quantity) from store_sales;
This will not use the hash aggregate operator, so we can compare the timings with 
it out of the picture.
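For example, assuming the json and parquet copies of store_sales live in separate 
directories (the dfs paths below are hypothetical; point them at wherever your 
copies actually are), the two runs would look something like:
   -- paths are illustrative; adjust to your environment
   select min(ss_quantity) from dfs.`/data/tpcds/json/store_sales`;
   select min(ss_quantity) from dfs.`/data/tpcds/parquet/store_sales`;
Posting the timings from both would help isolate whether the slowdown is in the 
hash aggregate or in the json reader itself.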

> Potential hash insert issue 
> ----------------------------
>
>                 Key: DRILL-1507
>                 URL: https://issues.apache.org/jira/browse/DRILL-1507
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 0.6.0
>            Reporter: Chun Chang
>
> #Thu Oct 02 17:49:48 PDT 2014
> git.commit.id.abbrev=29dde76
> Running the following "case, group by, and order by" query against the json file 
> type, I saw the hash insert errors below repeatedly. The query eventually finishes 
> after a little over 30 minutes, and the data returned is correct. The same query 
> running against the parquet file finishes in about a minute. Here is the query 
> (from 
> /root/drillATS/incubator-drill/testing/framework/resources/aggregate1/json/testcases/aggregate26.q):
> select
>   cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int) as soldd,
>   cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint) as soldt,
>   cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float) as itemsk,
>   cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)) as custsk,
>   cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)) as cdemo,
>   ss_hdemo_sk as hdemo, ss_addr_sk as addrsk, ss_store_sk as storesk,
>   ss_promo_sk as promo, ss_ticket_number as tickn,
>   sum(ss_quantity) as quantities
> from store_sales
> group by
>   cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
>   cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
>   cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
>   cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
>   cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
>   ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
> order by
>   cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
>   cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
>   cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
>   cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
>   cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
>   ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
> limit 100
> Here is the error I saw:
> 11:46:46.836 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer 
> Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 
> 32768 bytes. Total Allocated: 778240
> 11:46:46.848 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG 
> o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with 
> new batch holder...
> .....
> 11:48:49.936 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer 
> Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 
> 32768 bytes. Total Allocated: 778240
> 11:48:49.947 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG 
> o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with 
> new batch holder...
> The data is tpcds, converted to json using Drill's json writer. Since the query 
> eventually completes and passes data verification, the json writer is probably 
> converting the parquet data to json correctly.
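> (For reference, this sort of conversion is typically done with a CTAS after 
> switching the session's output format to json; the workspace and path below are 
> illustrative, not the exact commands used for this data set:
>    -- workspace and source path are hypothetical
>    alter session set `store.format` = 'json';
>    create table dfs.tmp.store_sales_json as
>      select * from dfs.`/data/tpcds/parquet/store_sales`;
> )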



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
