[ https://issues.apache.org/jira/browse/DRILL-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165444#comment-14165444 ]
Aman Sinha commented on DRILL-1507:
-----------------------------------

I am looking into this, but I want to note that the message you see in the
log about retrying the hash table insertion is actually an informational
message, not a real error. However, I do want to determine whether we are
doing this excessively. For comparison purposes, can you please run a query
with plain aggregation (no group-by) on both json and parquet and post the
timings:

select min(ss_quantity) from store_sales;

This will not use a hash aggregate, and I want to compare the timings
without it in the picture.
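To illustrate why that retry is benign, here is a minimal, hypothetical Java
sketch of the batch-holder growth pattern that a DEBUG line like this
describes. This is not Drill's actual HashTable code; the class and method
names (BatchedHashTableSketch, BatchHolder, tryPut) are invented for
illustration. The idea is that entries live in fixed-capacity batch holders;
a put that fails because the current holder is full is handled by allocating
a fresh holder and retrying, so it is logged at DEBUG level rather than
treated as an error. A query with many distinct group-by keys will naturally
allocate many holders and log the message often.

// Illustrative sketch only -- NOT Drill's actual HashTable implementation.
// It mimics the pattern behind "Put into hash table failed .. Retrying with
// new batch holder...": a failed put on a full holder triggers allocation of
// a new holder followed by a retry. All names here are hypothetical.
import java.util.ArrayList;
import java.util.List;

public class BatchedHashTableSketch {

    /** A fixed-capacity slab of entries; a stand-in for a batch holder. */
    static final class BatchHolder {
        private final int[] keys;
        private int used;

        BatchHolder(int capacity) {
            this.keys = new int[capacity];
        }

        /** Returns false when the holder is full, mirroring a failed put. */
        boolean tryPut(int key) {
            if (used == keys.length) {
                return false;
            }
            keys[used++] = key;
            return true;
        }
    }

    private final List<BatchHolder> holders = new ArrayList<>();
    private final int batchCapacity;

    BatchedHashTableSketch(int batchCapacity) {
        this.batchCapacity = batchCapacity;
        holders.add(new BatchHolder(batchCapacity));
    }

    /** Insert a key, growing by one batch holder whenever the current one fills up. */
    void put(int key) {
        BatchHolder current = holders.get(holders.size() - 1);
        if (!current.tryPut(key)) {
            // The benign, expected path that produces the DEBUG line in the
            // log: not an error, just "grow and retry".
            System.out.println("Put into hash table failed .. Retrying with new batch holder...");
            BatchHolder fresh = new BatchHolder(batchCapacity);
            holders.add(fresh);
            fresh.tryPut(key);
        }
    }

    public static void main(String[] args) {
        // With a tiny capacity, each time a holder fills up the next insert
        // triggers the retry message; many distinct keys means many holders.
        BatchedHashTableSketch table = new BatchedHashTableSketch(4);
        for (int i = 0; i < 10; i++) {
            table.put(i);
        }
        System.out.println("Batch holders allocated: " + table.holders.size());
    }
}

The open question in this report is therefore not correctness but whether
the json reader path causes the hash aggregate to grow and retry far more
often than the parquet path, which the timing comparison above should help
narrow down.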
> Potential hash insert issue
> ----------------------------
>
>                 Key: DRILL-1507
>                 URL: https://issues.apache.org/jira/browse/DRILL-1507
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 0.6.0
>            Reporter: Chun Chang
>
> #Thu Oct 02 17:49:48 PDT 2014
> git.commit.id.abbrev=29dde76
>
> Running the following "case, group by, and order by" query against the json
> file type, I saw the following hash insert errors repeatedly. The query
> finishes eventually after a little over 30 minutes, and the data returned is
> correct. The same query running against the parquet file finishes in about a
> minute. Here is the query:
>
> /root/drillATS/incubator-drill/testing/framework/resources/aggregate1/json/testcases/aggregate26.q:
>
> select cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int) as soldd,
>        cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint) as soldt,
>        cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float) as itemsk,
>        cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)) as custsk,
>        cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)) as cdemo,
>        ss_hdemo_sk as hdemo, ss_addr_sk as addrsk, ss_store_sk as storesk,
>        ss_promo_sk as promo, ss_ticket_number as tickn,
>        sum(ss_quantity) as quantities
> from store_sales
> group by cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
>          cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
>          cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
>          cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
>          cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
>          ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
> order by cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
>          cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
>          cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
>          cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
>          cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
>          ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number
> limit 100
>
> Here is the error I saw:
>
> 11:46:46.836 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes. Total Allocated: 778240
> 11:46:46.848 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...
> .....
> 11:48:49.936 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes. Total Allocated: 778240
> 11:48:49.947 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...
>
> The data is tpcds, converted into json using drill's json writer. Since the
> query eventually completes and passes data verification, the json writer is
> probably converting parquet to json correctly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)