Osama Suleiman created IMPALA-6500:
--------------------------------------

             Summary: Impala crashes randomly on different queries with GROUP BY
                 Key: IMPALA-6500
                 URL: https://issues.apache.org/jira/browse/IMPALA-6500
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.10.0
         Environment: RHEL 6.5 (Santiago), Kernel version: 
2.6.32-431.el6.x86_64, CDH 5.13.1 single node, Impala 2.10
            Reporter: Osama Suleiman
         Attachments: hs_err_pid9910.log

I have a Parquet table created by Hive, and I am running several different 
queries against it, such as:

SELECT product_category, 
SUM(cast(profit AS DECIMAL(15,2))) as total_profit,
SUM(cast(sales AS DECIMAL(15,2))) as total_sales
FROM copy_orders
GROUP BY product_category;

and:

SELECT customer_name, 
SUM(cast(profit AS DECIMAL(15,2))) as total_profit,
SUM(cast(sales AS DECIMAL(15,2))) as total_sales
FROM copy_orders
GROUP BY customer_name
ORDER BY total_profit DESC
LIMIT 10;

These two queries succeed only on rare occasions. Most of the time, running 
them in HUE's Impala query editor returns:

??Could not connect to hostname:21050 (code THRIFTTRANSPORT): 
TTransportException('Could not connect to hostname:21050',)??

At the same time, Cloudera Manager reports that the Impala Daemon has crashed; 
it starts working again approximately 1 minute later. Meanwhile, other simple 
queries run successfully.

I have attached a log file from a sample run of one of the queries, since they 
all generate relevant logs. I have tried ??SET disable_codegen=1??, but the 
problem still occurs.
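
For anyone trying to reproduce this, the codegen attempt can be sketched as 
follows. This is only an illustration that combines the query option with the 
first query above; it assumes both statements are run in the same session 
(HUE's Impala editor or impala-shell), since the option is set per session.

```sql
-- Turn off runtime code generation for the current session only.
SET disable_codegen=1;

-- Re-run one of the failing aggregations (first query from this report).
SELECT product_category,
       SUM(CAST(profit AS DECIMAL(15,2))) AS total_profit,
       SUM(CAST(sales AS DECIMAL(15,2))) AS total_sales
FROM copy_orders
GROUP BY product_category;
```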



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
