[ 
https://issues.apache.org/jira/browse/IMPALA-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3869.
-----------------------------------
    Resolution: Not A Bug

Hi Ravi, sorry you never got a response here. Usually we're better at
responding to this type of question on the u...@impala.apache.org list. We've
done a lot of work on Kudu performance since you reported this, so hopefully
things are faster now.

> Performance down in KUDU as compared to HDFS
> --------------------------------------------
>
>                 Key: IMPALA-3869
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3869
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Kudu_Impala
>            Reporter: Ravi sharma
>            Priority: Minor
>         Attachments: query3.txt
>
>
> I am testing scenarios comparing IMPALA on HDFS with IMPALA on KUDU.
> We have a set of queries that access a number of fact tables and
> dimension tables.
> One of the queries processes 2 fact tables that have around 78 million
> and 668 million records.
> With the data in IMPALA on HDFS, I was able to get query results in less
> than 50 seconds.
> But with the data in IMPALA on KUDU, even after trying a number of
> distributions/partitions, I have not been able to reduce the query
> execution time below 125 seconds.
> So I have some concerns here:
> 1. In KUDU, what are the criteria for choosing the number of cores/nodes
> in the cluster for a given number of records to process?
> 2. In KUDU, is there any option like a distributed cache in IMPALA on KUDU
> to improve my execution time?
> 3. Is there any other way to improve performance with such a huge data
> load?
> I have attached the query for reference.



