[ https://issues.apache.org/jira/browse/IMPALA-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-3869.
-----------------------------------
    Resolution: Not A Bug

Hi Ravi, sorry you never got a response here. Usually we're better at responding to these types of questions on the u...@impala.apache.org list. We've done a lot of work on Kudu performance since you reported this, so hopefully things are faster now.

> Perfromance down in KUDU as compare to HDFS
> -------------------------------------------
>
>                 Key: IMPALA-3869
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3869
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Kudu_Impala
>            Reporter: Ravi sharma
>            Priority: Minor
>         Attachments: query3.txt
>
>
> I am running test scenarios comparing IMPALA on HDFS with IMPALA on KUDU.
> We have a set of queries that access a number of fact tables and
> dimension tables.
> One of the queries processes two fact tables with around 78 million and
> 668 million records, respectively.
> With the data in IMPALA on HDFS, I was able to get query results in less
> than 50 seconds.
> But with the data in IMPALA on KUDU, even after trying a number of
> distributions/partitions, I have not been able to reduce the query
> execution time below 125 seconds.
> So I have some concerns here:
> 1. In KUDU, what are the criteria for sizing the number of cores/nodes in
> the cluster based on the number of records to process?
> 2. In KUDU, is there any option like a distributed cache in IMPALA on KUDU
> to improve my execution time?
> 3. Is there any other way to improve performance with such a large data
> load?
> I have attached the query for reference.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)