[jira] [Commented] (DRILL-4256) Performance regression in hive planning

ASF GitHub Bot (JIRA) Mon, 18 Jan 2016 15:47:04 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105982#comment-15105982
 ]


ASF GitHub Bot commented on DRILL-4256:
---------------------------------------

GitHub user vkorukanti opened a pull request:

    https://github.com/apache/drill/pull/329

    DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever needed.
    
    Creating new instances of HiveConf is very costly, so we should avoid 
creating new ones as much as possible.
    Also get rid of hiveConfigOverride and use the HiveConf in HiveStoragePlugin 
wherever we need a HiveConf.
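
    The reuse idea described above can be illustrated with a minimal, 
self-contained sketch. It is not the code in the pull request: ExpensiveConf 
and StoragePluginSketch are hypothetical stand-ins for HiveConf and 
HiveStoragePlugin, and the loop count only echoes the ~3700 partitions 
mentioned in the issue. It makes no claim about where Drill actually 
constructed HiveConf instances.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class HiveConfReuseSketch {

    /** Hypothetical stand-in for a configuration object that is costly to build, such as HiveConf. */
    static final class ExpensiveConf {
        // Counts constructions so the effect of reuse is visible.
        static final AtomicInteger CONSTRUCTED = new AtomicInteger();
        private final Map<String, String> props = new HashMap<>();

        ExpensiveConf(Map<String, String> overrides) {
            CONSTRUCTED.incrementAndGet();
            props.putAll(overrides);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Hypothetical plugin that builds its configuration once and hands out the same instance. */
    static final class StoragePluginSketch {
        private final ExpensiveConf conf;

        StoragePluginSketch(Map<String, String> overrides) {
            // Built once per plugin instance; overrides are applied here, not per call.
            this.conf = new ExpensiveConf(overrides);
        }

        ExpensiveConf getConf() {
            return conf;
        }
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("hive.metastore.uris", "thrift://localhost:9083");

        StoragePluginSketch plugin = new StoragePluginSketch(overrides);

        // Every caller asks the plugin for its configuration instead of creating a new one.
        for (int i = 0; i < 3700; i++) {
            plugin.getConf().get("hive.metastore.uris");
        }

        System.out.println("conf objects constructed: " + ExpensiveConf.CONSTRUCTED.get());
    }
}
```

    In this sketch the configuration is constructed once per plugin 
instance, no matter how many callers ask for it.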

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vkorukanti/drill DRILL-4256

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #329
    
----
commit 3769dada12dafc7cd9209551e96184c968d19f73
Author: vkorukanti <[email protected]>
Date:   2016-01-11T23:01:02Z

    DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever 
needed.
    
    Creating new instances of HiveConf is very costly, so we should avoid 
creating new ones as much as possible.
    Also get rid of hiveConfigOverride and use the HiveConf in HiveStoragePlugin 
wherever we need a HiveConf.

----


> Performance regression in hive planning
> ---------------------------------------
>
>                 Key: DRILL-4256
>                 URL: https://issues.apache.org/jira/browse/DRILL-4256
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Hive, Query Planning & Optimization
>    Affects Versions: 1.5.0
>            Reporter: Rahul Challapalli
>         Attachments: jstack.tgz
>
>
> Commit # : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> The fix for reading hive tables backed by hbase caused a performance 
> regression. The data set used in the test below has ~3700 partitions, and the 
> filter in the query ensures that only 1 partition gets selected.
> {code}
> Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~25 seconds
> {code}
> {code}
> Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~6.5 seconds
> {code}
> Since the data is large, I couldn't attach it here. Reach out to me if you 
> need additional information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
