[jira] [Commented] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset

Pengcheng Xiong (JIRA) Wed, 12 Oct 2016 10:37:45 -0700

    [ https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569355#comment-15569355 ]


Pengcheng Xiong commented on HIVE-14803:
----------------------------------------

Thanks [~sseth] for digging this out. [~rajesh.balamohan], it seems that there 
really is a problem with this patch: the stats appear to be missing. In the 
explain plan, if the row count of the src table is 29 rather than 500, that 
usually means stats are missing. Could you take another look and upload a new 
patch? There is also a problem with the thread pool: people may set 
mv.files.thread=0, in which case the thread pool will be null. Thanks.
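One way to avoid the null thread pool is to treat a thread count of 0 as "run sequentially on the calling thread" and only build a pool when the count is positive. The sketch below is a hypothetical illustration, not the code in HIVE-14803.1.patch: the class `OptionalPoolRunner` and method `runAll` are invented names, and the `numThreads` argument merely stands in for the mv.files.thread setting mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: not part of Hive. Illustrates a null-safe optional pool.
public class OptionalPoolRunner {
    /**
     * Runs every task and returns results in submission order.
     * numThreads <= 0 means "no pool": tasks run on the calling thread,
     * so no code path ever dereferences a null ExecutorService.
     */
    public static <T> List<T> runAll(List<Callable<T>> tasks, int numThreads) throws Exception {
        List<T> results = new ArrayList<>(tasks.size());
        if (numThreads <= 0) {
            // Sequential fallback for the case where no pool would be created.
            for (Callable<T> task : tasks) {
                results.add(task.call());
            }
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<T>> futures = new ArrayList<>(tasks.size());
            for (Callable<T> task : tasks) {
                futures.add(pool.submit(task));
            }
            for (Future<T> f : futures) {
                // get() rethrows any task failure wrapped in ExecutionException.
                results.add(f.get());
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

Both paths return results in the same order, so callers do not need to know whether a pool was used.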

> S3: Stats gathering for insert queries can be expensive for partitioned 
> dataset
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-14803
>                 URL: https://issues.apache.org/jira/browse/HIVE-14803
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.1.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-14803.1.patch
>
>
> StatsTask's aggregateStats populates stats details for all partitions by 
> checking the file sizes, which turns out to be expensive when a large number 
> of partitions are inserted. 
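To make the cost concrete: if each partition's size is obtained by listing its files and summing their lengths, that is at least one remote listing per partition on S3, and doing them one after another makes the total latency grow with the partition count. The sketch below overlaps those per-partition listings on a small pool. It is a hypothetical illustration under stated assumptions: `PartitionSizeAggregator`, `totalSizes`, and the `listFileLengths` callback (a stand-in for whatever file-listing call is used) are invented names, not Hive APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch, not Hive's StatsTask: sums file lengths per partition concurrently.
public class PartitionSizeAggregator {
    public static Map<String, Long> totalSizes(List<String> partitions,
                                               Function<String, List<Long>> listFileLengths,
                                               int numThreads) throws Exception {
        // Clamp to at least one thread so a configured value of 0 never yields a null pool.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, numThreads));
        try {
            // Submit one listing per partition; LinkedHashMap keeps input order.
            Map<String, Future<Long>> pending = new LinkedHashMap<>();
            for (String p : partitions) {
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long len : listFileLengths.apply(p)) {
                        sum += len;
                    }
                    return sum;
                };
                pending.put(p, pool.submit(task));
            }
            // Collect results; the listings themselves ran in parallel.
            Map<String, Long> sizes = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Long>> e : pending.entrySet()) {
                sizes.put(e.getKey(), e.getValue().get());
            }
            return sizes;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a pool of k threads, roughly k listings are in flight at once instead of one, which is where the savings for many-partition inserts would come from.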



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
