[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148098#comment-17148098
 ] 

Peter Vary commented on HIVE-23776:
-----------------------------------

[~prasanth_j]: I have been analyzing the execution time of ACID update queries 
on S3 with simple, 1-row updates. The flamegraph for the HS2 side shows that 
1/4 of the time there is spent on stats generation, specifically on listing 
files and directories.
Thanks, Peter

> Retire quickstats autocollection
> --------------------------------
>
>                 Key: HIVE-23776
>                 URL: https://issues.apache.org/jira/browse/HIVE-23776
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution, which 
> means some filesystem reads etc.; for small inserts these are visible when 
> the fs is a bit slower (s3 and friends)
> I don't think they are really in use... we rely more on columnstats, which 
> are more accurate; also, the datasize in this case is the "offline" 
> (on-disk) size, while we should instead calculate with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes
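
For illustration only, here is a minimal sketch of what computing two of these 
quick stats (num files and datasize) on the fly could look like. It is not 
Hive code: the class name QuickStats and the method collect are hypothetical, 
and it uses plain java.nio.file on a local directory rather than the Hadoop 
FileSystem API that Hive actually goes through. The point is that both values 
fall out of a single recursive listing, which is also the step that gets 
expensive on S3.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hive: gathers "num files" and "datasize"
// for one table or partition directory with a single recursive listing.
public class QuickStats {

    // Simple value holder for the two stats.
    public static final class Result {
        public final long numFiles;
        public final long totalSize;

        Result(long numFiles, long totalSize) {
            this.numFiles = numFiles;
            this.totalSize = totalSize;
        }
    }

    public static Result collect(Path dir) throws IOException {
        long numFiles = 0;
        long totalSize = 0;
        // Files.walk visits the directory tree lazily; close it when done.
        try (Stream<Path> paths = Files.walk(dir)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                if (Files.isRegularFile(p)) {
                    numFiles++;
                    totalSize += Files.size(p); // on-disk ("offline") size
                }
            }
        }
        return new Result(numFiles, totalSize);
    }
}
```

Under the proposal above, something along these lines would run only when a 
"desc formatted" statement asks for the numbers, instead of after every write.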



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
