Peter Vary commented on HIVE-23776:

[~prasanth_j]: I have been analyzing the execution time of ACID update queries on 
S3 using simple, 1-row updates. The flamegraph for the HS2 side shows that 1/4 
of the time there is spent on stats generation, specifically on listing 
files and directories.
Thanks, Peter 

> Retire quickstats autocollection
> --------------------------------
>                 Key: HIVE-23776
>                 URL: https://issues.apache.org/jira/browse/HIVE-23776
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> Right now these are scanned during every BasicStatsTask execution, which 
> means some filesystem reads etc.; for small inserts these are visible when 
> the filesystem is a bit slower (S3 and friends).
> I don't think they are really in use: we rely more on column stats, which are 
> more accurate, and the datasize in this case is the "offline" 
> (on-disk) size, while we should instead calculate with "online" sizes.
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes
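
To make the "collect it on the fly" idea concrete, here is a minimal sketch of computing the file count and on-disk size for a table directory only when someone asks for them. It is not Hive code: `quick_stats` and `QuickStats` are hypothetical names, it walks a local directory rather than S3 or HDFS, and skipping names that start with "_" or "." is an assumption modeled on the common Hadoop hidden-file convention. Erasure-coded file counts are left out because they have no local-filesystem equivalent.

```python
import os
from typing import NamedTuple


class QuickStats(NamedTuple):
    """Hypothetical container for the two stats sketched here."""
    num_files: int
    total_size: int  # sum of file sizes in bytes ("offline", on-disk size)


def quick_stats(table_dir: str) -> QuickStats:
    """Walk table_dir once and return the file count and total on-disk size.

    Assumption: entries whose names start with "_" or "." are treated as
    hidden or temporary and are skipped.
    """
    num_files = 0
    total_size = 0
    for root, dirs, files in os.walk(table_dir):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if not d.startswith(("_", "."))]
        for name in files:
            if name.startswith(("_", ".")):
                continue
            num_files += 1
            total_size += os.path.getsize(os.path.join(root, name))
    return QuickStats(num_files, total_size)
```

The listing cost that shows up in the flamegraph would then be paid by the statement that displays the numbers, not by every small insert or update.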

This message was sent by Atlassian Jira
