[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148054#comment-17148054
 ] 

Prasanth Jayachandran commented on HIVE-23776:
----------------------------------------------

Yes, I know the quickstats part. Workload management triggers can be defined 
on *any* Hive counter, including the following newly added counters.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CompileTimeCounters.java]
 

If text files land in a staging table and a workload management 
trigger/guardrail says "if query scans > 10TB, kill query", then removing 
these quick stats will break that functionality. In some cases these staging 
tables are never analyzed, so no statistics are collected for them. 
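
To make the failure mode concrete, here is a minimal sketch of how such a 
guardrail behaves when the counter it depends on is no longer populated. It 
is an illustration only and not Hive's actual trigger code: the class name, 
the shouldKill helper, and the counter name used here are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class GuardrailSketch {
    // Hypothetical counter name, used only for illustration.
    static final String SCAN_BYTES_COUNTER = "TOTAL_FILE_SIZE";
    // 10 TB expressed in bytes.
    static final long TEN_TB = 10L * 1024 * 1024 * 1024 * 1024;

    /** Returns true only when the counter is present and exceeds the limit. */
    static boolean shouldKill(Map<String, Long> counters, String name, long limit) {
        Long value = counters.get(name);
        // A counter that was never populated cannot trip the guardrail.
        return value != null && value > limit;
    }

    public static void main(String[] args) {
        // Staging table with quick stats: the data size is known at compile time.
        Map<String, Long> withQuickStats = new HashMap<>();
        withQuickStats.put(SCAN_BYTES_COUNTER, 12L * 1024 * 1024 * 1024 * 1024);

        // Same table without quick stats and never analyzed: no counter value.
        Map<String, Long> withoutQuickStats = new HashMap<>();

        System.out.println(shouldKill(withQuickStats, SCAN_BYTES_COUNTER, TEN_TB));
        System.out.println(shouldKill(withoutQuickStats, SCAN_BYTES_COUNTER, TEN_TB));
    }
}
```

In this sketch the same 12TB scan is killed when the counter is available 
and silently allowed when it is not, which is the regression described above.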

Searching the Hive code base and running unit tests will not, on their own, 
be sufficient to know whether customers are using it. If there is a specific 
need to remove this, put it behind a config, deprecate it, and remove it over 
several iterations instead of removing it in one go. 

 

> Retire quickstats autocollection
> --------------------------------
>
>                 Key: HIVE-23776
>                 URL: https://issues.apache.org/jira/browse/HIVE-23776
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution - which 
> means some filesystem reads/etc - for small inserts these are visible in case 
> the fs is a bit slower (s3 and friends)
> I don't think they are really in use...we rely more on columnstats which are 
> more accurate; and the datasize in this case is the "offline" (on-disk) 
> size, while we should instead be calculating with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
