Zoltan Haindrich created HIVE-23776:
---------------------------------------
Summary: Retire quickstats autocollection
Key: HIVE-23776
URL: https://issues.apache.org/jira/browse/HIVE-23776
Project: Hive
Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich
This is about the following quick stats:
* num files
* datasize (sum of filesizes)
* num erasure coded files
Right now these are scanned during every BasicStatsTask execution, which means
extra filesystem reads. For small inserts the cost is visible when the
filesystem is on the slower side (S3 and friends).
I don't think they are really in use: we rely more on column stats, which are
more accurate. Also, the datasize here is the "offline" (on-disk) size, while we
should instead be calculating with "online" sizes.
proposal:
* remove collection and storage of this data
* collect it on the fly during "desc formatted" statements to provide them for
informational purposes
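
A minimal sketch of what on-the-fly collection could look like, assuming a
plain directory walk is enough. The class and method names (QuickStatsSketch,
collect) are hypothetical and not part of Hive. It uses java.nio.file as a
stand-in for the Hadoop FileSystem API so it runs on its own, which also means
the erasure coded file count is left out, since java.nio has no equivalent of
that flag.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper, not Hive code: computes quick stats at request time. */
public final class QuickStatsSketch {

    /** Number of regular files and the sum of their on-disk sizes in bytes. */
    public static final class QuickStats {
        public final long numFiles;
        public final long totalSize;

        QuickStats(long numFiles, long totalSize) {
            this.numFiles = numFiles;
            this.totalSize = totalSize;
        }
    }

    /**
     * Walks the table directory recursively (so partition subdirectories are
     * included) and counts regular files and their sizes. Nothing is stored;
     * the caller would only use the result for display.
     */
    public static QuickStats collect(Path tableDir) throws IOException {
        long numFiles = 0;
        long totalSize = 0;
        try (Stream<Path> paths = Files.walk(tableDir)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    numFiles++;
                    totalSize += Files.size(p);
                }
            }
        }
        return new QuickStats(numFiles, totalSize);
    }
}
```

With this shape the filesystem cost is paid only when someone asks for the
numbers, instead of on every insert.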
--
This message was sent by Atlassian Jira
(v8.3.4#803005)