Adriano created HIVE-27680:
------------------------------
Summary: Statistics observation and new features
Key: HIVE-27680
URL: https://issues.apache.org/jira/browse/HIVE-27680
Project: Hive
Issue Type: New Feature
Reporter: Adriano
Dear Team,
Currently, Hive statistics are computed for all tables, and the only way to
prevent this is to set "hive.stats.autogather=false", either in the client
session or globally. We lack the capability to selectively delete table or
column statistics for specific tables.
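For reference, the session-level workaround looks roughly like the sketch
below. The table names are hypothetical and used only for illustration; the
point is that the setting applies to every table written in the session, not
to a single table.

```sql
-- Session-level workaround: disable automatic statistics gathering.
-- This affects every table written in this session, not just one.
SET hive.stats.autogather=false;

-- Hypothetical table names, used only for illustration.
INSERT INTO TABLE wide_events SELECT * FROM staging_events;

-- Re-enable automatic gathering for the remaining statements in the session.
SET hive.stats.autogather=true;
```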
While these statistics are beneficial for the planner in many cases, there are
situations where enabling them for all tables can be suboptimal. I've observed
that, for some tables, the size of these statistics can grow significantly,
potentially leading to Out Of Memory (OOM) issues.
The Apache Hive documentation does not provide a straightforward method for
manually adding or updating statistics and column statistics. It would greatly
benefit us if the following features could be considered:
1- An option that can be set in TBLPROPERTIES to disable statistics
computation for specific tables, while keeping it enabled for the rest.
2- A command to drop table or column statistics for specific tables.
3- A documented procedure and a command for manually adding table/column
statistics. It might be possible to work this out through trial and error,
but a documented process would be much more efficient.
We kindly request that you evaluate this proposal and consider adding these
features to the roadmap. Doing so would improve both the product and its
documentation.
Thank you for your attention to this matter.
Sincerely,
Adriano