Prasanth J created HIVE-5369:
--------------------------------
Summary: Annotate hive operator tree with statistics from metastore
Key: HIVE-5369
URL: https://issues.apache.org/jira/browse/HIVE-5369
Project: Hive
Issue Type: New Feature
Components: Query Processor, Statistics
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Fix For: 0.13.0
Currently, the statistics gathered at the table/partition level and the column
level are not used during the query planning stage. These statistics can be
used to optimize query plans. Basic statistics such as uncompressed data size
can be used for better reducer estimation. Other statistics, such as the number
of rows, the number of distinct values in a column, and the average column
length, can be used by the Cost Based Optimizer (CBO) to select better query
plans.

As a first step toward improving query planning, the statistics that are
available in the metastore should be attached to the Hive operator tree. The
operator tree should be walked and annotated with statistics information. The
attached statistics will vary for each operator depending on the operation it
performs. For example, a select operator changes the average row size but does
not affect the number of rows. Similarly, a filter operator changes the number
of rows but does not change the average row size. Similar rules can be applied
to other operators as well.
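
The select and filter rules described above can be illustrated with a minimal
sketch. This is not Hive code: the Stats and Operator classes, the annotate
function, and the per-filter selectivity factor are all hypothetical names and
assumptions introduced only for illustration. In particular, this issue does
not say how a filter's output row count would be estimated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stats:
    """Hypothetical statistics attached to one operator."""
    num_rows: int
    avg_row_size: float  # bytes per row

    @property
    def data_size(self) -> float:
        # Estimated uncompressed output size in bytes.
        return self.num_rows * self.avg_row_size


@dataclass
class Operator:
    """Hypothetical operator node; children consume this operator's output."""
    kind: str  # "scan", "select", or "filter"
    children: List["Operator"] = field(default_factory=list)
    table_stats: Optional[Stats] = None          # scan only: stats read from the metastore
    projected_row_size: Optional[float] = None   # select only: row size after projection
    selectivity: float = 1.0                     # filter only: assumed fraction of rows kept
    stats: Optional[Stats] = None                # filled in by annotate()


def annotate(op: Operator, input_stats: Optional[Stats] = None) -> None:
    """Walk the tree from the scan downward, attaching stats to each operator."""
    if op.kind == "scan":
        # The scan starts from the table-level statistics.
        op.stats = op.table_stats
    elif op.kind == "select":
        # Select changes the average row size but not the number of rows.
        op.stats = Stats(input_stats.num_rows, op.projected_row_size)
    elif op.kind == "filter":
        # Filter changes the number of rows but not the average row size.
        op.stats = Stats(round(input_stats.num_rows * op.selectivity),
                         input_stats.avg_row_size)
    else:
        # Operators without a rule in this sketch pass their input stats through.
        op.stats = input_stats
    for child in op.children:
        annotate(child, op.stats)


# Usage: scan -> filter -> select
select = Operator("select", projected_row_size=40.0)
filt = Operator("filter", children=[select], selectivity=0.5)
scan = Operator("scan", children=[filt], table_stats=Stats(1000, 100.0))
annotate(scan)
```

With these assumed inputs, the filter keeps the 100-byte row size while its
row count is halved, and the select keeps the filter's row count while its row
size shrinks to the projected width.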