[jira] [Updated] (HIVE-7589) Some fixes and improvements to statistics annotation rules

Prasanth J (JIRA) Thu, 31 Jul 2014 19:49:25 -0700

     [ https://issues.apache.org/jira/browse/HIVE-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Prasanth J updated HIVE-7589:
-----------------------------

    Description: 
*FIXES:*
1) JOIN rule does not properly propagate the column statistics from its parent
2) Multi-way join rule computes the denominator for #rows estimation wrongly
3) GROUPBY rule does not account for the data size of aggregate column
4) Prefix removal from column names isn't working
5) GROUPBY rule looks at missing column statistics for aggregate column from 
its parent and assumes PARTIAL column stats state

*IMPROVEMENTS:*
1) Replace "EXPLAIN EXTENDED" with "EXPLAIN" in test cases to make the golden 
files easier to comprehend and to reduce verbosity
2) Introduce a rule for the ReduceSink operator that only renames column 
statistics according to the output row schema
3) Add more rows to the test datasets to avoid the zero-row scenario in join 
test cases
4) Improve the JOIN rule to avoid long overflow

  was:
*FIXES:*
1) JOIN rule does not properly propagate the column statistics from its parent
2) Multi-way join rule computes the denominator for #rows estimation wrongly
3) GROUPBY rule does not account for the data size of aggregate column
4) Prefix removal from column names wasn't working
5) GROUPBY rule looks at missing column statistics for aggregate column from 
its parent and assumes PARTIAL column stats state

*IMPROVEMENTS:*
1) Replaced "EXPLAIN EXTENDED" with "EXPLAIN" in test cases to make the golden 
files easy to comprehend and to reduce verbosity
2) Introduced rule for ReduceSink operator which only does renaming of column 
statistics as per output row schema
3) Added more rows to the test datasets to avoid 0 row scenario in join test 
cases
4) JOIN rule improvement to avoid long overflow


> Some fixes and improvements to statistics annotation rules
> ----------------------------------------------------------
>
>                 Key: HIVE-7589
>                 URL: https://issues.apache.org/jira/browse/HIVE-7589
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor, Statistics
>    Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>             Fix For: 0.13.0
>
>
> *FIXES:*
> 1) JOIN rule does not properly propagate the column statistics from its parent
> 2) Multi-way join rule computes the denominator for #rows estimation wrongly
> 3) GROUPBY rule does not account for the data size of aggregate column
> 4) Prefix removal from column names isn't working
> 5) GROUPBY rule looks at missing column statistics for aggregate column from 
> its parent and assumes PARTIAL column stats state
>
> *IMPROVEMENTS:*
> 1) Replace "EXPLAIN EXTENDED" with "EXPLAIN" in test cases to make the golden 
> files easier to comprehend and to reduce verbosity
> 2) Introduce a rule for the ReduceSink operator that only renames column 
> statistics according to the output row schema
> 3) Add more rows to the test datasets to avoid the zero-row scenario in join 
> test cases
> 4) Improve the JOIN rule to avoid long overflow
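
For readers unfamiliar with the two JOIN items above (the multi-way join denominator and the long overflow), here is a minimal sketch of what such an estimate can look like. It is not taken from the Hive patch: the class and method names (JoinEstimateSketch, safeMult, estimateJoinRows) are hypothetical, and the formula is the common textbook estimate for an n-way equi-join on a single key, assumed here purely for illustration. The numerator is the product of the input row counts, the denominator is the product of all distinct-value counts except the smallest, and multiplication saturates at Long.MAX_VALUE instead of wrapping around.

```java
import java.util.Arrays;

// Illustrative sketch only; names and formula are assumptions, not Hive code.
public class JoinEstimateSketch {

    // Multiply two non-negative counts, saturating at Long.MAX_VALUE
    // instead of silently wrapping around on overflow.
    static long safeMult(long a, long b) {
        try {
            return Math.multiplyExact(a, b);
        } catch (ArithmeticException e) {
            return Long.MAX_VALUE;
        }
    }

    // Textbook estimate for an n-way equi-join on one key:
    // (product of input row counts) /
    // (product of all distinct-value counts except the smallest).
    static long estimateJoinRows(long[] rowCounts, long[] distinctCounts) {
        long numerator = 1;
        for (long rows : rowCounts) {
            numerator = safeMult(numerator, rows);
        }

        long[] sorted = distinctCounts.clone();
        Arrays.sort(sorted); // ascending, so index 0 holds the smallest count
        long denominator = 1;
        for (int i = 1; i < sorted.length; i++) {
            // Guard against a zero distinct count to avoid division by zero.
            denominator = safeMult(denominator, Math.max(1L, sorted[i]));
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        long[] rows = {1000L, 500L, 200L};
        long[] distinct = {100L, 50L, 20L};
        System.out.println(estimateJoinRows(rows, distinct));
        System.out.println(safeMult(Long.MAX_VALUE, 2L));
    }
}
```

Saturating rather than throwing keeps the estimate usable as an upper bound when the inputs are very large, which is one plausible reading of "avoid long overflow"; the actual patch may handle this differently.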



--
This message was sent by Atlassian JIRA
(v6.2#6252)
