[ 
https://issues.apache.org/jira/browse/HIVE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837038#comment-13837038
 ] 

Ashutosh Chauhan edited comment on HIVE-5916 at 12/2/13 11:04 PM:
------------------------------------------------------------------

Currently, TableScanOp publishes statistics to the JobTracker with aggrKey as
the counter group name, the statistics type (numRows etc.) as the counter name,
and the statistics value as the counter value. We can simply use the partition
spec as the counter group name and the statistics type as the counter name;
then we need not invoke stats aggregation from the Hive client when the query
finishes. This has the following advantages:
* The client doesn't need to do any aggregation. After retrieving statistics
from the JobTracker (via JobClient), it can add them directly to the metastore.
* It lowers the memory footprint on the JobTracker: instead of holding counters
per task per partition, it holds counters per partition only.
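
The proposal can be illustrated with a small model. This is only a sketch in plain Python, not the Hadoop or Hive API: `aggregate_counters` is a hypothetical stand-in for the summing the JobTracker already performs across tasks, and the partition specs and values are made up for illustration. It assumes each task reports counters keyed by (group name, counter name), with the partition spec as the group name and the statistics type as the counter name.

```python
from collections import defaultdict


def aggregate_counters(task_reports):
    """Sum counters with the same (group, name) key across all tasks.

    Hypothetical stand-in for the job-level counter totals that the
    JobTracker maintains; not a real Hadoop call.
    """
    totals = defaultdict(int)
    for report in task_reports:
        for key, value in report.items():
            totals[key] += value
    return dict(totals)


# Illustrative per-task counters: group name = partition spec,
# counter name = statistics type. All values are invented.
task_reports = [
    {("ds=2013-12-01", "numRows"): 100, ("ds=2013-12-02", "numRows"): 40},
    {("ds=2013-12-01", "numRows"): 250},
    {("ds=2013-12-02", "numRows"): 60, ("ds=2013-12-02", "rawDataSize"): 2048},
]

# Job-level totals, already summed per partition.
job_counters = aggregate_counters(task_reports)

# Client side: no aggregation, only regrouping by partition before
# the values would be written to the metastore.
stats_by_partition = defaultdict(dict)
for (partition, stat), value in job_counters.items():
    stats_by_partition[partition][stat] = value

for partition in sorted(stats_by_partition):
    print(partition, stats_by_partition[partition])
```

Because the key no longer contains anything task-specific, the number of distinct counters in this model grows with the number of partitions rather than with tasks times partitions.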


was (Author: ashutoshc):
Currently, TableScanOp publishes statistics to the JobTracker with aggrKey as
the counter group name, the statistics type (numRows etc.) as the counter name,
and the statistics value as the counter value. We can simply use the statistics
type as both the counter group name and the counter name; then we need not
invoke stats aggregation from the Hive client when the query finishes. This has
the following advantages:
* The client doesn't need to do any aggregation. After retrieving statistics
from the JobTracker (via JobClient), it can add them directly to the metastore.
* It lowers the memory footprint on the JobTracker: instead of holding counters
per task per partition, it holds counters per partition only.

> No need to aggregate statistics collected via counter mechanism 
> ----------------------------------------------------------------
>
>                 Key: HIVE-5916
>                 URL: https://issues.apache.org/jira/browse/HIVE-5916
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 0.13.0
>            Reporter: Ashutosh Chauhan
>
> This results in unnecessary computation and wasted cluster resources, and it
> is not required, since counters are aggregated by the JobTracker anyway.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
