[ https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397546#comment-16397546 ]

Gopal V edited comment on TEZ-2161 at 3/13/18 7:50 PM:
-------------------------------------------------------

bq. DAG aggregates Vertices which aggregates Tasks which chooses "bestAttempt". 
And the whole thing runs in various locks. This getAllCounters() flow executes 
locally on the AM.

The CRDT reference was mainly about the PN-counter model, where an aggregate 
stores both the negative (-ve) and positive (+ve) movements of the counter.

This is useful for metrics like CPU time, where a single counter can hold both 
the "wasted CPU" and the "spent CPU" in the same structure.
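A PN-counter is a state-based CRDT made of two grow-only parts, one for 
increments and one for decrements. The Java sketch below is illustrative only: 
the "PNCounter" class, keying entries by task attempt id, and mapping "spent 
CPU" to the positive side and "wasted CPU" to the negative side are all 
assumptions, not existing Tez code.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal state-based PN-counter sketch (hypothetical, not a Tez class).
// Each replica key (assumed here to be a task attempt id) owns its own
// entries, and those entries only ever grow.
final class PNCounter {
    private final Map<String, Long> positive = new HashMap<>(); // +ve movements
    private final Map<String, Long> negative = new HashMap<>(); // -ve movements

    void increment(String replica, long amount) {
        if (amount < 0) throw new IllegalArgumentException("amount must be >= 0");
        positive.merge(replica, amount, Long::sum);
    }

    void decrement(String replica, long amount) {
        if (amount < 0) throw new IllegalArgumentException("amount must be >= 0");
        negative.merge(replica, amount, Long::sum);
    }

    // Merge keeps the per-replica max, which is commutative, associative and
    // idempotent, so merge order and duplicate merges do not change the result.
    void mergeFrom(PNCounter other) {
        other.positive.forEach((k, v) -> positive.merge(k, v, Math::max));
        other.negative.forEach((k, v) -> negative.merge(k, v, Math::max));
    }

    long positiveTotal() { return total(positive); }
    long negativeTotal() { return total(negative); }
    long value() { return positiveTotal() - negativeTotal(); }

    private static long total(Map<String, Long> m) {
        long t = 0;
        for (long v : m.values()) t += v;
        return t;
    }
}

class PNCounterDemo {
    public static void main(String[] args) {
        PNCounter a = new PNCounter();
        a.increment("attempt_0", 100);   // successful attempt: spent CPU
        PNCounter b = new PNCounter();
        b.increment("attempt_1", 80);    // killed speculative attempt: spent CPU...
        b.decrement("attempt_1", 80);    // ...all of which was wasted
        a.mergeFrom(b);
        a.mergeFrom(b);                  // a duplicate delivery is harmless
        // prints "180 80 100"
        System.out.println(a.positiveTotal() + " " + a.negativeTotal() + " " + a.value());
    }
}
```

Taking the per-replica max is only safe because each replica's entries are 
monotonic; that is what lets both totals survive in one structure without 
double counting.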

bq. My plan is to add "aggregateAllCounters" to the CounterGroup classes, which 
will be used similarly to "incrAllCounters", except instead of only doing SUM, 
it also does MIN, AVG, MAX.
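One subtlety with MIN, AVG and MAX: AVG cannot be merged directly, because 
averaging per-vertex averages weights every vertex equally regardless of how 
many tasks it ran. A common approach, sketched below under the assumption that 
each partial aggregate carries SUM, MIN, MAX and COUNT, is to derive AVG only 
when it is read. "AggregateStats" and its methods are hypothetical, not the 
proposed "aggregateAllCounters" API; the JDK's java.util.LongSummaryStatistics 
(accept, combine, getAverage) follows the same pattern.

```java
// Hypothetical mergeable aggregate: SUM, MIN, MAX and COUNT merge exactly,
// while AVG is derived from SUM / COUNT instead of being stored.
final class AggregateStats {
    private long sum = 0;
    private long min = Long.MAX_VALUE; // identity for min
    private long max = Long.MIN_VALUE; // identity for max
    private long count = 0;

    // Fold in one task's counter value.
    void add(long value) {
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
        count++;
    }

    // Combine with another partial aggregate (e.g. another vertex's),
    // like incrAllCounters but for every statistic, not just SUM.
    void aggregate(AggregateStats other) {
        sum += other.sum;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        count += other.count;
    }

    long sum() { return sum; }
    long min() { return min; }   // Long.MAX_VALUE if nothing was added
    long max() { return max; }   // Long.MIN_VALUE if nothing was added
    long count() { return count; }
    double avg() { return count == 0 ? 0.0 : (double) sum / count; }
}

class AggregateStatsDemo {
    public static void main(String[] args) {
        AggregateStats v1 = new AggregateStats(); // vertex with two tasks
        v1.add(10);
        v1.add(20);
        AggregateStats v2 = new AggregateStats(); // vertex with one task
        v2.add(60);
        AggregateStats dag = new AggregateStats();
        dag.aggregate(v1);
        dag.aggregate(v2);
        // AVG over all three tasks is 30.0; averaging the two vertex
        // averages (15.0 and 60.0) would give 37.5 instead.
        // prints "90 10 30.0 60"
        System.out.println(dag.sum() + " " + dag.min() + " " + dag.avg() + " " + dag.max());
    }
}
```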

The Counter needs subclasses that declare which aggregation they use; adding 
new fields to every counter would break all of the downstream code that exists 
today.

Explicitly adding a MAX_GC_MILLIS counter with the new semantics is better 
than changing the behaviour of the existing GC_MILLIS counter.
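To illustrate the subclass idea, here is a hedged Java sketch in which the 
base class keeps the current SUM behaviour and a new counter name opts into 
MAX semantics, so GC_MILLIS aggregates exactly as before while MAX_GC_MILLIS 
reports the worst task. "SumCounter", "MaxCounter", "aggregate", "emptyCopy" 
and "CounterRollup" are hypothetical names, not the real Tez counter classes.

```java
import java.util.List;

// Hypothetical base counter: keeps today's SUM behaviour, so existing
// counters such as GC_MILLIS aggregate exactly as before.
class SumCounter {
    private final String name;
    protected long value;

    SumCounter(String name) { this.name = name; }

    String getName() { return name; }
    long getValue() { return value; }
    void setValue(long value) { this.value = value; }

    // How this counter folds in the same counter from another task.
    void aggregate(SumCounter other) { value += other.getValue(); }

    // A fresh accumulator with the same aggregation semantics.
    SumCounter emptyCopy() { return new SumCounter(name); }
}

// Hypothetical subclass that declares MAX semantics, e.g. MAX_GC_MILLIS.
class MaxCounter extends SumCounter {
    MaxCounter(String name) {
        super(name);
        value = Long.MIN_VALUE; // identity for max
    }

    @Override
    void aggregate(SumCounter other) { value = Math.max(value, other.getValue()); }

    @Override
    SumCounter emptyCopy() { return new MaxCounter(getName()); }
}

final class CounterRollup {
    // Combine per-task instances of one counter; assumes a non-empty list.
    static SumCounter combine(List<? extends SumCounter> perTask) {
        SumCounter acc = perTask.get(0).emptyCopy();
        for (SumCounter c : perTask) acc.aggregate(c);
        return acc;
    }

    public static void main(String[] args) {
        SumCounter gc1 = new SumCounter("GC_MILLIS");
        gc1.setValue(30);
        SumCounter gc2 = new SumCounter("GC_MILLIS");
        gc2.setValue(50);
        MaxCounter max1 = new MaxCounter("MAX_GC_MILLIS");
        max1.setValue(30);
        MaxCounter max2 = new MaxCounter("MAX_GC_MILLIS");
        max2.setValue(50);
        System.out.println(combine(List.of(gc1, gc2)).getValue());   // prints 80
        System.out.println(combine(List.of(max1, max2)).getValue()); // prints 50
    }
}
```

Because only code that constructs the new subclass sees MAX behaviour, 
existing consumers of GC_MILLIS are unaffected.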


> Support CRDT aggregation models for counters 
> ---------------------------------------------
>
>                 Key: TEZ-2161
>                 URL: https://issues.apache.org/jira/browse/TEZ-2161
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Eric Wohlstadter
>            Priority: Major
>
> Some counters, such as the last-event-received time, need to be handled 
> differently from, say, bytes-read counters. Bytes read requires a summation 
> across all tasks within a vertex. The last-event-received time requires a 
> max() across all the tasks. The first-event-received time would likely need 
> a min().
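
The three cases in the description map onto SUM, MAX and MIN reductions over a 
vertex's tasks. Below is a minimal Java sketch of such a roll-up, assuming a 
hypothetical table that declares the aggregation per counter name, with SUM as 
the default so unlisted counters behave as they do today. The names 
"Aggregation", "VertexRollup", "BYTES_READ", "LAST_EVENT_RECEIVED_TIME" and 
"FIRST_EVENT_RECEIVED_TIME" are illustrative, not necessarily real Tez names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongBinaryOperator;

// How a counter combines across the tasks of a vertex.
enum Aggregation {
    SUM(Long::sum),
    MAX(Math::max),
    MIN(Math::min);

    final LongBinaryOperator op;

    Aggregation(LongBinaryOperator op) { this.op = op; }
}

final class VertexRollup {
    // Hypothetical per-name declarations; anything not listed is summed.
    static final Map<String, Aggregation> MODES = Map.of(
            "LAST_EVENT_RECEIVED_TIME", Aggregation.MAX,
            "FIRST_EVENT_RECEIVED_TIME", Aggregation.MIN);

    // Fold each task's counters into vertex-level values.
    static Map<String, Long> rollUp(List<Map<String, Long>> tasks) {
        Map<String, Long> vertex = new HashMap<>();
        for (Map<String, Long> task : tasks) {
            for (Map.Entry<String, Long> e : task.entrySet()) {
                LongBinaryOperator op =
                        MODES.getOrDefault(e.getKey(), Aggregation.SUM).op;
                vertex.merge(e.getKey(), e.getValue(), (a, b) -> op.applyAsLong(a, b));
            }
        }
        return vertex;
    }

    public static void main(String[] args) {
        Map<String, Long> t1 = Map.of("BYTES_READ", 100L,
                "FIRST_EVENT_RECEIVED_TIME", 900L, "LAST_EVENT_RECEIVED_TIME", 1000L);
        Map<String, Long> t2 = Map.of("BYTES_READ", 250L,
                "FIRST_EVENT_RECEIVED_TIME", 400L, "LAST_EVENT_RECEIVED_TIME", 1700L);
        Map<String, Long> v = rollUp(List.of(t1, t2));
        System.out.println(v.get("BYTES_READ"));                // prints 350
        System.out.println(v.get("FIRST_EVENT_RECEIVED_TIME")); // prints 400
        System.out.println(v.get("LAST_EVENT_RECEIVED_TIME"));  // prints 1700
    }
}
```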



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
