Eric Wohlstadter commented on TEZ-2161:


I'm not planning to do an actual CRDT implementation for this ticket. I do 
think that would be really valuable (I have used Akka's CRDT in a previous 
project), but it's not needed for the use-case where counters are aggregated in 
the getAllCounters() flow:

i.e. the DAG aggregates its Vertices, each Vertex aggregates its Tasks, and 
each Task chooses its "bestAttempt". The whole thing runs under various locks. 
This getAllCounters() flow executes locally on the AM.
My plan is to add "aggregateAllCounters" to the CounterGroup classes, which 
will be used similarly to "incrAllCounters", except that instead of computing 
only SUM, it also computes MIN, AVG, and MAX. 

Since MIN, AVG, MAX have very efficient incremental algorithms ("embarrassingly 
incremental"), the overhead should be negligible (but this will need to be 
profiled to verify my claim). 
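
To make the "incremental" point concrete, here is a rough sketch of the kind 
of running aggregate I have in mind. It is illustrative only: the class name 
CounterAggregate and its methods are hypothetical and are not existing Tez 
APIs, and a real patch would hang this off the CounterGroup classes instead.

```java
// Hypothetical sketch, not a Tez class: keeps SUM, MIN, MAX and COUNT so that
// AVG can be derived on demand. Every update is O(1).
final class CounterAggregate {
    private long sum = 0;
    private long min = Long.MAX_VALUE; // identity element for min()
    private long max = Long.MIN_VALUE; // identity element for max()
    private long count = 0;

    /** Fold in one raw counter value, e.g. from a task's best attempt. */
    void add(long value) {
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
        count++;
    }

    /** Fold in another aggregate, e.g. a vertex-level result into the DAG. */
    void merge(CounterAggregate other) {
        sum += other.sum;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        count += other.count;
    }

    long getSum() { return sum; }
    long getCount() { return count; }

    /** Returns Long.MAX_VALUE if nothing has been added yet. */
    long getMin() { return min; }

    /** Returns Long.MIN_VALUE if nothing has been added yet. */
    long getMax() { return max; }

    /** Derived from sum and count; returns 0.0 if nothing has been added. */
    double getAvg() { return count == 0 ? 0.0 : (double) sum / count; }
}
```

Note that AVG is carried as (sum, count) and only divided at read time. 
Averaging the per-vertex averages would give the wrong answer when vertices 
have different numbers of tasks.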
Since I'm not doing CRDT, do you think we should rename this ticket or should I 
open a different ticket?

> Support CRDT aggregation models for counters 
> ---------------------------------------------
>                 Key: TEZ-2161
>                 URL: https://issues.apache.org/jira/browse/TEZ-2161
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Eric Wohlstadter
>            Priority: Major
> Some counters, such as last event received time, need to be handled 
> differently from, say, bytes read counters. Bytes read requires a summation 
> across all tasks within a vertex. The last event received time requires a 
> max() across all the tasks. First event received time would likely need a 
> min().

This message was sent by Atlassian JIRA
