[jira] [Updated] (TEZ-1167) Statistics infrastructure and API for Tez

Gopal V (JIRA) Mon, 09 Mar 2015 23:24:08 -0700

     [ 
https://issues.apache.org/jira/browse/TEZ-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gopal V updated TEZ-1167:
-------------------------
    Attachment: tez-reducer-skew.png

We need leaky bucket stats collection for some of the more expensive metrics. 
Let me pick an explicit example and deal with it

The slow-start in the presence of skews tend to show the reducers being 
slow-started in order of their destination id - reducer 0 started before 
reducer 1 etc...The trouble happens when the skew hits reducer 493 instead of 
affecting any of the earlier ones on a cluster with 199 containers.

!tez-reducer-skew.png!

The ideal slow-start order would be to start the reducer which has a known 
largest output size as each mapper time between min-src and max-src time frames.

Storing this information per-mapper might be incredibly expensive, however it 
might be possible to collect & aggregate leaky bit-sets of information in the 
vertex manager which can sample the quality of information lightly instead of 
storing all the incoming information.

Today the only information collected is an isEmpty bitset produced along with 
the DataMovementEvent, while this basic information is not read in any of the 
scheduling layers.

Along with making that information accessible (in aggregate form) the 
information about that reducers' output size can be sent as a 3-bit 
log-base-1024 bitset instead of 1 bit each, which can be OR'd out at the vertex 
manager to determine which destination to slow-start - in particularly for the 
example I have listed since all reducers have input.

The vertex manager can sample that stream instead of mandatorily reading 
through all the messages, which is where this takes a departure from the 
standard counter aggregation.

The quality of information can be preserved in such a model to schedule 
reducers more intelligently with just a minor increase of metrics traffic like 
that in the data movement messages.

Right now, the hacky work around such skew mistakes in scheduling by lowering 
the total reducer count till all of them can slow-start (i.e lower 1099 -> 199) 
- which saves 70+ secs off the query.

That is a specific case where there is an explicit decision made incorrectly by 
the scheduler due to the lack of data-quality metrics based decision making.

> Statistics infrastructure and API for Tez
> -----------------------------------------
>
>                 Key: TEZ-1167
>                 URL: https://issues.apache.org/jira/browse/TEZ-1167
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: tez-reducer-skew.png
>
>
> Statistics on data (data size, distribution etc) and performance (compute 
> time/CPU time etc) are important in making runtime decisions on plan 
> reconfiguration. For that it would be useful to have a uniform infrastructure 
> that provides this information and make it available via an API so that user 
> code in vertex managers etc can make use of this data to make decisions. This 
> umbrella jira tracks improvements needed in Tez to support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-1167) Statistics infrastructure and API for Tez

Reply via email to