JIRA: [NEMO-96: Modularize DataSkewPolicy to use MetricVertex and BarrierVertex](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-96)

**Major changes:**
- Handle dynamic optimization via `MetricCollectionVertex` and 
`AggregationBarrierVertex` instead of `MetricCollectionBarrierVertex`
- For each shuffle edge with main output, a `MetricCollectionVertex` is inserted 
at compile time at the end of the edge's source tasks to collect key frequency 
data
- For each shuffle edge with main output, an `AggregationBarrierVertex` is 
inserted at compile time. It aggregates the task-level key frequency data that 
each `MetricCollectionVertex` collects and emits as additional tagged output
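The collect-then-aggregate flow above can be sketched as follows. This is a minimal illustration, not Nemo's implementation: it assumes per-task key frequencies are plain `HashMap<Object, Long>` counts, and `KeyFrequencySketch`, `collectTaskMetric`, and `aggregate` are hypothetical names standing in for the work done by `MetricCollectionVertex` and `AggregationBarrierVertex`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the two-stage metric flow: per-task key frequency
 * collection, then aggregation at a barrier. Names are illustrative only,
 * not Nemo's API.
 */
public final class KeyFrequencySketch {

  /** Hypothetical stand-in for what a MetricCollectionVertex does in one source task. */
  static Map<Object, Long> collectTaskMetric(List<?> keysEmittedByTask) {
    Map<Object, Long> freq = new HashMap<>();
    for (Object key : keysEmittedByTask) {
      freq.merge(key, 1L, Long::sum); // count each occurrence of the key
    }
    return freq;
  }

  /** Hypothetical stand-in for what an AggregationBarrierVertex does with the task-level metrics. */
  static Map<Object, Long> aggregate(List<Map<Object, Long>> perTaskMetrics) {
    Map<Object, Long> total = new HashMap<>();
    for (Map<Object, Long> taskMetric : perTaskMetrics) {
      // Sum the counts for each key across all source tasks.
      taskMetric.forEach((key, count) -> total.merge(key, count, Long::sum));
    }
    return total;
  }

  public static void main(String[] args) {
    List<Map<Object, Long>> perTask = new ArrayList<>();
    perTask.add(collectTaskMetric(Arrays.asList("a", "b", "a"))); // task 1
    perTask.add(collectTaskMetric(Arrays.asList("a", "c")));      // task 2
    Map<Object, Long> total = aggregate(perTask);
    System.out.println(total.get("a") + " " + total.get("b") + " " + total.get("c")); // prints "3 1 1"
  }
}
```

Keeping collection per task and merging only at the barrier means each source task ships a small frequency map instead of its raw keys, which is what makes the aggregated key distribution cheap to obtain before the skew-handling pass runs.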

**Minor changes to note:**
- Added the encoder/decoder factories needed to aggregate dynamic optimization 
data (here, key frequency data)
- Modified `PipelineTranslator` to extract key encoders/decoders
- Modified `DataSkewRuntimePass` and the related code path to handle 
`Object`-typed keys instead of integer hash-index keys
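Because keys are now arbitrary `Object`s rather than integer hash indices, serializing the key frequency data depends on a key-specific encoder/decoder. The sketch below shows one way such a map could be round-tripped, assuming a pluggable key coder; `KeyFrequencyCodecSketch`, `KeyCoder`, and `STRING_KEYS` are hypothetical and do not reflect the encoder/decoder factories added in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical encoder/decoder sketch for key frequency maps; not Nemo's API. */
public final class KeyFrequencyCodecSketch {

  /** Hypothetical key coder, standing in for a key encoder/decoder extracted from the pipeline. */
  interface KeyCoder<K> {
    void encode(K key, DataOutputStream out) throws IOException;
    K decode(DataInputStream in) throws IOException;
  }

  /** Example key coder for String keys (assumption for illustration). */
  static final KeyCoder<String> STRING_KEYS = new KeyCoder<String>() {
    @Override
    public void encode(String key, DataOutputStream out) throws IOException {
      out.writeUTF(key);
    }

    @Override
    public String decode(DataInputStream in) throws IOException {
      return in.readUTF();
    }
  };

  static <K> byte[] encode(Map<K, Long> freq, KeyCoder<K> keyCoder) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(freq.size());          // entry count first
      for (Map.Entry<K, Long> e : freq.entrySet()) {
        keyCoder.encode(e.getKey(), out); // key via the pluggable key coder
        out.writeLong(e.getValue());      // frequency as a fixed-width long
      }
    }                                     // closing flushes into `bytes`
    return bytes.toByteArray();
  }

  static <K> Map<K, Long> decode(byte[] data, KeyCoder<K> keyCoder) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int size = in.readInt();
      Map<K, Long> freq = new HashMap<>();
      for (int i = 0; i < size; i++) {
        K key = keyCoder.decode(in);      // read key first, matching encode order
        freq.put(key, in.readLong());
      }
      return freq;
    }
  }

  public static void main(String[] args) throws IOException {
    Map<String, Long> freq = new HashMap<>();
    freq.put("a", 3L);
    freq.put("b", 1L);
    Map<String, Long> roundTrip = decode(encode(freq, STRING_KEYS), STRING_KEYS);
    System.out.println(roundTrip.equals(freq)); // prints "true"
  }
}
```

Separating the key coder from the map layout keeps the frequency encoding independent of the key type, which is the reason key encoders/decoders need to be available at all once keys are no longer plain integers.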

**Tests for the changes:**
- No new tests (the existing unit tests for skew handling and 
`PerKeyMedianITCase` cover the changes)

**Other comments:**
- N/A

Closes #GITHUB_PR_NUMBER


[ Full content available at: https://github.com/apache/incubator-nemo/pull/115 ]