jeongyooneo opened a new pull request #115: [NEMO-96] Modularize DataSkewPolicy 
to use MetricVertex and BarrierVertex
URL: https://github.com/apache/incubator-nemo/pull/115
 
 
   JIRA: [NEMO-96: Modularize DataSkewPolicy to use MetricVertex and 
BarrierVertex](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-96)
   
   **Major changes:**
   - Handle dynamic optimization via `MetricCollectionVertex` and 
`AggregationBarrierVertex` instead of `MetricCollectionBarrierVertex`
   - For each shuffle edge with main output, a `MetricCollectionVertex` is 
inserted at compile time at the end of its source tasks to collect key 
frequency data
   - For each shuffle edge with main output, an `AggregationBarrierVertex` is 
also inserted at compile time. It aggregates the task-level key frequency 
data that each `MetricCollectionVertex` collects and emits as additional 
tagged output
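
   To make the collect-then-aggregate flow above concrete, here is a minimal 
sketch in plain Java. It is not the code in this PR: the class 
`KeyFrequencySketch` and the methods `countKeys` and `aggregate` are 
hypothetical names, and the sketch only assumes that key frequency data can 
be modeled as a map from key to element count. `countKeys` stands in for what 
a per-task metric collection step might do, and `aggregate` for the 
aggregation step that merges the task-level results.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not Nemo's actual classes.
public class KeyFrequencySketch {

  // Per-task step (assumed): count how many elements each key contributes
  // within the output of a single task.
  static Map<Object, Long> countKeys(final List<?> keys) {
    final Map<Object, Long> frequencies = new HashMap<>();
    for (final Object key : keys) {
      frequencies.merge(key, 1L, Long::sum);
    }
    return frequencies;
  }

  // Aggregation step (assumed): sum the task-level counts per key so that
  // the result covers the whole shuffle edge.
  static Map<Object, Long> aggregate(final List<Map<Object, Long>> perTask) {
    final Map<Object, Long> total = new HashMap<>();
    for (final Map<Object, Long> taskFrequencies : perTask) {
      taskFrequencies.forEach((key, count) -> total.merge(key, count, Long::sum));
    }
    return total;
  }

  public static void main(final String[] args) {
    final Map<Object, Long> task0 = countKeys(Arrays.asList("a", "b", "a"));
    final Map<Object, Long> task1 = countKeys(Arrays.asList("a", "c"));
    // Print the combined per-key counts of both tasks.
    System.out.println(aggregate(Arrays.asList(task0, task1)));
  }
}
```

   In this sketch the aggregated map is the only input a skew-handling step 
would need; how the actual runtime pass consumes it is outside what this 
description states.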
   
   **Minor changes to note:**
   - Added the encoder/decoder factories needed for aggregating dynamic 
optimization data (in this case, key frequency data)
   - Modified `PipelineTranslator` to extract key encoder/decoders
   - Modified `DataSkewRuntimePass` and the related code paths to handle 
`Object`-typed keys instead of integer hash-index keys
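
   As an illustration of the last point, the sketch below contrasts frequency 
data keyed by an integer hash index with frequency data keyed by the key 
object itself. It is an assumption-laden example rather than the PR's code: 
`KeyTypeSketch`, `countByHashIndex`, `countByKey`, and the 
`floorMod(hashCode, numBuckets)` bucketing are all hypothetical and chosen 
only to show the difference between the two key types.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Nemo's DataSkewRuntimePass.
public class KeyTypeSketch {

  // Frequencies keyed by an integer hash index (assumed bucketing scheme):
  // distinct keys that fall into the same bucket are merged into one count.
  static Map<Integer, Long> countByHashIndex(final List<?> keys, final int numBuckets) {
    final Map<Integer, Long> frequencies = new HashMap<>();
    for (final Object key : keys) {
      final int hashIndex = Math.floorMod(key.hashCode(), numBuckets);
      frequencies.merge(hashIndex, 1L, Long::sum);
    }
    return frequencies;
  }

  // Frequencies keyed by the key object itself: each distinct key keeps
  // its own count.
  static Map<Object, Long> countByKey(final List<?> keys) {
    final Map<Object, Long> frequencies = new HashMap<>();
    for (final Object key : keys) {
      frequencies.merge(key, 1L, Long::sum);
    }
    return frequencies;
  }

  public static void main(final String[] args) {
    final List<Integer> keys = Arrays.asList(1, 3, 1);
    // With two buckets, keys 1 and 3 share a hash index.
    System.out.println(countByHashIndex(keys, 2));
    // Keyed by object, 1 and 3 stay separate.
    System.out.println(countByKey(keys));
  }
}
```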
   
   **Tests for the changes:**
   - N/A (the unit tests for skew handling and `PerKeyMedianITCase` cover the 
changes)
   
   **Other comments:**
   - N/A
   
   Closes #GITHUB_PR_NUMBER
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
