johnyangk opened a new pull request #129: [NEMO-8] Implement 
PipeManagerMaster/Worker
URL: https://github.com/apache/incubator-nemo/pull/129
 
 
   JIRA: [NEMO-8: Implement 
PipeManagerMaster/Worker](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-8)
   
   **Major changes:**
   - Supports fully-pipelined data streaming for bounded sources (not unbounded 
sources)
     - Tasks do finish once they have processed all input data, because the data is finite
     - When a task finishes, it emits all the data it holds (e.g., GroupByKey 
accumulated results) and closes the corresponding outgoing pipes, notifying 
downstream tasks of the end of the pipes
     - Stream processing of unbounded sources additionally requires watermarks 
(https://issues.apache.org/jira/browse/NEMO-233)
   - Introduces PipeManagerMaster/Worker
     - Shares code with BlockManagerMaster/Worker 
   - Naive element-wise serialization + compression + writeAndFlush
     - This will very likely cause serious overhead, but fixing it 
is a separate issue
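
To make the element-wise transfer and the end-of-pipe notification concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `PipeSketch`, `writeElement`, `closePipe`, `readAll`, and the `END_OF_PIPE` marker are hypothetical names, elements are assumed to be strings, and a plain `DataOutputStream` with GZIP stands in for the actual serializer, compressor, and network channel. It requires Java 9+ for `readAllBytes`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; none of these names come from the Nemo code base.
final class PipeSketch {
  // Assumed convention: a frame length of -1 marks the end of the pipe.
  static final int END_OF_PIPE = -1;

  /** Serializes, compresses, writes, and flushes a single element as its own frame. */
  static void writeElement(final DataOutputStream out, final String element) throws IOException {
    final ByteArrayOutputStream frame = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(frame)) {
      gzip.write(element.getBytes(StandardCharsets.UTF_8)); // "serialization" of a string
    }
    final byte[] bytes = frame.toByteArray();
    out.writeInt(bytes.length); // length prefix so the reader can find frame boundaries
    out.write(bytes);
    out.flush(); // one flush per element: simple, but costly
  }

  /** Called when the upstream task finishes: tells the reader no more frames will follow. */
  static void closePipe(final DataOutputStream out) throws IOException {
    out.writeInt(END_OF_PIPE);
    out.flush();
  }

  /** Downstream side: reads frames until the end-of-pipe marker arrives. */
  static List<String> readAll(final DataInputStream in) throws IOException {
    final List<String> elements = new ArrayList<>();
    for (int length = in.readInt(); length != END_OF_PIPE; length = in.readInt()) {
      final byte[] bytes = new byte[length];
      in.readFully(bytes);
      try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
        elements.add(new String(gzip.readAllBytes(), StandardCharsets.UTF_8));
      }
    }
    return elements;
  }
}
```

In this sketch every element pays for its own compression stream (a GZIP header and trailer of at least 18 bytes) plus a flush, which illustrates the kind of per-element overhead noted above; batching several elements into one frame would be one way to reduce it.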
   
   **Minor changes to note:**
   - JobConf#SchedulerImplClassName: Batch and Streaming options
   - StreamingPolicyParallelismFive: The default policy + 
PipeTransferEverythingPass
   - Fixes the StreamingScheduler to pass the new streaming integration tests
   - Fixes a coder bug in the Beam frontend (PCollectionView coder)
   
   **Tests for the changes:**
   - WindowedWordCountITCase#testStreamingFixedWindow
   - WindowedWordCountITCase#testStreamingSlidingWindow
   
   **Other comments:**
   - Also closes "Implement common API for data transfer" 
(https://issues.apache.org/jira/browse/NEMO-9)
   
   Closes #GITHUB_PR_NUMBER
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
