johnyangk commented on a change in pull request #2: [NEMO-7] Intra-TaskGroup pipelining
URL: https://github.com/apache/incubator-nemo/pull/2#discussion_r172709098
##########
File path: runtime/executor/src/main/java/edu/snu/nemo/runtime/executor/TaskGroupExecutor.java
##########
@@ -60,13 +55,36 @@
   private final DataTransferFactory channelFactory;
   private final MetricCollector metricCollector;
-  /**
-   * Map of task IDs in this task group to their readers/writers.
-   */
-  private final Map<String, List<InputReader>> physicalTaskIdToInputReaderMap;
-  private final Map<String, List<OutputWriter>> physicalTaskIdToOutputWriterMap;
-
-  private boolean isExecutionRequested;
+  // Map of task ID to its intra-TaskGroup data pipe.
+  private final Map<Task, List<LocalPipe>> taskToInputPipesMap;
+  private final Map<Task, LocalPipe> taskToOutputPipeMap; // one and only one Pipe per task
+  // Readers/writers that deals with inter-TaskGroup data.

Review comment:
I'm not sure about the newly added data structures below, for the following reasons:
- Some of them cost an extra per-element operation and memory overhead
- Maybe we can do without some of them

Alternatively, I'd think about the following options:
- Compose iterators like Spark (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala#L38)
- Reuse `taskGroupDag` to obtain dependency information
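
To make the iterator-composition option a bit more concrete, here is a minimal sketch of what I have in mind. It is not Nemo code: `IteratorComposition`, `map`, `compose`, and `runPipeline` are hypothetical names, and it assumes the tasks form a linear chain that is already sorted in topological order (for example, derived from `taskGroupDag`). Each task wraps its parent's iterator, so elements are pulled through the chain one at a time without intermediate per-task pipes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these names exist in Nemo.
final class IteratorComposition {

  /** Wraps a parent iterator; fn is applied lazily, one element per next() call. */
  static <T> Iterator<T> map(final Iterator<T> parent, final UnaryOperator<T> fn) {
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return parent.hasNext();
      }

      @Override
      public T next() {
        return fn.apply(parent.next());
      }
    };
  }

  /**
   * Chains per-task functions over a source iterator. The list is assumed to be
   * a linear chain of tasks already sorted in topological order, so no
   * per-task map of pipes is needed.
   */
  static <T> Iterator<T> compose(final Iterator<T> source, final List<UnaryOperator<T>> tasks) {
    Iterator<T> current = source;
    for (final UnaryOperator<T> task : tasks) {
      current = map(current, task);
    }
    return current;
  }

  /** Drains the composed iterator; elements are pulled through all tasks one by one. */
  static List<Integer> runPipeline(final List<Integer> input) {
    final List<UnaryOperator<Integer>> tasks = new ArrayList<>();
    tasks.add(x -> x * 2); // first task in the chain
    tasks.add(x -> x + 1); // second task, consumes the first task's output
    final Iterator<Integer> composed = compose(input.iterator(), tasks);
    final List<Integer> out = new ArrayList<>();
    while (composed.hasNext()) {
      out.add(composed.next());
    }
    return out;
  }

  public static void main(final String[] args) {
    System.out.println(runPipeline(Arrays.asList(1, 2, 3))); // prints [3, 5, 7]
  }
}
```

This only covers a linear chain. Tasks with more than one input would need some way to merge parent iterators, which the sketch does not attempt.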