sanha commented on issue #2: [NEMO-7] Intra-TaskGroup pipelining
URL: https://github.com/apache/incubator-nemo/pull/2#issuecomment-371376883

Regarding @johnyangk's comment (avoiding hashing and object creation), I'd suggest the following model.

- Build a DAG of `TaskWrapper` (or something like that) from the `taskGroupDag` of the `ScheduledTaskGroup` when a `TaskGroup` is scheduled.
- This DAG should manage the connections among vertices as pointers (references) rather than with `Map` and `List`, unlike our current `DAG` implementation.
- The `TaskWrapper` should have a `Callable`, which consumes an input element and produces output.
- This `Callable` can be built from the `Transform` of the `Task` that the wrapper wraps.
- The `TaskWrapper` can also hold anything else that is currently stored in `TaskDataHandler`.
- After this, each element of the input data `Iterable` can be processed through this DAG of `TaskWrapper` without calculating any hash or creating any extra object.
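
To make the idea concrete, here is a rough sketch of what I mean, not actual Nemo code. Everything in it is hypothetical: the `TaskWrapper` shape, `setChildren`, `process`, and the `runExample` helper are made up for illustration, and a plain `java.util.function.Function` stands in for the callable built from a `Transform` (since `java.util.concurrent.Callable` itself takes no argument). The real `Task`, `Transform`, and `TaskDataHandler` types are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class TaskWrapperSketch {

  /** Hypothetical wrapper: one per Task, linked to its children by direct reference. */
  static final class TaskWrapper {
    // Stand-in for the callable that would be built from the wrapped Task's Transform.
    private final Function<Object, Object> function;
    // Children are held as plain references, so traversal needs no Map lookup or hashing.
    private TaskWrapper[] children = new TaskWrapper[0];
    // Only used by leaf wrappers to collect their output (may be null for inner vertices).
    private final List<Object> sink;

    TaskWrapper(final Function<Object, Object> function, final List<Object> sink) {
      this.function = function;
      this.sink = sink;
    }

    // Wired once, when the TaskGroup is scheduled, not per element.
    void setChildren(final TaskWrapper... newChildren) {
      this.children = newChildren;
    }

    // Pushes a single element through this vertex and on to all of its children.
    void process(final Object element) {
      final Object output = function.apply(element);
      if (children.length == 0) {
        sink.add(output);
        return;
      }
      for (final TaskWrapper child : children) {
        child.process(output);
      }
    }
  }

  /** Builds a tiny three-vertex DAG (one root, two leaves) and runs the input through it. */
  static List<Object> runExample(final Iterable<Integer> input) {
    final List<Object> results = new ArrayList<>();
    final TaskWrapper root = new TaskWrapper(x -> ((Integer) x) + 1, null);
    final TaskWrapper doubler = new TaskWrapper(x -> ((Integer) x) * 2, results);
    final TaskWrapper tagger = new TaskWrapper(x -> "v" + x, results);
    root.setChildren(doubler, tagger);

    for (final Integer element : input) {
      root.process(element);
    }
    return results;
  }

  public static void main(final String[] args) {
    System.out.println(runExample(Arrays.asList(1, 2, 3)));
  }
}
```

The point of the sketch is only the wiring: the per-element path is a function call plus a loop over an array of references. The toy transforms here still box integers and build strings, so it does not by itself show an allocation-free pipeline.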
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
