[ https://issues.apache.org/jira/browse/TEZ-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419567#comment-15419567 ]
Zhiyuan Yang edited comment on TEZ-3230 at 8/31/16 11:20 PM:
-------------------------------------------------------------
bq. CartesianProductEdgeManagerUnpartitioned#getNumDestinationConsumerTasks
doesn't depend on sourceTaskIndex. So it could cache the value in
initialization. Granted this isn't important given it is called only in the case
of INPUT_READ_ERROR_EVENT.
Thanks for pointing this out! This is already included in the new patch.
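For readers following along, here is a minimal sketch of the caching idea. It is not the actual Tez code: the class name, constructor parameters, and the assumption that every destination task corresponds to one combination of source tasks are illustrative only. Under that assumption, a source task feeds as many destination tasks as the product of the other vertices' task counts, which can be computed once at initialization.

```java
// Hypothetical stand-in for an unpartitioned cartesian product edge manager.
// Not the Tez implementation; names and constructor arguments are assumptions.
class CachedConsumerCount {
    // Computed once in the constructor, reused on every call.
    private final int numDestinationConsumerTasks;

    // numTasks[i] is the task count of the i-th source vertex;
    // position is the index of the source vertex this edge starts from.
    CachedConsumerCount(int[] numTasks, int position) {
        int product = 1;
        for (int i = 0; i < numTasks.length; i++) {
            if (i != position) {
                product *= numTasks[i];
            }
        }
        this.numDestinationConsumerTasks = product;
    }

    // The result does not depend on sourceTaskIndex, so no work is done here.
    int getNumDestinationConsumerTasks(int sourceTaskIndex) {
        return numDestinationConsumerTasks;
    }
}
```

With task counts {2, 3, 4} and this edge at position 1, every source task has 2 * 4 = 8 consumers, whichever source task index is passed in.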
bq.
CartesianProductEdgeManagerPartitioned#routeCompositeDataMovementEventToDestination
optimization. Instead of computing the partition from taskTaskId, we can store
the destinationTaskIndex -> partition mapping. Then taskIdMapping becomes
unnecessary.
This is a trade-off between CPU and memory. IMO, memory is a scarcer resource
than CPU. Given that profiling didn’t show significant CPU overhead, I’ll keep
the current implementation.
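To make the trade-off concrete, here is a rough sketch of the two options. It is not the Tez code: the class, the row-major numbering of destination tasks, and all method names are assumptions made for illustration. Computing the partition costs a division and a modulo per lookup, while the stored mapping costs one int per destination task.

```java
// Hypothetical illustration of "compute per call" versus "store a mapping".
// Assumes destination tasks are numbered row-major over the partition counts.
class PartitionLookup {
    private final int[] numPartitions; // partitions per source vertex
    private final int position;        // index of this edge's source vertex
    private final int stride;          // product of partition counts after position

    PartitionLookup(int[] numPartitions, int position) {
        this.numPartitions = numPartitions.clone();
        this.position = position;
        int s = 1;
        for (int i = position + 1; i < numPartitions.length; i++) {
            s *= numPartitions[i];
        }
        this.stride = s;
    }

    // CPU option: O(1) extra memory, a division and a modulo on each call.
    int computePartition(int destinationTaskIndex) {
        return (destinationTaskIndex / stride) % numPartitions[position];
    }

    // Memory option: one int per destination task, filled once up front.
    int[] buildTable() {
        int total = 1;
        for (int n : numPartitions) {
            total *= n;
        }
        int[] table = new int[total];
        for (int d = 0; d < total; d++) {
            table[d] = computePartition(d);
        }
        return table;
    }
}
```

Both options return the same partition for every destination task index; the table's size grows with the product of all partition counts, which is the memory cost being weighed here.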
was (Author: aplusplus):
bq. CartesianProductEdgeManagerUnpartitioned#getNumDestinationConsumerTasks
doesn't depend on sourceTaskIndex. So it could cache the value in
initialization. Granted this isn't important given it is called only in the case
of INPUT_READ_ERROR_EVENT.
Thanks for pointing this out! This is already included in the new patch.
bq.
CartesianProductEdgeManagerPartitioned#routeCompositeDataMovementEventToDestination
optimization. Instead of computing the partition from taskTaskId, we can store
the destinationTaskIndex -> partition mapping. Then taskIdMapping becomes
unnecessary.
This is a trade-off between CPU and memory. IMO, memory is a scarcer resource
than CPU. Given that profiling didn’t show significant CPU overhead, I’ll keep
the current implementation.
bq.
CartesianProductEdgeManagerPartitioned#routeInputSourceTaskFailedEventToDestination
computes the partition and uses it to create EventRouteMetadata. It appears it
isn't necessary to specify the sourceTaskOutputIndex; Edge doesn't use that.
I would say let’s stick to what’s specified in the API. Although this could
improve performance, it relies on internal implementation details that change
from time to time, so it’s not a good idea to depend on it.
> Implement vertex manager and edge manager of cartesian product edge
> -------------------------------------------------------------------
>
> Key: TEZ-3230
> URL: https://issues.apache.org/jira/browse/TEZ-3230
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
> Attachments: TEZ-3230.1.patch, TEZ-3230.2.patch, TEZ-3230.3.patch,
> TEZ-3230.4.patch, TEZ-3230.5.patch, TEZ-3230.6.patch, TEZ-3230.7.patch,
> TEZ-3230.8.patch, TEZ-3230.9.patch, TEZ-3230.WIP.1.patch,
> TEZ-3230.WIP.2.patch, TEZ-3230.WIP.3.patch
>
>