[
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338043#comment-14338043
]
Siddharth Seth commented on TEZ-776:
------------------------------------
bq. Are composite events being exploded into dm events for every task? If yes,
the CPU is not different.
Are the exploded dm events being stored in TaskImpl so that they can be sent to
tasks when they fetch them? If yes, then it does not solve the memory problem.
If not, then its on demand routing per task.
Please refer to
https://issues.apache.org/jira/browse/TEZ-776?focusedCommentId=14326315&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14326315,
https://issues.apache.org/jira/browse/TEZ-776?focusedCommentId=14316061&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14316061
and
https://issues.apache.org/jira/browse/TEZ-776?focusedCommentId=14000047&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14000047)
Just went though these for option #4, not sure why there's confusion about what
is being suggested. There's no mention of Task explosion for storage - and they
talk about how memory efficiency and CPU efficiency are attained.
If there's still confusion - I could just write a mini patch to illustrate the
approach.
bq. InputFailedEvents have well known framework fields and an opaque user
binary payload. If the well known fields are being used then I am not sure how
Tez code can be worse off in matching up and invalidating events compared to
plugins. If the plugin is going to deserialize the user payload then its
overhead. In the same vein, there isnt any restriction on edge plugins to make
routing optimizations in either case, without burdening them with the event
storage and handling.
InputFailedEvents are generated as a result of InputReadErrorEvents. They don't
have a binary payload, and are only generated after routing. Without a routing
table being stored somewhere - I don't see how Tez will be able to invalidate
past events based on these.
InputReadErrorEvents need to be routed by Edges to determine the source - again
I don't think Tez can do much with these without assistance from the Edge
routing table.
A plugin, on the other hand, will have information to invalidate a stored
DataMovementEvent as a result of an InputReadErrorEvent / routed
InputFailedEvent.
bq. I will try to add some concurrency to the simulation and compare.
Measuring the load on the system will be useful as well. IIRC,
[~rajesh.balamohan] saw the load go up to 500% when counter updates were sent
from tasks too often - as a result of 30 threads deserializing counters. With
30 threads doing a large number of invocations - I'd expect to go very high
here as well.
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.patch,
> events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)