[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338043#comment-14338043
 ] 

Siddharth Seth commented on TEZ-776:
------------------------------------

bq. Are composite events being exploded into dm events for every task? If yes, 
the CPU is not different.
Are the exploded dm events being stored in TaskImpl so that they can be sent to 
tasks when they fetch them? If yes, then it does not solve the memory problem. 
If not, then it's on-demand routing per task.
Please refer to 
https://issues.apache.org/jira/browse/TEZ-776?focusedCommentId=14326315&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14326315,
 
https://issues.apache.org/jira/browse/TEZ-776?focusedCommentId=14316061&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14316061
 and 
https://issues.apache.org/jira/browse/TEZ-776?focusedCommentId=14000047&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14000047
Just went through these for option #4, not sure why there's confusion about what 
is being suggested. There's no mention of Task explosion for storage - and they 
talk about how memory efficiency and CPU efficiency are attained.
If there's still confusion - I could just write a mini patch to illustrate the 
approach.
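
For readers following the thread, here is a minimal sketch of the general idea of on-demand routing: composite events are stored once per edge and expanded only when a task fetches them, so nothing is exploded per task for storage. It is not the mini patch offered above and not Tez code. Every class and method name is hypothetical, and the routing rule (destination task i reads partition i of every source) is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of on-demand routing; none of these names are Tez APIs. */
public class OnDemandRoutingSketch {

    /** One composite event per source task, covering all of its output partitions. */
    static final class CompositeEvent {
        final int sourceTaskIndex;
        final int partitionCount;
        final String payload; // stands in for the opaque user payload

        CompositeEvent(int sourceTaskIndex, int partitionCount, String payload) {
            this.sourceTaskIndex = sourceTaskIndex;
            this.partitionCount = partitionCount;
            this.payload = payload;
        }
    }

    /** Event as seen by one destination task; created only at fetch time. */
    static final class RoutedEvent {
        final int sourceTaskIndex;
        final int partition;
        final String payload;

        RoutedEvent(int sourceTaskIndex, int partition, String payload) {
            this.sourceTaskIndex = sourceTaskIndex;
            this.partition = partition;
            this.payload = payload;
        }
    }

    /** Stores each composite event once and expands it lazily per fetching task. */
    static final class EdgeEventStore {
        private final List<CompositeEvent> events = new ArrayList<>();

        synchronized void add(CompositeEvent e) {
            events.add(e);
        }

        synchronized int storedEventCount() {
            return events.size();
        }

        /**
         * Returns the events visible to destTaskIndex, starting at fromIndex.
         * Assumed routing rule: destination task i reads partition i of each source.
         */
        synchronized List<RoutedEvent> fetch(int destTaskIndex, int fromIndex) {
            List<RoutedEvent> out = new ArrayList<>();
            for (int i = fromIndex; i < events.size(); i++) {
                CompositeEvent c = events.get(i);
                if (destTaskIndex < c.partitionCount) {
                    out.add(new RoutedEvent(c.sourceTaskIndex, destTaskIndex, c.payload));
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        EdgeEventStore store = new EdgeEventStore();
        store.add(new CompositeEvent(0, 3, "src0"));
        store.add(new CompositeEvent(1, 3, "src1"));
        List<RoutedEvent> forTask2 = store.fetch(2, 0);
        System.out.println(store.storedEventCount() + " stored, "
                + forTask2.size() + " routed to task 2");
    }
}
```

In this sketch the stored state grows with the number of source tasks rather than with sources times destinations; the routing work is paid at fetch time instead.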

bq. InputFailedEvents have well known framework fields and an opaque user 
binary payload. If the well known fields are being used then I am not sure how 
Tez code can be worse off in matching up and invalidating events compared to 
plugins. If the plugin is going to deserialize the user payload then it's 
overhead. In the same vein, there isn't any restriction on edge plugins to make 
routing optimizations in either case, without burdening them with the event 
storage and handling.
InputFailedEvents are generated as a result of InputReadErrorEvents. They don't 
have a binary payload, and are only generated after routing. Without a routing 
table being stored somewhere - I don't see how Tez will be able to invalidate 
past events based on these.
InputReadErrorEvents need to be routed by Edges to determine the source - again 
I don't think Tez can do much with these without assistance from the Edge 
routing table.
A plugin, on the other hand, will have information to invalidate a stored 
DataMovementEvent as a result of an InputReadErrorEvent / routed 
InputFailedEvent.
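
As a rough illustration of that last point, the sketch below shows a store that owns both the routing rule and the stored events, so a read error reported by a destination task can be mapped back to its source and the stored event dropped. All names are hypothetical and are not Tez APIs. The routing rule (input index i on any destination maps to source task i) is an assumption chosen to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; these are not Tez classes or plugin interfaces. */
public class InvalidationSketch {

    /** An edge-plugin-like store that holds both routing logic and stored events. */
    static final class EdgePluginStore {
        // source task index -> payload of its currently valid stored event
        private final Map<Integer, String> validEvents = new HashMap<>();

        void store(int sourceTask, String payload) {
            validEvents.put(sourceTask, payload);
        }

        boolean hasValidEvent(int sourceTask) {
            return validEvents.containsKey(sourceTask);
        }

        /** Assumed routing rule: input index i on any destination maps to source task i. */
        int routeReadErrorToSource(int destTask, int inputIndex) {
            return inputIndex;
        }

        /**
         * Handles a read error reported by destTask for one of its inputs.
         * Returns the source task whose stored event was dropped, or -1 if none was stored.
         */
        int onInputReadError(int destTask, int inputIndex) {
            int source = routeReadErrorToSource(destTask, inputIndex);
            return validEvents.remove(source) != null ? source : -1;
        }
    }

    public static void main(String[] args) {
        EdgePluginStore store = new EdgePluginStore();
        store.store(0, "src0");
        store.store(1, "src1");
        int dropped = store.onInputReadError(5, 1);
        System.out.println("dropped event from source " + dropped);
    }
}
```

Without the routing rule, the store would have no way to tell which stored event a given (destination task, input index) pair refers to, which is the concern raised above.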

bq. I will try to add some concurrency to the simulation and compare.
Measuring the load on the system will be useful as well. IIRC, 
[~rajesh.balamohan] saw the load go up to 500% when counter updates were sent 
from tasks too often - as a result of 30 threads deserializing counters. With 
30 threads doing a large number of invocations - I'd expect the load to go very 
high here as well.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.patch, 
> events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
