[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337351#comment-14337351
 ] 

Bikas Saha commented on TEZ-776:
--------------------------------

bq. I'm not sure where TaskImpl comes in with Option 4. The patch does add APIs 
to the edge; the main difference from Option 4 is letting the EdgePlugin handle 
storage, and optimize CPU
Are composite events being exploded into DataMovementEvents for every task? If 
yes, the CPU cost is no different.
Are the exploded DataMovementEvents being stored in TaskImpl so that they can 
be sent to tasks when they fetch them? If yes, then it does not solve the 
memory problem. If not, then it is on-demand routing per task.
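
To make the distinction concrete, here is a rough sketch. All names in it are 
hypothetical stand-ins, not Tez classes, and it assumes the simplest mapping 
where partition i of a composite event goes to destination task i.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types are Tez classes.
public class OnDemandRoutingSketch {

    /** One composite event describing `count` partitions produced by a source task. */
    public static final class CompositeEvent {
        public final int sourceTask;
        public final int count;

        public CompositeEvent(int sourceTask, int count) {
            this.sourceTask = sourceTask;
            this.count = count;
        }
    }

    /** The per-destination event that a consumer task actually receives. */
    public static final class DmEvent {
        public final int sourceTask;
        public final int sourceIndex;
        public final int targetTask;

        public DmEvent(int sourceTask, int sourceIndex, int targetTask) {
            this.sourceTask = sourceTask;
            this.sourceIndex = sourceIndex;
            this.targetTask = targetTask;
        }
    }

    /** Eager: build one DmEvent per destination task and retain all of them. */
    public static List<DmEvent> explodeAll(CompositeEvent ce) {
        List<DmEvent> out = new ArrayList<>(ce.count);
        for (int i = 0; i < ce.count; i++) {
            // Assumed mapping: partition i is consumed by task i.
            out.add(new DmEvent(ce.sourceTask, i, i));
        }
        return out;
    }

    /** On demand: build only the event for the requesting task; nothing is stored. */
    public static DmEvent routeFor(CompositeEvent ce, int targetTask) {
        if (targetTask < 0 || targetTask >= ce.count) {
            throw new IllegalArgumentException("no partition for task " + targetTask);
        }
        return new DmEvent(ce.sourceTask, targetTask, targetTask);
    }

    public static void main(String[] args) {
        CompositeEvent ce = new CompositeEvent(7, 1000);
        System.out.println(explodeAll(ce).size());         // 1000 events held on the heap
        System.out.println(routeFor(ce, 42).sourceIndex);  // 42, computed at fetch time
    }
}
```

In this sketch the work per derived event is the same either way. What changes 
is whether the derived events are retained until the tasks fetch them.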

bq. InputFailedEvents need to be matched against the generated events. I'm not 
sure the framework is setup very well to do this - not without additional 
routing and lookups. Edges can consolidate these in a smart manner, with far 
more information available. Also efficiently.
InputFailedEvents have well-known framework fields and an opaque user binary 
payload. If only the well-known fields are being used, then I am not sure how 
Tez code can be worse off than plugins at matching up and invalidating events. 
If the plugin is going to deserialize the user payload, then that is extra 
overhead.
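
As a rough sketch of matching on framework-visible fields alone: the class 
below is hypothetical, and the (sourceTask, version) key is an assumed 
simplification rather than the actual Tez event layout. The payload is stored 
and dropped as an opaque byte array and is never deserialized.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not the Tez event classes or their fields.
public class FailedEventMatchSketch {

    /** Lookup key built only from fields the framework can read directly. */
    static final class Key {
        final int sourceTask;
        final int version;

        Key(int sourceTask, int version) {
            this.sourceTask = sourceTask;
            this.version = version;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return sourceTask == k.sourceTask && version == k.version;
        }

        @Override
        public int hashCode() {
            return Objects.hash(sourceTask, version);
        }
    }

    // Generated events that are still valid, keyed by the well-known fields.
    private final Map<Key, byte[]> live = new HashMap<>();

    /** Record a generated event; the payload stays opaque. */
    public void onGenerated(int sourceTask, int version, byte[] opaquePayload) {
        live.put(new Key(sourceTask, version), opaquePayload);
    }

    /** Invalidate the matching generated event. Returns true if one was removed. */
    public boolean onInputFailed(int sourceTask, int version) {
        return live.remove(new Key(sourceTask, version)) != null;
    }

    /** Number of generated events that have not been invalidated. */
    public int liveCount() {
        return live.size();
    }
}
```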

I will try to add some concurrency to the simulation and compare.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.patch, 
> events-problem-solutions.txt
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
