[ 
https://issues.apache.org/jira/browse/TEZ-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Bathori reassigned TEZ-4392:
---------------------------------

    Assignee: Mark Bathori

> Streamed event serialization and distribution
> ---------------------------------------------
>
>                 Key: TEZ-4392
>                 URL: https://issues.apache.org/jira/browse/TEZ-4392
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ádám Szita
>            Assignee: Mark Bathori
>            Priority: Major
>
> Tez currently compiles the full list of events for a given job, then 
> serializes every event into another list before it starts distributing the 
> events to executor instances.
> As a result, all events are held in memory at once, which in some cases can 
> take up a lot of space (e.g. 1 MB split size × thousands of splits). It would 
> be more memory efficient to do this in a streamed way, that is, to serialize 
> an event right before sending it out to an executor, not earlier.
> Currently InputInitializer has the following methods that are of interest for 
> this:
> {code:java}
> public abstract List<Event> initialize() throws Exception;
> public abstract void handleInputInitializerEvent(List<InputInitializerEvent> var1) throws Exception;
> {code}
> Could these be changed to return/take an Iterator of 
> Event/InputInitializerEvent?
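
To illustrate the streamed approach described in the issue, here is a minimal sketch of serializing each event only when it is about to be sent. It is not Tez code: StreamedEventSketch, SplitEvent, events, serialize, lazilySerialized and distribute are all hypothetical names made up for this example, and the "send" step only counts bytes. The idea is that an Iterator wrapper defers serialization until next() is called, so at most one serialized payload needs to exist at a time.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical sketch: none of these types or methods are part of Tez.
final class StreamedEventSketch {

    /** Stand-in for an event that would carry a split payload. */
    static final class SplitEvent {
        final int index;
        SplitEvent(int index) { this.index = index; }
    }

    /** Produces events on demand instead of building a full list up front. */
    static Iterator<SplitEvent> events(int count) {
        return new Iterator<SplitEvent>() {
            private int next = 0;
            @Override public boolean hasNext() { return next < count; }
            @Override public SplitEvent next() {
                if (!hasNext()) throw new NoSuchElementException();
                return new SplitEvent(next++);
            }
        };
    }

    /** Toy serializer; a real one would encode the actual event payload. */
    static byte[] serialize(SplitEvent e) {
        return ("split-" + e.index).getBytes(StandardCharsets.UTF_8);
    }

    /** Wraps a source iterator so each element is serialized only inside next(). */
    static <T> Iterator<byte[]> lazilySerialized(Iterator<T> source,
                                                 Function<T, byte[]> serializer) {
        return new Iterator<byte[]>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public byte[] next() { return serializer.apply(source.next()); }
        };
    }

    /** Pretends to send each payload to an executor; returns the total bytes "sent". */
    static long distribute(Iterator<byte[]> payloads) {
        long total = 0;
        while (payloads.hasNext()) {
            byte[] payload = payloads.next(); // serialization happens here, one event at a time
            total += payload.length;          // a real sender would write to an executor instead
        }
        return total;
    }

    public static void main(String[] args) {
        long sent = distribute(lazilySerialized(events(3), StreamedEventSketch::serialize));
        System.out.println("bytes sent: " + sent);
    }
}
```

With a List-based API the caller would hold both the full event list and the full serialized list in memory; with the iterator wrapper above, each payload becomes garbage as soon as distribute moves on to the next one. Whether the real InputInitializer contract can change this way depends on how its callers consume the returned events, which this sketch does not model.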



--
This message was sent by Atlassian Jira
(v8.20.1#820001)