[ https://issues.apache.org/jira/browse/TEZ-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Bathori reassigned TEZ-4392:
---------------------------------
Assignee: Mark Bathori
> Streamed event serialization and distribution
> ---------------------------------------------
>
> Key: TEZ-4392
> URL: https://issues.apache.org/jira/browse/TEZ-4392
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ádám Szita
> Assignee: Mark Bathori
> Priority: Major
>
> Tez currently compiles the full list of events for a given job, then
> serializes every event into a second list before it starts distributing the
> events to executor instances.
> As a result, all events are held in memory at once, which in some cases can
> take up a lot of space (e.g. 1 MB split size × thousands of splits). It would
> be more memory-efficient to do this in a streamed way, that is, to serialize
> each event right before sending it out to an executor, not earlier.
> Currently InputInitializer has the following methods that are of interest for
> this:
> {code:java}
> public abstract List<Event> initialize() throws Exception;
> public abstract void handleInputInitializerEvent(List<InputInitializerEvent> var1) throws Exception;
> {code}
> Could these be changed so that initialize() returns an Iterator<Event> and
> handleInputInitializerEvent() takes an Iterator<InputInitializerEvent>?
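
To make the proposal concrete, here is a minimal sketch of what streamed serialization could look like. It does not use the Tez API: SplitEvent, StreamingInitializer, and StreamingDistributor are hypothetical stand-ins invented for illustration, and the string-based serialize() is only a placeholder for whatever encoding the real events use. The point it shows is that an Iterator lets the distributor pull one event, serialize it, and send it before the next one is produced.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Hypothetical stand-in for an event; NOT the real Tez Event class.
final class SplitEvent {
    final int splitIndex;

    SplitEvent(int splitIndex) {
        this.splitIndex = splitIndex;
    }

    // Placeholder serialization, assumed for illustration only.
    byte[] serialize() {
        return ("split-" + splitIndex).getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical initializer that produces events lazily instead of building a List.
final class StreamingInitializer {
    private final int splitCount;

    StreamingInitializer(int splitCount) {
        this.splitCount = splitCount;
    }

    Iterator<SplitEvent> initialize() {
        return new Iterator<SplitEvent>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < splitCount;
            }

            @Override
            public SplitEvent next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // The event is created only when the consumer asks for it.
                return new SplitEvent(next++);
            }
        };
    }
}

// Hypothetical distributor: serializes each event right before sending it.
final class StreamingDistributor {
    // Returns the number of events sent. This loop holds a reference to only
    // one serialized payload at a time; the sender decides whether to keep it.
    static int distribute(Iterator<SplitEvent> events, Consumer<byte[]> sender) {
        int sent = 0;
        while (events.hasNext()) {
            byte[] payload = events.next().serialize();
            sender.accept(payload);
            sent++;
        }
        return sent;
    }
}
```

A caller would pass something like `new StreamingInitializer(n).initialize()` together with a sender callback to `StreamingDistributor.distribute(...)`. Whether the actual InputInitializer contract can be changed this way without breaking existing implementations is a separate question that this sketch does not address.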