Ádám Szita created TEZ-4392:
-------------------------------

             Summary: Streamed event serialization and distribution
                 Key: TEZ-4392
                 URL: https://issues.apache.org/jira/browse/TEZ-4392
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Ádám Szita


Tez currently compiles the full list of events for a given job, then serializes 
every event into another list before starting to distribute the events to 
executor instances.

As a result, all events are held in memory at once, which in some cases can take 
up a lot of space (e.g. thousands of splits at 1 MB each add up to gigabytes). 
It would be more memory-efficient to stream the events instead, that is, to 
serialize each event right before sending it to an executor rather than up front.
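
To illustrate the idea, here is a minimal sketch of lazy serialization. It does 
not use any Tez classes: SplitEvent, serialize() and serializeLazily() are 
hypothetical names made up for this example, and the "serialization" is just a 
UTF-8 encoding of a string payload. The point is only that wrapping an Iterator 
defers the work until next() is called, so at most one serialized event needs 
to exist at a time.
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

class StreamedSerializationSketch {

    /** Hypothetical stand-in for an event; NOT the real Tez Event class. */
    static final class SplitEvent {
        final String payload;
        SplitEvent(String payload) { this.payload = payload; }
    }

    /** Counts serialize() calls so that the laziness can be observed. */
    static int serializedCount = 0;

    /** Placeholder serialization (assumption): UTF-8 bytes of the payload. */
    static byte[] serialize(SplitEvent event) {
        serializedCount++;
        return event.payload.getBytes(StandardCharsets.UTF_8);
    }

    /** Wraps an event iterator so each event is serialized only inside next(). */
    static Iterator<byte[]> serializeLazily(final Iterator<SplitEvent> events) {
        return new Iterator<byte[]>() {
            @Override public boolean hasNext() { return events.hasNext(); }
            @Override public byte[] next() { return serialize(events.next()); }
        };
    }

    public static void main(String[] args) {
        Iterator<SplitEvent> events = Arrays.asList(
                new SplitEvent("split-0"), new SplitEvent("split-1")).iterator();
        Iterator<byte[]> wire = serializeLazily(events);
        while (wire.hasNext()) {
            byte[] bytes = wire.next(); // serialized here, right before "sending"
            System.out.println("sent " + bytes.length + " bytes");
        }
    }
}
{code}
With the current list-based approach the equivalent loop would call serialize() 
for every event before the first one is sent.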

Currently InputInitializer has the following methods that are of interest for 
this:
{code:java}
public abstract List<Event> initialize() throws Exception;

public abstract void handleInputInitializerEvent(List<InputInitializerEvent> var1)
    throws Exception;
{code}
Could these be changed to return/take an Iterator of 
Event/InputInitializerEvent instead?
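
A rough sketch of what an Iterator-based variant could look like follows. This 
is not an existing Tez API: StreamingInputInitializer, CountingInitializer and 
the placeholder Event/InputInitializerEvent types are all hypothetical and 
defined locally so that the example compiles on its own. It only shows that an 
initializer could create events on demand instead of building the full list.
{code:java}
import java.util.Iterator;
import java.util.NoSuchElementException;

class IteratorInitializerSketch {

    /** Hypothetical placeholders; NOT the real Tez event classes. */
    interface Event {}
    static final class InputInitializerEvent implements Event {}
    static final class SplitEvent implements Event {
        final int splitIndex;
        SplitEvent(int splitIndex) { this.splitIndex = splitIndex; }
    }

    /** Hypothetical Iterator-based counterpart of InputInitializer. */
    abstract static class StreamingInputInitializer {
        public abstract Iterator<Event> initialize() throws Exception;

        public abstract void handleInputInitializerEvent(
                Iterator<InputInitializerEvent> events) throws Exception;
    }

    /** Example implementation that creates one event per split, on demand. */
    static final class CountingInitializer extends StreamingInputInitializer {
        private final int splitCount;
        int created = 0; // number of events actually constructed so far
        int handled = 0; // number of initializer events consumed so far

        CountingInitializer(int splitCount) { this.splitCount = splitCount; }

        @Override
        public Iterator<Event> initialize() {
            return new Iterator<Event>() {
                private int nextIndex = 0;

                @Override public boolean hasNext() { return nextIndex < splitCount; }

                @Override public Event next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    created++; // the event object only comes into existence here
                    return new SplitEvent(nextIndex++);
                }
            };
        }

        @Override
        public void handleInputInitializerEvent(Iterator<InputInitializerEvent> events) {
            while (events.hasNext()) {
                events.next();
                handled++;
            }
        }
    }
}
{code}
A caller would pull events one at a time from initialize() and could serialize 
and send each one before asking for the next.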



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
