Bikas came to the Pig Sprint Planning and we had some good discussions on the approaches we are taking.

Prioritized set of requirements from the Pig Team:

Q1:
- Partitioned unsorted output
- API to start Input fetch (TEZ-668)
- Fix for any issues with 1-1 Edge found by us as we plan to use them for one stage in order-by.
- UI for Tez jobs

Q2:
- Advanced Memory management
  - Memory manager for inputs based on map output sizes.
  - Memory manager for outputs

As a workaround till then we will set io.sort.mb on the input and output descriptors ourselves manually based on number of edges.

Q3/Q4:
- Partial aggregator to determine number of reduces
- Split edge support - same output to multiple vertices

Hitesh,
Bikas said you would have information on the UI for Tez as you are driving it with YARN team. Can you brief us on what we can expect and possibly when?

Regards,
Rohini