Siddharth Seth commented on TEZ-3269:

Apologies for the long delay in the review.
Mostly looks good. Would be a lot easier to review if this were split into 
smaller jiras... think it combines a bunch of things like long to int, with the 
core logic changes.
Minor Stuff:
- final where possible - e.g. PartitionsGroupingCalculator.sourceVertexInfo, 
all variables in FairEdgeConfiguration
- This is a fairly complicated patch. Would be good to have some more 
   - the ceil method
   - Within various conditions in compute and iterator
- Obligatory rename request: getTotalStatsAtIndex to 
getCurrentlyKnownStatsAtIndex - this method will normally not return totalStats.
- Nit: expectedTotalSourceTasksOutputSize / numOfPartitions; - can be done once 
outside the loop
- onVertexStarted - Should this be split up a little more. It's possible for 
quite a bit to happen at the moment, before the "single vertex only" check is 
hit in FairShufflleVertexManager

- estimatePartitionSize.partitionstatSizeInMB is across all partitions. This 
ensures that averaging of stats based on output size isn't accidentally hit on 
a 0 sized partition? (Could break earlier from the loop)
- In case of reduce_parallelism - this considers the partition size and may 
produce groups with different number of partitions to consume, which the 
current ShuffleVertexManager doesn't do yet?
- Will the parallelism ever end up getting increased?

Any thoughts on what it will take to move this to support multiple source 
vertices ?

> Provide basic fair routing and scheduling functionality via custom 
> VertexManager and EdgeManager
> ------------------------------------------------------------------------------------------------
>                 Key: TEZ-3269
>                 URL: https://issues.apache.org/jira/browse/TEZ-3269
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>         Attachments: TEZ-3269-2.patch, TEZ-3269-3.patch, TEZ-3269.patch
> With TEZ-3206 and TEZ-3216, we can build a custom VertexManager and 
> EdgeManager that uses partition stats to do fair routing as well as the 
> scheduling based on destination tasks’ dependency on source tasks.

This message was sent by Atlassian JIRA

Reply via email to