[ 
https://issues.apache.org/jira/browse/TEZ-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116430#comment-17116430
 ] 

Ashutosh Chauhan commented on TEZ-4069:
---------------------------------------

+1

> Avoid repeated computation of preferred locations in split grouping.
> --------------------------------------------------------------------
>
>                 Key: TEZ-4069
>                 URL: https://issues.apache.org/jira/browse/TEZ-4069
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.9.2
>            Reporter: Oliver Draese
>            Priority: Major
>         Attachments: TEZ-4069.1.patch, TEZ-4069.patch
>
>
> The TezSplitGrouper iterates through the list of splits multiple times when 
> grouping them (see getGroupedSplits). On each pass, it asks the 
> locationProvider for the array of preferred locations of every split. 
> This has two side effects:
>  * generating the list of preferred locations causes avoidable CPU overhead 
> (e.g. calculating the consistent hash in HostAffinitySplitLocationProvider)
>  * if the list of preferred locations changes between the different loops 
> of getGroupedSplits, we might encounter a NullPointerException. This happens 
> if a new location appears that was not part of the initial set of locations 
> when the distinctLocations map was populated.
> getGroupedSplits should query the preferred locations only once per 
> split via the location provider and then cache the result instead of asking 
> the location provider repeatedly.
>  
>  
>  
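
The caching described in the issue can be sketched roughly as follows. This is a minimal illustration, not the attached patch: LocationProvider, snapshotLocations and countPerLocation are hypothetical names, and the provider interface is a simplified stand-in for the real Tez one. The idea is that the provider is asked once per split, and every later pass reads only the snapshot, so a location that appears later cannot be missing from distinctLocations.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class CachedLocationsSketch {
    /** Hypothetical, simplified stand-in for the location provider (not the Tez interface). */
    interface LocationProvider<S> {
        String[] getPreferredLocations(S split);
    }

    /** First pass: ask the provider exactly once per split and remember the answer. */
    static <S> Map<S, String[]> snapshotLocations(List<S> splits, LocationProvider<S> provider) {
        // Identity-based map: split objects are not assumed to implement equals/hashCode.
        Map<S, String[]> cache = new IdentityHashMap<>();
        for (S split : splits) {
            if (!cache.containsKey(split)) {
                String[] locations = provider.getPreferredLocations(split);
                // Normalize "no locations" to an empty array so later passes need no null checks.
                cache.put(split, locations == null ? new String[0] : locations);
            }
        }
        return cache;
    }

    /** Later pass: count splits per location using only the snapshot, never the provider. */
    static <S> Map<String, Integer> countPerLocation(List<S> splits, Map<S, String[]> cache) {
        Map<String, Integer> distinctLocations = new HashMap<>();
        for (S split : splits) {
            for (String location : cache.get(split)) {
                distinctLocations.merge(location, 1, Integer::sum);
            }
        }
        return distinctLocations;
    }
}
```

Because every pass after the first sees the same arrays, the set of keys in distinctLocations stays consistent with what later loops look up, and the cost of computing the locations is paid once per split.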



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
