[ 
https://issues.apache.org/jira/browse/TEZ-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105118#comment-14105118
 ] 

Siddharth Seth commented on TEZ-1446:
-------------------------------------

Comments on the patch, mostly minor
- In impl/Fetcher.java (SortedMerged case), the local fetch breaks after a 
single fetch failure. However, in the broadcast fetcher, it attempts to fetch 
everything - and just reports the failure. Is there a reason for the 
SortedMerged Fetcher not to do the same ?
- In impl/Fetcher.java - putBackRemainingMapOutputs should be in a finally 
block, in case of other non-IO exceptions.
- Rename TestFetcherImpl to TestFetcher (differentiated by package)
- Rename testFetchLocalFetchModeEnabled to something like 
testFetchLocalFetchModeEnabled - since it's testing both options.
- testSetupLocalDiskFetch1 (and it's counterpart in TestFetcherImpl) - it'll be 
useful to have a failure in the middle of the list as well, to validate that 
others go through (or if the MergedFetcher does not change, the correct set is 
reported as failed)
- Instead of reading TezRuntimeConfiguration.TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH 
from configuration in each fetcher / Callable - this should be read once in the 
ShuffleManager/Scheduler and passed in as a parameter. Reading from 
Configuration can be a little expensive - and we've spent some time in the past 
to minimize it.
- MapOutput - I believe the 3 constructors are used for the 3 different cases 
(MEM, DISK, DISK_DIRECT). Some javadoc on the individual constructors will be 
useful. Even better would be named static methods for creation.
- compareTo in FileChunk; I believe this is only required because the 
MergeManager ends up sorting the files (like it does for InMemorySegments). 
InMemSegments are sorted on size. Should size be compared before offset. This 
won't matter much though, since the path is compared first. Maybe we can change 
this in a subsequent jira, and remove the invocation which causes the merger to 
sort the segments, or store File based sources in a simple Set.

There's a TODO mentioning check for shutdown - this needs to be done for the 
HTTPFetch as well - Could you please file a jira for this, and reference it in 
the code.

> Move the fetch code for local disk fetch from data movement event handlers to 
> fetcher
> -------------------------------------------------------------------------------------
>
>                 Key: TEZ-1446
>                 URL: https://issues.apache.org/jira/browse/TEZ-1446
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Prakash Ramachandran
>            Assignee: Prakash Ramachandran
>         Attachments: TEZ-1446.1.patch, TEZ-1446.2.patch
>
>
> doing the fetch in the EventHandler has the drawback of tying up the 
> EventHandling thread, in case it takes time to read the Index file. Even, for 
> a healthy disk - we can do without any delays on this thread, so that the 
> fetcher have some work to do.  
> Also Need to add Unit test cases for the local fetch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to