[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879220#comment-13879220
 ] 

Chen He commented on MAPREDUCE-5670:
------------------------------------

Hi [~jlowe]
  I followed your suggestion and make corresponding changes in the v2 patch:
     
   {quote} org.apache.hadoop.mapred.lib.db.TestCombineFileRecordReader is in 
the wrong package, should be 
org.apache.hadoop.mapred.lib.TestCombineFileRecordReader {quote}
    TestCombineFileRecordReader has been moved under 
org.apache.hadoop.mapred.lib;
    {quote}There's an unused Mockito import in one of the tests, and it seems 
like Mockito could save quite a bit of test code. A mocked reporter with 
verify(mockReporter).progress() to verify the progress method was called would 
accomplish the same with much less code. Did that not work for some reason? The 
unused import implies that approach was probably attempted, so curious if 
that's the case why it didn't work out. {quote}
    Mockito is used and previous Reporter4Test class has been removed.
    {quote} The original bug involves a bunch of empty files, so the test 
doesn't need to write any data to the test files and all the lengths should be 
zero. Bonus points if the test calls next() on the record reader, verifies it 
returns false and verifies progress was called the correct number of times.
{quote}
     The v2 patch will verify whether reporter.progress() is called right after 
the instantiation of CombineFileRecordReader. I also created 3 empty files in 
test directory to verify whether reporter.progress() is called exactly 3 times 
after processing these 3 empty files. These 3 files will be recycled after test 
is done.   
   {quote} There's an unnecessary try...finally block in 
testInitNextRecordReader {quote}
     "try...finally" block is used for recycling files created during the test 
process.


> CombineFileRecordReader should report progress when moving to the next file
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5670
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5670
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.9
>            Reporter: Jason Lowe
>            Assignee: Chen He
>            Priority: Minor
>             Fix For: 0.23.10
>
>         Attachments: MR-5670.patch, MR-5670v2.patch
>
>
> If a combine split consists of many "empty" files (i.e.: no record found by 
> the underlying record reader) then theoretically a task can timeout due to 
> lack of reported progress.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to