[
https://issues.apache.org/jira/browse/MAPREDUCE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879220#comment-13879220
]
Chen He commented on MAPREDUCE-5670:
------------------------------------
Hi [~jlowe]
I followed your suggestion and make corresponding changes in the v2 patch:
{quote} org.apache.hadoop.mapred.lib.db.TestCombineFileRecordReader is in
the wrong package, should be
org.apache.hadoop.mapred.lib.TestCombineFileRecordReader {quote}
TestCombineFileRecordReader has been moved under
org.apache.hadoop.mapred.lib;
{quote}There's an unused Mockito import in one of the tests, and it seems
like Mockito could save quite a bit of test code. A mocked reporter with
verify(mockReporter).progress() to verify the progress method was called would
accomplish the same with much less code. Did that not work for some reason? The
unused import implies that approach was probably attempted, so curious if
that's the case why it didn't work out. {quote}
Mockito is used and previous Reporter4Test class has been removed.
{quote} The original bug involves a bunch of empty files, so the test
doesn't need to write any data to the test files and all the lengths should be
zero. Bonus points if the test calls next() on the record reader, verifies it
returns false and verifies progress was called the correct number of times.
{quote}
The v2 patch will verify whether reporter.progress() is called right after
the instantiation of CombineFileRecordReader. I also created 3 empty files in
test directory to verify whether reporter.progress() is called exactly 3 times
after processing these 3 empty files. These 3 files will be recycled after test
is done.
{quote} There's an unnecessary try...finally block in
testInitNextRecordReader {quote}
"try...finally" block is used for recycling files created during the test
process.
> CombineFileRecordReader should report progress when moving to the next file
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-5670
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5670
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.9
> Reporter: Jason Lowe
> Assignee: Chen He
> Priority: Minor
> Fix For: 0.23.10
>
> Attachments: MR-5670.patch, MR-5670v2.patch
>
>
> If a combine split consists of many "empty" files (i.e.: no record found by
> the underlying record reader) then theoretically a task can timeout due to
> lack of reported progress.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)