[
https://issues.apache.org/jira/browse/MAPREDUCE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877847#comment-13877847
]
Jason Lowe commented on MAPREDUCE-5670:
---------------------------------------
Thanks for picking this up, Chen! Comments on the patch:
* org.apache.hadoop.mapred.lib.db.TestCombineFileRecordReader is in the wrong
package, should be org.apache.hadoop.mapred.lib.TestCombineFileRecordReader
* There's an unused Mockito import in one of the tests, and it seems like
Mockito could save quite a bit of test code. A mocked reporter with
verify(mockReporter).progress() to verify the progress method was called would
accomplish the same with much less code. Did that not work for some reason?
The unused import implies that approach was probably attempted, so curious if
that's the case why it didn't work out.
* The original bug involves a bunch of empty files, so the test doesn't need to
write any data to the test files and all the lengths should be zero. Bonus
points if the test calls next() on the record reader, verifies it returns false
and verifies progress was called the correct number of times.
* There's an unnecessary try...finally block in testInitNextRecordReader
> CombineFileRecordReader should report progress when moving to the next file
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-5670
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5670
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.9
> Reporter: Jason Lowe
> Assignee: Chen He
> Priority: Minor
> Attachments: MR-5670.patch
>
>
> If a combine split consists of many "empty" files (i.e.: no record found by
> the underlying record reader) then theoretically a task can timeout due to
> lack of reported progress.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)