[ https://issues.apache.org/jira/browse/NUTCH-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861407#action_12861407 ]
Dennis Kubes commented on NUTCH-814:
------------------------------------

This should be attributed to Rob Bradshaw (r...@baynote.com) upon commit; he found the issue.

Dennis

> SegmentMerger bug
> -----------------
>
>                 Key: NUTCH-814
>                 URL: https://issues.apache.org/jira/browse/NUTCH-814
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.1
>            Reporter: Dennis Kubes
>            Assignee: Andrzej Bialecki
>             Fix For: 1.1
>
>         Attachments: merger.patch
>
>
> Dennis reported:
> {quote}
> In the SegmentMerger.java file about line 150 we have this:
>     final SequenceFile.Reader reader =
>         new SequenceFile.Reader(FileSystem.get(job), fSplit.getPath(), job);
> Then about line 166 in the record reader we have this:
>     boolean res = reader.next(key, w);
> If I am reading that right, that would mean that the map task would loop
> over all records for a given file and not just a given split.
> {quote}
> Right, this should instead use SequenceFileRecordReader that already has the
> logic to handle splits. Patch coming shortly - thanks for spotting this! This
> could be the reason for "out of disk space" errors that many users reported.
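
For readers who want to see why reading a whole file per split matters, here is a minimal sketch in plain Java. It does not use the Hadoop API and is not the code from merger.patch. The class name `SplitDemo`, the fixed 10-byte record size, and the helper methods are all hypothetical, chosen only for illustration. The real SequenceFileRecordReader locates record boundaries differently. The sketch only compares a reader that ignores the split bounds with one that stops at the end of its split.

```java
final class SplitDemo {
    // Assumption for illustration only: every record is exactly 10 bytes long.
    static final long RECORD_SIZE = 10;

    // Mimics the reported behaviour: the reader is opened on the file path and
    // iterated to end of file, so the split bounds are never consulted.
    static int countNaive(long fileLength, long splitStart, long splitLength) {
        int records = 0;
        for (long pos = 0; pos < fileLength; pos += RECORD_SIZE) {
            records++;
        }
        return records;
    }

    // Split-aware variant: only records that begin inside
    // [splitStart, splitStart + splitLength) are counted.
    static int countSplitAware(long fileLength, long splitStart, long splitLength) {
        long end = Math.min(fileLength, splitStart + splitLength);
        // First record boundary at or after the start of the split.
        long first = ((splitStart + RECORD_SIZE - 1) / RECORD_SIZE) * RECORD_SIZE;
        int records = 0;
        for (long pos = first; pos < end; pos += RECORD_SIZE) {
            records++;
        }
        return records;
    }

    public static void main(String[] args) {
        long fileLength = 100;                               // 10 records in total
        long[][] splits = { {0, 35}, {35, 35}, {70, 30} };   // {start, length}

        int naiveTotal = 0;
        int awareTotal = 0;
        for (long[] s : splits) {
            naiveTotal += countNaive(fileLength, s[0], s[1]);
            awareTotal += countSplitAware(fileLength, s[0], s[1]);
        }
        System.out.println("records processed, ignoring splits: " + naiveTotal);
        System.out.println("records processed, honouring splits: " + awareTotal);
    }
}
```

With these assumed numbers, the three map tasks together process 30 records when the split is ignored and 10 when it is honoured. Every record is handled once per split of its file, which is consistent with the duplicated output suspected above.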