[ https://issues.apache.org/jira/browse/NUTCH-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556093#comment-13556093 ]
Sebastian Nagel commented on NUTCH-1520:
----------------------------------------

Hi Markus, have a look at NUTCH-1113. An alternative solution is to take, in certain cases, more than one CrawlDatum into the merged segment.

> SegmentMerger looses records
> ----------------------------
>
>                 Key: NUTCH-1520
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1520
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.7
>
>         Attachments: NUTCH-1520-1.7-1.patch
>
>
> It seems the SegmentMerger tool loses documents. You're likely to see fewer
> documents in an index if you index one or more already merged segments than
> if you index all unmerged segments.
> This is really nasty!