Chris A. Mattmann created NUTCH-1905:
----------------------------------------
Summary: Nutch index tool should be resilient to segments that
don't have crawl_* data
Key: NUTCH-1905
URL: https://issues.apache.org/jira/browse/NUTCH-1905
Project: Nutch
Issue Type: Bug
Components: indexer
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: 1.10
When running the ./bin/nutch index command with the -dir <path/to/segment/dir>
option, I noticed that if a segment directory doesn't include crawl_* or
parse_* data, the indexer fails (correctly). However, the indexer should be
more resilient in those cases: we can add a simple check to see whether those
dirs are present in each segment. If they are, proceed; otherwise, skip that
segment, print a message, and move on to the other segments.
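
A minimal sketch of the proposed check, not the actual patch. It assumes the
indexer needs the crawl_fetch, crawl_parse, parse_data and parse_text
subdirectories of each segment, and it uses java.nio.file on a local path
instead of the Hadoop FileSystem API that Nutch itself works with. The class
and method names (SegmentFilter, isIndexable, indexableSegments) are
hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SegmentFilter {

    // Assumption: the segment subdirectories the indexer reads as input.
    static final String[] REQUIRED_DIRS = {
        "crawl_fetch", "crawl_parse", "parse_data", "parse_text"
    };

    /** Returns true only if every required subdirectory exists in the segment. */
    static boolean isIndexable(Path segment) {
        for (String name : REQUIRED_DIRS) {
            if (!Files.isDirectory(segment.resolve(name))) {
                return false;
            }
        }
        return true;
    }

    /** Collects complete segments under segmentsDir; incomplete ones are reported and skipped. */
    static List<Path> indexableSegments(Path segmentsDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(segmentsDir)) {
            for (Path segment : stream) {
                if (!Files.isDirectory(segment)) {
                    continue; // stray files are not segments
                }
                if (isIndexable(segment)) {
                    result.add(segment);
                } else {
                    System.err.println("Skipping segment " + segment
                        + ": missing crawl_* or parse_* data");
                }
            }
        }
        Collections.sort(result); // directory listing order is unspecified
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: SegmentFilter <path/to/segment/dir>");
            return;
        }
        for (Path segment : indexableSegments(Paths.get(args[0]))) {
            System.out.println("Would index segment " + segment);
        }
    }
}
```

With a check like this in front of the job setup, one incomplete segment no
longer aborts the whole run; only the segments that pass are handed to the
indexer.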