Chris A. Mattmann created NUTCH-1905:
----------------------------------------

             Summary: Nutch index tool should be resilient to segments that 
don't have crawl_* data
                 Key: NUTCH-1905
                 URL: https://issues.apache.org/jira/browse/NUTCH-1905
             Project: Nutch
          Issue Type: Bug
          Components: indexer
            Reporter: Chris A. Mattmann
            Assignee: Chris A. Mattmann
             Fix For: 1.10


When running the ./bin/nutch index command with -dir <path/to/segment/dir>, 
I noticed that if a segment directory doesn't include crawl_* or parse_* 
data, the indexer fails (correctly). However, the indexer should be more 
resilient in those cases: we can add a simple check to see whether those 
dirs are present in the segment and proceed if they are; otherwise, skip 
that segment, print a message, and move on to the other segments.
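
A minimal sketch of the check described above, not the actual patch. It 
uses java.nio.file on a local directory instead of the Hadoop FileSystem 
API that Nutch itself works with. The class name SegmentFilter, its method 
names, and the exact list of required subdirectories are assumptions made 
for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: filters a segments directory down to the segments
// that contain the subdirectories the indexer is assumed to read.
public class SegmentFilter {

    // Assumption: these are the crawl_* and parse_* dirs the indexer needs.
    static final List<String> REQUIRED_DIRS = Arrays.asList(
            "crawl_fetch", "crawl_parse", "parse_data", "parse_text");

    // True only if every required subdirectory exists in the segment.
    static boolean isIndexable(Path segment) {
        for (String name : REQUIRED_DIRS) {
            if (!Files.isDirectory(segment.resolve(name))) {
                return false;
            }
        }
        return true;
    }

    // Returns the indexable segments under segmentsDir, sorted by path.
    // Incomplete segments are skipped with a message instead of failing.
    static List<Path> indexableSegments(Path segmentsDir) throws IOException {
        List<Path> kept = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(segmentsDir)) {
            for (Path segment : stream) {
                if (!Files.isDirectory(segment)) {
                    continue; // plain files are not segments
                }
                if (isIndexable(segment)) {
                    kept.add(segment);
                } else {
                    System.err.println("Skipping segment " + segment
                            + ": missing crawl_* or parse_* data");
                }
            }
        }
        Collections.sort(kept);
        return kept;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: SegmentFilter <path/to/segment/dir>");
            return;
        }
        for (Path segment : indexableSegments(Paths.get(args[0]))) {
            System.out.println(segment);
        }
    }
}
```

In the indexer itself, the same test would run before each segment is 
added as job input, so that one incomplete segment does not fail the whole 
run.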



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
