[jira] [Created] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors

Mathieu Bouchard (JIRA) Mon, 25 Aug 2014 06:22:14 -0700

Mathieu Bouchard created NUTCH-1828:
---------------------------------------


             Summary: bin/crawl : incorrect handling of nutch errors
                 Key: NUTCH-1828
                 URL: https://issues.apache.org/jira/browse/NUTCH-1828
             Project: Nutch
          Issue Type: Bug
          Components: nutchNewbie
    Affects Versions: 2.2.1, 1.9
         Environment: Ubuntu Server 14.04, OpenJDK 7
            Reporter: Mathieu Bouchard


We are using Solr with Nutch to provide a complete search engine for our 
website.

I created a cron job that uses Nutch to crawl the site and update the Solr 
index each night. The job tries to detect and automatically recover from errors 
that could leave the crawldb corrupt. However, the bin/crawl command does not 
correctly propagate errors coming from bin/nutch.

Here is an example from the bin/crawl script:
    $bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR

    if [ $? -ne 0 ]
      then exit $?
    fi

Even if the nutch inject command fails, the crawl script exits with 0. The way 
I understand it, by the time "exit $?" runs, $? holds the result of the shell 
test (which succeeded, so 0) and not the result of the nutch inject command.
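
The behaviour can be reproduced without Nutch. The following is a minimal 
sketch: failing_step is a hypothetical stand-in for a failing bin/nutch inject 
call, and the subshell function only exists so that "exit" does not end the 
demonstration itself.

```shell
#!/bin/sh
# Hypothetical stand-in for a failing "bin/nutch inject" call.
failing_step() {
  return 3
}

# Same pattern as in bin/crawl, run in a subshell so "exit" stays contained.
buggy() (
  failing_step
  if [ $? -ne 0 ]
    then exit $?   # $? is now the status of the "[" test, which is 0
  fi
)

buggy
buggy_status=$?
echo "buggy pattern exits with: $buggy_status"   # prints 0, not 3
```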

To correct this, the script would need to save the exit code before testing it, 
with something like:
    $bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
    RETCODE=$?

    if [ $RETCODE -ne 0 ]
      then exit $RETCODE
    fi
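
As a quick check of the idea, the sketch below compares saving the exit code in 
a variable with a shorter "|| exit $?" form, which may also be an option. 
failing_step is again a hypothetical stand-in for a failing bin/nutch step, and 
the subshell functions are only there to keep "exit" from ending the test.

```shell
#!/bin/sh
# Hypothetical stand-in for a failing "bin/nutch" step.
failing_step() {
  return 3
}

# Variant 1: save the status right after the command, then test the copy.
fixed_capture() (
  failing_step
  RETCODE=$?
  if [ $RETCODE -ne 0 ]
    then exit $RETCODE
  fi
)

# Variant 2: "||" runs exit only on failure; $? still holds the command's status.
fixed_short() (
  failing_step || exit $?
)

fixed_capture
capture_status=$?
fixed_short
short_status=$?
echo "capture: $capture_status, short: $short_status"   # both are 3
```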




--
This message was sent by Atlassian JIRA
(v6.2#6252)
