Hi nutch-dev,
I was assuming that the commands to run the bin/crawl script in both local
and deploy mode are the same.
i.e., from $NUTCH_HOME/runtime/local (or runtime/deploy), use
> bin/crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>
It turns out that in deploy mode, this does not obtain the segment location
from HDFS and runs into problems. The cause is the following snippet in the
crawl script: it looks for the job file in the parent of the current working
directory, so the check fails when I run from runtime/deploy and the script
stays in local mode:
mode=local
if [ -f ../*nutch-*.job ]; then
mode=distributed
fi
When run from runtime/deploy/bin, it works properly.
Shouldn't the command be consistent with that of local mode?
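One possible direction, as a rough sketch only and not a tested patch: look
for the job file relative to the directory that contains the script instead
of the current working directory. The function name detect_mode is
hypothetical and not part of the existing crawl script; the sketch also
assumes the job file sits one level above bin/, as it does in runtime/deploy.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the current crawl script): print "distributed"
# if a *nutch-*.job file exists in the parent of the given bin directory,
# otherwise print "local".
detect_mode() {
  local bin_dir mode job
  # Resolve the bin directory to an absolute path so the check no longer
  # depends on where the user invoked the script from.
  bin_dir="$(cd "$1" 2>/dev/null && pwd)"
  mode=local
  # Loop over the glob so that zero or several matches do not break the test.
  for job in "$bin_dir"/../*nutch-*.job; do
    if [ -f "$job" ]; then
      mode=distributed
      break
    fi
  done
  echo "$mode"
}

# Inside the crawl script this would be driven by the script's own location.
mode="$(detect_mode "$(dirname "$0")")"
echo "mode=$mode"
```

With something like this, bin/crawl run from runtime/deploy and crawl run
from runtime/deploy/bin should both end up in distributed mode.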
Thanks,
Tejas