I've freshly installed a Nutch nightly build on my laptop under an up-to-date 
Cygwin.  I just downloaded the .tar.gz, ran ant, and verified that 
$NUTCH_HOME/bin/nutch works (it gives me the help screen).  I then set up 
nutch-site.xml and urls.txt and attempted a crawl.  However, the crawl fails 
with an org.apache.hadoop.mapred.InvalidInputException.  The hadoop.log doesn't 
report the error; it only shows the crawl command line.  Has anyone seen this 
before?


$ nutch crawl /cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt -dir /cygdrive/c/nutch-2007-07-26_04-01-20/content/sf911truth -depth 3 -topN 200
crawl started in: /cygdrive/c/nutch-2007-07-26_04-01-20/content/sf911truth
rootUrlDir = /cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt
threads = 10
depth = 3
topN = 200
Injector: starting
Injector: crawlDb: /cygdrive/c/nutch-2007-07-26_04-01-20/content/sf911truth/crawldb
Injector: urlDir: /cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : /cygdrive/c/nutch-2007-07-26_04-01-20/content/urls.txt
        at org.apache.hadoop.mapred.InputFormatBase.validateInput(InputFormatBase.java:138)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:326)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
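
The exception says the input path does not exist, even though the file is there from the Cygwin shell's point of view. A minimal sketch for checking how the JVM itself resolves that path is below. It rests on an assumption the log does not confirm: that a Windows Java process may not interpret Cygwin-style /cygdrive/c/... paths the way the Cygwin shell does. The class name PathCheck and the method describe are hypothetical and not part of Nutch or Hadoop.

```java
import java.io.File;

// Hypothetical diagnostic helper; not part of Nutch or Hadoop.
public class PathCheck {

    // Builds a one-line report of how this JVM resolves the given path
    // and whether it considers the path to exist.
    static String describe(String path) {
        File f = new File(path);
        return path + " -> " + f.getAbsolutePath() + " exists=" + f.exists();
    }

    public static void main(String[] args) {
        // Pass one or more candidate paths on the command line.
        for (String p : args) {
            System.out.println(describe(p));
        }
    }
}
```

Running it once with the Cygwin-style path from the crawl command and once with the Windows-style form of the same path (Cygwin's cygpath -w prints that form) would show whether the two are treated differently. If only the Windows-style path reports exists=true, passing that form to the crawl command might be worth trying.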





       
_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
