This is new in version 0.7: the URL listing used to be a single file, but it is now a directory containing one or more URL files. It is most probably documented in the release notes, but the change hasn't made its way into the tutorials yet. The mailing list archive has a couple of threads on this topic.
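A rough sketch of the setup, in case it saves someone a search. The directory name `myurls`, the file name `urls`, and the crawl command are taken from Earl's message below; the seed URL is only a placeholder, and the `bin/nutch` check is my own addition so the script fails gracefully outside a Nutch checkout.

```shell
# The crawl command now expects a directory of URL files, not a single file.
mkdir -p myurls

# One URL per line. The file name is arbitrary (per the thread);
# the URL below is a placeholder, replace it with your own seed.
printf '%s\n' 'http://www.example.com/' > myurls/urls

# Run the crawl only when executed from a Nutch checkout.
# The command line itself is the one quoted in this thread.
if [ -x bin/nutch ]; then
    bin/nutch crawl myurls -dir crawl.test -depth 3
else
    echo "bin/nutch not found; run this from the Nutch directory" >&2
fi
```

Passing the old single file where the directory is expected is what appears to trigger the "No input files" exception quoted below.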
Fredrik

On 9/7/05, Earl Cahill <[EMAIL PROTECTED]> wrote:
>
> Though, my last email was more about documenting the
> whole setup process, it looks like the error I
> mentioned was fixed by creating a directory and
> putting a urls file in that directory. It also looks
> like the name of the file doesn't matter. So I made a
> myurls directory, put a urls file in there and then
> ran
>
> bin/nutch crawl myurls -dir crawl.test -depth 3
>
> But, yeah, would like to put such steps in a tutorial.
>
> It looks like the front page got hit, and that's about
> it, so there is more to do.
>
> Earl
>
> --- Earl Cahill <[EMAIL PROTECTED]> wrote:
>
> > howdy,
> >
> > I have been looking around for a nutch/mapred tutorial
> > and haven't had much luck. I found this one
> >
> > http://lucene.apache.org/nutch/tutorial.html
> >
> > which did help me get a crawl going on trunk, but no
> > such luck in branches/mapred. I set the urls file and
> > the filter in the same way that I did for trunk and I
> > get
> >
> > 050907 013817 parsing file:/home/nutch/nutch/branches/mapred/conf/nutch-site.xml
> > java.io.IOException: No input files in: [Ljava.io.File;@32b0bad7
> >     at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:74)
> >     at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:84)
> >     at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:59)
> >
> > Guess I am wondering if a detailed tutorial for mapred
> > exists. Seems like doug was saying that it didn't. I
> > would be up for walking through getting a crawl going
> > and documenting my steps, but won't dive in if one
> > already exists. Also wondering if I would/could put
> > my doc on the wiki.
> >
> > Thanks,
> > Earl
