Yes, the agent name was empty. It works now. Thanks much.
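For the archives, in case anyone hits the same error: the fetcher refuses to run while http.agent.name is empty. A minimal fix, assuming a stock Nutch install (the value "MyTestCrawler" below is just a placeholder; use a string that identifies your own crawler), is to set the property in conf/nutch-site.xml, which overrides conf/nutch-default.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Required: the fetcher throws "Agent name not configured!"
       if this is empty. "MyTestCrawler" is a hypothetical value;
       pick a name that identifies your crawler to webmasters. -->
  <property>
    <name>http.agent.name</name>
    <value>MyTestCrawler</value>
  </property>
</configuration>

With the agent name set, the fetches succeed, so the index handed to Dedup is no longer empty, which is presumably why the ArrayIndexOutOfBoundsException below went away as well.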
Nutch Newbie wrote:
>
> On Wed, Jan 20, 2010 at 7:10 PM, kraman <kirthi.ra...@gmail.com> wrote:
>>
>> kirth...@cerebrum [~/www/nutch]# ./bin/nutch crawl url -dir tinycrawl -depth 2
>> crawl started in: tinycrawl
>> rootUrlDir = url
>> threads = 10
>> depth = 2
>> Injector: starting
>> Injector: crawlDb: tinycrawl/crawldb
>> Injector: urlDir: url
>> Injector: Converting injected urls to crawl db entries.
>> Injector: Merging injected urls into crawl db.
>> Injector: done
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: starting
>> Generator: segment: tinycrawl/segments/20100120130316
>> Generator: filtering: false
>> Generator: topN: 2147483647
>> Generator: jobtracker is 'local', generating exactly one partition.
>> Generator: Partitioning selected urls by host, for politeness.
>> Generator: done.
>> Fetcher: starting
>> Fetcher: segment: tinycrawl/segments/20100120130316
>> Fetcher: threads: 10
>> fetching http://www.mywebsite.us/
>> fetch of http://www.mywebsite.us/ failed with: java.lang.RuntimeException:
>> Agent name not configured!
>
> You need to fix the Nutch config file as described in the README.
>
>> Fetcher: done
>> CrawlDb update: starting
>> CrawlDb update: db: tinycrawl/crawldb
>> CrawlDb update: segments: [tinycrawl/segments/20100120130316]
>> CrawlDb update: additions allowed: true
>> CrawlDb update: URL normalizing: true
>> CrawlDb update: URL filtering: true
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: done
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: starting
>> Generator: segment: tinycrawl/segments/20100120130323
>> Generator: filtering: false
>> Generator: topN: 2147483647
>> Generator: jobtracker is 'local', generating exactly one partition.
>> Generator: Partitioning selected urls by host, for politeness.
>> Generator: done.
>> Fetcher: starting
>> Fetcher: segment: tinycrawl/segments/20100120130323
>> Fetcher: threads: 10
>> fetching http://www.mywebsite.us/
>> fetch of http://www.mywebsite.us/ failed with: java.lang.RuntimeException:
>> Agent name not configured!
>> Fetcher: done
>> CrawlDb update: starting
>> CrawlDb update: db: tinycrawl/crawldb
>> CrawlDb update: segments: [tinycrawl/segments/20100120130323]
>> CrawlDb update: additions allowed: true
>> CrawlDb update: URL normalizing: true
>> CrawlDb update: URL filtering: true
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: done
>> LinkDb: starting
>> LinkDb: linkdb: tinycrawl/linkdb
>> LinkDb: URL normalize: true
>> LinkDb: URL filter: true
>> LinkDb: adding segment: tinycrawl/segments/20100120130323
>> LinkDb: adding segment: tinycrawl/segments/20100120130316
>> LinkDb: done
>> Indexer: starting
>> Indexer: linkdb: tinycrawl/linkdb
>> Indexer: adding segment: tinycrawl/segments/20100120130323
>> Indexer: adding segment: tinycrawl/segments/20100120130316
>> Optimizing index.
>> Indexer: done
>> Dedup: starting
>> Dedup: adding indexes in: tinycrawl/indexes
>> Exception in thread "main" java.io.IOException: Job failed!
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>>         at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>>
>> LogFile gives
>> java.lang.ArrayIndexOutOfBoundsException: -1
>>         at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>>         at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>>         at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)