I am using nutch-0.9 and hadoop-0.12.2. When I run the command "bin/nutch crawl urls -dir crawled -depth 3" I get the following error output:
- crawl started in: crawled
- rootUrlDir = input
- threads = 10
- depth = 3
- Injector: starting
- Injector: crawlDb: crawled/crawldb
- Injector: urlDir: input
- Injector: Converting injected urls to crawl db entries.
- Total input paths to process : 1
- Running job: job_0001
- map 0% reduce 0%
- map 100% reduce 0%
- map 100% reduce 100%
- Job complete: job_0001
- Counters: 6
- Map-Reduce Framework
- Map input records=3
- Map output records=1
- Map input bytes=22
- Map output bytes=52
- Reduce input records=1
- Reduce output records=1
- Injector: Merging injected urls into crawl db.
- Total input paths to process : 2
- Running job: job_0002
- map 0% reduce 0%
- map 100% reduce 0%
- map 100% reduce 58%
- map 100% reduce 100%
- Job complete: job_0002
- Counters: 6
- Map-Reduce Framework
- Map input records=3
- Map output records=1
- Map input bytes=60
- Map output bytes=52
- Reduce input records=1
- Reduce output records=1
- Injector: done
- Generator: Selecting best-scoring urls due for fetch.
- Generator: starting
- Generator: segment: crawled/segments/25501213164325
- Generator: filtering: false
- Generator: topN: 2147483647
- Total input paths to process : 2
- Running job: job_0003
- map 0% reduce 0%
- map 100% reduce 0%
- map 100% reduce 100%
- Job complete: job_0003
- Counters: 6
- Map-Reduce Framework
- Map input records=3
- Map output records=1
- Map input bytes=59
- Map output bytes=77
- Reduce input records=1
- Reduce output records=1
- Generator: 0 records selected for fetching, exiting ...
- Stopping at depth=0 - no more URLs to fetch.
- No URLs to fetch - check your seed list and URL filters.
- crawl finished: crawled
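For reference, my seed list and URL filter follow the usual Nutch 0.9 layout, roughly like this (example.com is only a placeholder, not my real seed URL):

  # urls/seed.txt - the seed list passed on the command line, one URL per line
  http://www.example.com/

  # conf/crawl-urlfilter.txt - the filter applied by the crawl command
  # (the default file ends with a "-." rule that rejects anything not matched above)
  +^http://([a-z0-9]*\.)*example.com/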
Sometimes, however, when I crawl other URLs the crawl itself completes and the failure happens at index/dedup time instead:

- Indexer: done
- Dedup: starting
- Dedup: adding indexes in: crawled/indexes
- Total input paths to process : 2
- Running job: job_0025
- map 0% reduce 0%
- Task Id : task_0025_m_000001_0, Status : FAILED
task_0025_m_000001_0: Error running child
task_0025_m_000001_0: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000001_0:   at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000001_0:   at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000001_0:   at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000001_0:   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000001_0:   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000001_0:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)

All the other attempts of both map tasks fail with the identical exception and stack trace:

- Task Id : task_0025_m_000000_0, Status : FAILED
- Task Id : task_0025_m_000000_1, Status : FAILED
- Task Id : task_0025_m_000001_1, Status : FAILED
- Task Id : task_0025_m_000001_2, Status : FAILED
- Task Id : task_0025_m_000000_2, Status : FAILED
- map 100% reduce 100%
- Task Id : task_0025_m_000001_3, Status : FAILED
- Task Id : task_0025_m_000000_3, Status : FAILED

and the job finally aborts with:

Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
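In case it helps narrow things down, my understanding is that "bin/nutch crawl" is just a wrapper around the individual Nutch steps, so the failing stage should be reproducible on its own with something like the following (the segment name is a placeholder, and I have not double-checked every argument against the 0.9 tutorial):

  bin/nutch invertlinks crawled/linkdb crawled/segments/<segment>
  bin/nutch index crawled/indexes crawled/crawldb crawled/linkdb crawled/segments/<segment>
  bin/nutch dedup crawled/indexes
  bin/nutch merge crawled/index crawled/indexes

If "bin/nutch dedup crawled/indexes" fails in the same way, the problem would seem to be isolated to the DeleteDuplicates job rather than to the crawl as a whole.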
How can I solve this?