Please help me solve this.

jibjoice wrote:
>
> Where should I look to fix this? Why did it generate 0 records?
>
>
> pvvpr wrote:
>>
>> Basically your indexes are empty since no URLs were generated and
>> fetched. See this:
>>
>>> > - Generator: 0 records selected for fetching, exiting ...
>>> > - Stopping at depth=0 - no more URLs to fetch.
>>> > - No URLs to fetch - check your seed list and URL filters.
>>> > - crawl finished: crawled
>>
>> When no pages are indexed, dedup throws an exception.
>>
>> On Tuesday 18 December 2007 21:33, jibjoice wrote:
>>> I still can't solve it, please help me.
>>>
>>> jibjoice wrote:
>>> > I use nutch-0.9 and hadoop-0.12.2, and when I run the command
>>> > "bin/nutch crawl urls -dir crawled -depth 3" I get this error:
>>> >
>>> > - crawl started in: crawled
>>> > - rootUrlDir = input
>>> > - threads = 10
>>> > - depth = 3
>>> > - Injector: starting
>>> > - Injector: crawlDb: crawled/crawldb
>>> > - Injector: urlDir: input
>>> > - Injector: Converting injected urls to crawl db entries.
>>> > - Total input paths to process : 1
>>> > - Running job: job_0001
>>> > - map 0% reduce 0%
>>> > - map 100% reduce 0%
>>> > - map 100% reduce 100%
>>> > - Job complete: job_0001
>>> > - Counters: 6
>>> > - Map-Reduce Framework
>>> > - Map input records=3
>>> > - Map output records=1
>>> > - Map input bytes=22
>>> > - Map output bytes=52
>>> > - Reduce input records=1
>>> > - Reduce output records=1
>>> > - Injector: Merging injected urls into crawl db.
>>> > - Total input paths to process : 2
>>> > - Running job: job_0002
>>> > - map 0% reduce 0%
>>> > - map 100% reduce 0%
>>> > - map 100% reduce 58%
>>> > - map 100% reduce 100%
>>> > - Job complete: job_0002
>>> > - Counters: 6
>>> > - Map-Reduce Framework
>>> > - Map input records=3
>>> > - Map output records=1
>>> > - Map input bytes=60
>>> > - Map output bytes=52
>>> > - Reduce input records=1
>>> > - Reduce output records=1
>>> > - Injector: done
>>> > - Generator: Selecting best-scoring urls due for fetch.
>>> > - Generator: starting
>>> > - Generator: segment: crawled/segments/25501213164325
>>> > - Generator: filtering: false
>>> > - Generator: topN: 2147483647
>>> > - Total input paths to process : 2
>>> > - Running job: job_0003
>>> > - map 0% reduce 0%
>>> > - map 100% reduce 0%
>>> > - map 100% reduce 100%
>>> > - Job complete: job_0003
>>> > - Counters: 6
>>> > - Map-Reduce Framework
>>> > - Map input records=3
>>> > - Map output records=1
>>> > - Map input bytes=59
>>> > - Map output bytes=77
>>> > - Reduce input records=1
>>> > - Reduce output records=1
>>> > - Generator: 0 records selected for fetching, exiting ...
>>> > - Stopping at depth=0 - no more URLs to fetch.
>>> > - No URLs to fetch - check your seed list and URL filters.
>>> > - crawl finished: crawled
>>> >
>>> > but sometimes when I crawl some URLs it fails at indexing time with this:
>>> >
>>> > - Indexer: done
>>> > - Dedup: starting
>>> > - Dedup: adding indexes in: crawled/indexes
>>> > - Total input paths to process : 2
>>> > - Running job: job_0025
>>> > - map 0% reduce 0%
>>> > - Task Id : task_0025_m_000001_0, Status : FAILED
>>> > task_0025_m_000001_0: - Error running child
>>> > task_0025_m_000001_0: java.lang.ArrayIndexOutOfBoundsException: -1
>>> > task_0025_m_000001_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>>> > task_0025_m_000001_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>>> > task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>>> > task_0025_m_000001_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>>> > task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>> > task_0025_m_000001_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>>> > - Task Id : task_0025_m_000000_0, Status : FAILED
>>> > task_0025_m_000000_0: - Error running child
>>> > task_0025_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: -1
>>> > task_0025_m_000000_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>>> > task_0025_m_000000_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>>> > task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>>> > task_0025_m_000000_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>>> > task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>> > task_0025_m_000000_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>>> > - Task Id : task_0025_m_000000_1, Status : FAILED
>>> > task_0025_m_000000_1: - Error running child
>>> > task_0025_m_000000_1: java.lang.ArrayIndexOutOfBoundsException: -1
>>> > task_0025_m_000000_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>>> > task_0025_m_000000_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>>> > task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>>> > task_0025_m_000000_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>>> > task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>> > task_0025_m_000000_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>>> > - Task Id : task_0025_m_000001_1, Status : FAILED
>>> > task_0025_m_000001_1: - Error running child
>>> > task_0025_m_000001_1: java.lang.ArrayIndexOutOfBoundsException: -1
>>> > task_0025_m_000001_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>>> > task_0025_m_000001_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>>> > task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>>> > task_0025_m_000001_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>>> > task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>> > task_0025_m_000001_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>>> > - Task Id : task_0025_m_000001_2, Status : FAILED
>>> > task_0025_m_000001_2: - Error running child
>>> > task_0025_m_000001_2: java.lang.ArrayIndexOutOfBoundsException: -1
>>> > task_0025_m_000001_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>>> > task_0025_m_000001_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>>> > task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>>> > task_0025_m_000001_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>>> > task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>> > task_0025_m_000001_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>>> > - Task Id : task_0025_m_000000_2, Status : FAILED
>>> > task_0025_m_000000_2: - Error running child
>>> > task_0025_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: -1
>>> > task_0025_m_000000_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>>> > task_0025_m_000000_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>>> > task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>>> > task_0025_m_000000_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>>> > task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>> > task_0025_m_000000_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>>> > - map 100% reduce 100%
>>> > - Task Id : task_0025_m_000001_3, Status : FAILED
>>> > task_0025_m_000001_3: - Error running child
>>> > task_0025_m_000001_3: java.lang.ArrayIndexOutOfBoundsException: -1
>>> > task_0025_m_000001_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>>> > task_0025_m_000001_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>>> > task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>>> > task_0025_m_000001_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>>> > task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>> > task_0025_m_000001_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>>> > - Task Id : task_0025_m_000000_3, Status : FAILED
>>> > task_0025_m_000000_3: - Error running child
>>> > task_0025_m_000000_3: java.lang.ArrayIndexOutOfBoundsException: -1
>>> > task_0025_m_000000_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>>> > task_0025_m_000000_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>>> > task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>>> > task_0025_m_000000_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>>> > task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>>> > task_0025_m_000000_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>>> > Exception in thread "main" java.io.IOException: Job failed!
>>> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>>> > at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>>> > at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>>> >
>>> > How can I solve this?
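
A quick way to act on the "check your seed list and URL filters" advice quoted above: the Injector counters show 3 input records but only 1 output record, and the Generator then selects 0 records for fetching, so most likely the seed file is largely empty or invalid and the URL filters reject the one URL that did get injected. Also note that the log says rootUrlDir = input while the command names urls, so double-check which directory actually holds the seed file. The steps below are only a sketch; urls/seed.txt and example.com are placeholders for your own paths and domain, not values from your setup.

  # 1. Seed directory: a plain-text file with one URL per line
  mkdir -p urls
  echo "http://www.example.com/" > urls/seed.txt

  # 2. Edit conf/crawl-urlfilter.txt (the filter file used by the one-step
  #    "bin/nutch crawl" command) so the seed domain is accepted. The default
  #    accept rule uses a MY.DOMAIN.NAME placeholder, so change it to
  #    something like:
  #      +^http://([a-z0-9]*\.)*example.com/
  #    and make sure no earlier "-" rule filters your seed URLs out.

  # 3. Re-run the crawl against the directory you just prepared
  bin/nutch crawl urls -dir crawled -depth 3

If the seed file and filters are right, the Generator step should report a non-zero number of records selected for fetching.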
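As for the Dedup failure: as explained in the quoted reply, DeleteDuplicates fails with ArrayIndexOutOfBoundsException when the part indexes under crawled/indexes are empty, i.e. nothing was fetched and indexed, so fixing the crawl itself should make it disappear. Before dedup runs you can sanity-check that something was actually fetched; again only a sketch, using the crawled output directory from your command:

  # how many URLs are in the crawl db and how many were actually fetched
  bin/nutch readdb crawled/crawldb -stats

  # list the per-part indexes that dedup will open (use a plain ls instead
  # if you are running on the local filesystem); empty or missing parts are
  # what make MultiReader.isDeleted() fail
  bin/hadoop dfs -ls crawled/indexes

If the stats show zero fetched pages, the problem is still the seed list and filters above, not dedup itself.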
--
View this message in context: http://www.nabble.com/Nutch-crawl-problem-tp14327978p14433510.html
Sent from the Hadoop Users mailing list archive at Nabble.com.