I followed this link, http://wiki.apache.org/nutch/NutchHadoopTutorial, so I don't think the problem is the conf/crawl-urlfilter.txt file. When I run the command "bin/nutch crawl urls -dir crawled -depth 3" again, it shows:

Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/25501221110712
Generator: filtering: false
Generator: topN: 2147483647
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawled/segments/25501221110712
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawled/crawldb
CrawlDb update: segments: [crawled/segments/25501221110712]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/25501221110908
Generator: filtering: false
Generator: topN: 2147483647
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawled/segments/25501221110908
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawled/crawldb
CrawlDb update: segments: [crawled/segments/25501221110908]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawled/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: /user/nutch/crawled/segments/25501221110519
LinkDb: adding segment: /user/nutch/crawled/segments/25501221110712
LinkDb: adding segment: /user/nutch/crawled/segments/25501221110908
LinkDb: done
Indexer: starting
Indexer: linkdb: crawled/linkdb
Indexer: adding segment: /user/nutch/crawled/segments/25501221110519
Indexer: adding segment: /user/nutch/crawled/segments/25501221110712
Indexer: adding segment: /user/nutch/crawled/segments/25501221110908
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawled/indexes
task_0017_m_000000_0: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
task_0017_m_000001_0: log4j:ERROR setFile(null,true) call failed.
task_0017_m_000001_0: java.io.FileNotFoundException: /nutch/search/logs (Is a directory)
task_0017_m_000001_0: at java.io.FileOutputStream.openAppend(Native Method)
task_0017_m_000001_0: at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
task_0017_m_000001_0: at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
task_0017_m_000001_0: at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
task_0017_m_000001_0: at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
task_0017_m_000001_0: at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
task_0017_m_000001_0: at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
task_0017_m_000001_0: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
task_0017_m_000001_0: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
task_0017_m_000001_0: at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
task_0017_m_000001_0: at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
task_0017_m_000001_0: at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
task_0017_m_000001_0: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
task_0017_m_000001_0: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
task_0017_m_000001_0: at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
task_0017_m_000001_0: at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
task_0017_m_000001_0: at org.apache.log4j.Logger.getLogger(Logger.java:104)
task_0017_m_000001_0: at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
task_0017_m_000001_0: at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
task_0017_m_000001_0: at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
task_0017_m_000001_0: at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
task_0017_m_000001_0: at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
task_0017_m_000001_0: at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
task_0017_m_000001_0: at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
task_0017_m_000001_0: at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
task_0017_m_000001_0: at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
task_0017_m_000001_0: at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:82)
task_0017_m_000001_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1423)
task_0017_m_000001_0: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
task_0017_m_000000_1: log4j:ERROR setFile(null,true) call failed.
task_0017_m_000000_1: java.io.FileNotFoundException: /nutch/search/logs (Is a directory)
task_0017_m_000000_1: at java.io.FileOutputStream.openAppend(Native Method)
task_0017_m_000000_1: at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
task_0017_m_000000_1: at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
task_0017_m_000000_1: at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
task_0017_m_000000_1: at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
task_0017_m_000000_1: at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
task_0017_m_000000_1: at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
task_0017_m_000000_1: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
task_0017_m_000000_1: at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
task_0017_m_000000_1: at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
task_0017_m_000000_1: at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
task_0017_m_000000_1: at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
task_0017_m_000000_1: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
task_0017_m_000000_1: at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
task_0017_m_000000_1: at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
task_0017_m_000000_1: at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
task_0017_m_000000_1: at org.apache.log4j.Logger.getLogger(Logger.java:104)
task_0017_m_000000_1: at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
task_0017_m_000000_1: at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
task_0017_m_000000_1: at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
task_0017_m_000000_1: at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
task_0017_m_000000_1: at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
task_0017_m_000000_1: at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
task_0017_m_000000_1: at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
task_0017_m_000000_1: at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
task_0017_m_000000_3: at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
task_0017_m_000000_3: at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:82)
task_0017_m_000000_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1423)
task_0017_m_000000_3: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
Exception in thread "main" java.io.IOException: Job failed!
  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
  at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
  at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
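In case it matters, the DRFA appender that the log4j errors complain about is defined in conf/log4j.properties. As far as I understand it, the stock definition looks roughly like the sketch below (this is my reading of the Hadoop/Nutch defaults, not a verbatim copy of my file):

    log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    # File is built from system properties that bin/nutch normally passes to the JVM
    # (-Dhadoop.log.dir=... -Dhadoop.log.file=hadoop.log); if hadoop.log.file is empty,
    # File ends up pointing at the log directory itself
    log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
    # roll the log file over once a day
    log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
    log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
    log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

So my guess is that the TaskTracker child JVMs are starting without hadoop.log.file set, which would explain why setFile() fails on /nutch/search/logs (Is a directory), but I am not sure whether that is also what makes the dedup job fail.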
I don't know what is happening.

pvvpr wrote:
>
> I think you need to check the conf/crawl-urlfilter.txt file
>
> On Thursday 20 December 2007 04:55, jibjoice wrote:
>> please, help me to solve it
>>
>> jibjoice wrote:
>> > where i should solve this? why it generated 0 records?
>> >
>> > pvvpr wrote:
>> >> basically your indexes are empty since no URLs were generated and fetched. See this,
>> >>
>> >>> > - Generator: 0 records selected for fetching, exiting ...
>> >>> > - Stopping at depth=0 - no more URLs to fetch.
>> >>> > - No URLs to fetch - check your seed list and URL filters.
>> >>> > - crawl finished: crawled
>> >>
>> >> when no pages are indexed, dedup throws Exception
>> >>
>> >> On Tuesday 18 December 2007 21:33, jibjoice wrote:
>> >>> i can't solve it now, pls help me
>> >>>
>> >>> jibjoice wrote:
>> >>> > i use nutch-0.9, hadoop-0.12.2 and i use this command "bin/nutch
>> >>> > crawl urls -dir crawled -depth 3" have error :
>> >>> >
>> >>> > - crawl started in: crawled
>> >>> > - rootUrlDir = input
>> >>> > - threads = 10
>> >>> > - depth = 3
>> >>> > - Injector: starting
>> >>> > - Injector: crawlDb: crawled/crawldb
>> >>> > - Injector: urlDir: input
>> >>> > - Injector: Converting injected urls to crawl db entries.
>> >>> > - Total input paths to process : 1
>> >>> > - Running job: job_0001
>> >>> > - map 0% reduce 0%
>> >>> > - map 100% reduce 0%
>> >>> > - map 100% reduce 100%
>> >>> > - Job complete: job_0001
>> >>> > - Counters: 6
>> >>> > - Map-Reduce Framework
>> >>> > - Map input records=3
>> >>> > - Map output records=1
>> >>> > - Map input bytes=22
>> >>> > - Map output bytes=52
>> >>> > - Reduce input records=1
>> >>> > - Reduce output records=1
>> >>> > - Injector: Merging injected urls into crawl db.
>> >>> > - Total input paths to process : 2
>> >>> > - Running job: job_0002
>> >>> > - map 0% reduce 0%
>> >>> > - map 100% reduce 0%
>> >>> > - map 100% reduce 58%
>> >>> > - map 100% reduce 100%
>> >>> > - Job complete: job_0002
>> >>> > - Counters: 6
>> >>> > - Map-Reduce Framework
>> >>> > - Map input records=3
>> >>> > - Map output records=1
>> >>> > - Map input bytes=60
>> >>> > - Map output bytes=52
>> >>> > - Reduce input records=1
>> >>> > - Reduce output records=1
>> >>> > - Injector: done
>> >>> > - Generator: Selecting best-scoring urls due for fetch.
>> >>> > - Generator: starting
>> >>> > - Generator: segment: crawled/segments/25501213164325
>> >>> > - Generator: filtering: false
>> >>> > - Generator: topN: 2147483647
>> >>> > - Total input paths to process : 2
>> >>> > - Running job: job_0003
>> >>> > - map 0% reduce 0%
>> >>> > - map 100% reduce 0%
>> >>> > - map 100% reduce 100%
>> >>> > - Job complete: job_0003
>> >>> > - Counters: 6
>> >>> > - Map-Reduce Framework
>> >>> > - Map input records=3
>> >>> > - Map output records=1
>> >>> > - Map input bytes=59
>> >>> > - Map output bytes=77
>> >>> > - Reduce input records=1
>> >>> > - Reduce output records=1
>> >>> > - Generator: 0 records selected for fetching, exiting ...
>> >>> > - Stopping at depth=0 - no more URLs to fetch.
>> >>> > - No URLs to fetch - check your seed list and URL filters.
>> >>> > - crawl finished: crawled
>> >>> >
>> >>> > but sometime i crawl some url it has error indexes time that
>> >>> >
>> >>> > - Indexer: done
>> >>> > - Dedup: starting
>> >>> > - Dedup: adding indexes in: crawled/indexes
>> >>> > - Total input paths to process : 2
>> >>> > - Running job: job_0025
>> >>> > - map 0% reduce 0%
>> >>> > - Task Id : task_0025_m_000001_0, Status : FAILED
>> >>> > task_0025_m_000001_0: - Error running child
>> >>> > task_0025_m_000001_0: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000001_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000001_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000001_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>> >>> > task_0025_m_000001_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000000_0, Status : FAILED
>> >>> > task_0025_m_000000_0: - Error running child
>> >>> > task_0025_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000000_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000000_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000000_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>> >>> > task_0025_m_000000_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000000_1, Status : FAILED
>> >>> > task_0025_m_000000_1: - Error running child
>> >>> > task_0025_m_000000_1: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000000_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000000_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000000_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>> >>> > task_0025_m_000000_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000001_1, Status : FAILED
>> >>> > task_0025_m_000001_1: - Error running child
>> >>> > task_0025_m_000001_1: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000001_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000001_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000001_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>> >>> > task_0025_m_000001_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000001_2, Status : FAILED
>> >>> > task_0025_m_000001_2: - Error running child
>> >>> > task_0025_m_000001_2: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000001_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000001_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000001_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>> >>> > task_0025_m_000001_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000000_2, Status : FAILED
>> >>> > task_0025_m_000000_2: - Error running child
>> >>> > task_0025_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000000_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000000_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000000_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>> >>> > task_0025_m_000000_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> >>> > - map 100% reduce 100%
>> >>> > - Task Id : task_0025_m_000001_3, Status : FAILED
>> >>> > task_0025_m_000001_3: - Error running child
>> >>> > task_0025_m_000001_3: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000001_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000001_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000001_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>> >>> > task_0025_m_000001_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000000_3, Status : FAILED
>> >>> > task_0025_m_000000_3: - Error running child
>> >>> > task_0025_m_000000_3: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000000_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000000_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000000_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>> >>> > task_0025_m_000000_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
>> >>> > Exception in thread "main" java.io.IOException: Job failed!
>> >>> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>> >>> > at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
>> >>> > at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>> >>> >
>> >>> > how i solve it?
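One more thing, mainly so someone can correct me if I have misunderstood it: my reading is that the generator only selects URLs when there is a seed file in the directory given to bin/nutch crawl and the seed host is accepted by conf/crawl-urlfilter.txt. The lines below are only an illustration of the syntax with a made-up host, not a copy of my actual files:

    # urls/seeds.txt - one seed URL per line
    http://lucene.apache.org/

    # conf/crawl-urlfilter.txt - rules are checked in order, the first match decides
    # skip file:, ftp:, and mailto: urls
    -^(file|ftp|mailto):
    # accept everything under the seed's domain
    +^http://([a-z0-9]*\.)*apache.org/
    # reject everything else
    -.

If I remember right, "bin/nutch readdb crawled/crawldb -stats" shows how many URLs actually made it into the crawldb, which should at least tell whether the filters are dropping the seeds or whether the problem is only the dedup/log4j part above.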