I followed this link "http://wiki.apache.org/nutch/NutchHadoopTutorial", so I don't think the problem is with the conf/crawl-urlfilter.txt file; my filter file follows the tutorial's example (sketched at the end of this message). When I run the command "bin/nutch crawl urls -dir crawled -depth 3" again, it shows:
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/25501221110712
Generator: filtering: false
Generator: topN: 2147483647
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawled/segments/25501221110712
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawled/crawldb
CrawlDb update: segments: [crawled/segments/25501221110712]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/25501221110908
Generator: filtering: false
Generator: topN: 2147483647
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawled/segments/25501221110908
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawled/crawldb
CrawlDb update: segments: [crawled/segments/25501221110908]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawled/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: /user/nutch/crawled/segments/25501221110519
LinkDb: adding segment: /user/nutch/crawled/segments/25501221110712
LinkDb: adding segment: /user/nutch/crawled/segments/25501221110908
LinkDb: done
Indexer: starting
Indexer: linkdb: crawled/linkdb
Indexer: adding segment: /user/nutch/crawled/segments/25501221110519
Indexer: adding segment: /user/nutch/crawled/segments/25501221110712
Indexer: adding segment: /user/nutch/crawled/segments/25501221110908
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawled/indexes
task_0017_m_000000_0: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
task_0017_m_000001_0: log4j:ERROR setFile(null,true) call failed.
task_0017_m_000001_0: java.io.FileNotFoundException: /nutch/search/logs (Is a directory)
task_0017_m_000001_0:   at java.io.FileOutputStream.openAppend(Native Method)
task_0017_m_000001_0:   at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
task_0017_m_000001_0:   at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
task_0017_m_000001_0:   at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
task_0017_m_000001_0:   at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
task_0017_m_000001_0:   at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
task_0017_m_000001_0:   at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
task_0017_m_000001_0:   at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
task_0017_m_000001_0:   at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
task_0017_m_000001_0:   at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
task_0017_m_000001_0:   at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
task_0017_m_000001_0:   at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
task_0017_m_000001_0:   at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
task_0017_m_000001_0:   at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
task_0017_m_000001_0:   at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
task_0017_m_000001_0:   at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
task_0017_m_000001_0:   at org.apache.log4j.Logger.getLogger(Logger.java:104)
task_0017_m_000001_0:   at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
task_0017_m_000001_0:   at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
task_0017_m_000001_0:   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
task_0017_m_000001_0:   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
task_0017_m_000001_0:   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
task_0017_m_000001_0:   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
task_0017_m_000001_0:   at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
task_0017_m_000001_0:   at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
task_0017_m_000001_0:   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
task_0017_m_000001_0:   at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:82)
task_0017_m_000001_0:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1423)
task_0017_m_000001_0: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
task_0017_m_000000_1: log4j:ERROR setFile(null,true) call failed.
task_0017_m_000000_1: java.io.FileNotFoundException: /nutch/search/logs (Is a directory)
task_0017_m_000000_1:   at java.io.FileOutputStream.openAppend(Native Method)
task_0017_m_000000_1:   at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
task_0017_m_000000_1:   at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
task_0017_m_000000_1:   at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
task_0017_m_000000_1:   at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
task_0017_m_000000_1:   at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
task_0017_m_000000_1:   at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
task_0017_m_000000_1:   at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
task_0017_m_000000_1:   at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
task_0017_m_000000_1:   at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
task_0017_m_000000_1:   at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
task_0017_m_000000_1:   at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
task_0017_m_000000_1:   at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
task_0017_m_000000_1:   at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
task_0017_m_000000_1:   at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
task_0017_m_000000_1:   at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
task_0017_m_000000_1:   at org.apache.log4j.Logger.getLogger(Logger.java:104)
task_0017_m_000000_1:   at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
task_0017_m_000000_1:   at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
task_0017_m_000000_1:   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
task_0017_m_000000_1:   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
task_0017_m_000000_1:   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
task_0017_m_000000_1:   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
task_0017_m_000000_1:   at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
task_0017_m_000000_1:   at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
task_0017_m_000000_3:   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
task_0017_m_000000_3:   at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:82)
task_0017_m_000000_3:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1423)
task_0017_m_000000_3: log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
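
From the log4j errors it looks like the DRFA appender is trying to open the /nutch/search/logs directory itself as its log file. If I read the stock Hadoop conf/log4j.properties correctly (this is only my understanding of the defaults; my copy may differ), the relevant DRFA lines are supposed to look roughly like this:

# DRFA appender as shipped with Hadoop (a sketch of the defaults, not verified against my copy)
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
# hadoop.log.dir and hadoop.log.file are normally passed in by the start-up scripts;
# if hadoop.log.file is empty, File collapses to the logs directory itself
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

So maybe the tasktracker children are being started without hadoop.log.file set, but I'm not sure.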

Beyond that, I don't know what is happening. Can anyone help?
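
For completeness, my conf/crawl-urlfilter.txt is based on the example in the tutorial, roughly the sketch below (MY.DOMAIN.NAME is the placeholder from the tutorial; my copy has the real domain of my seed URLs):

# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
# skip everything else
-.

As far as I can tell the accept pattern matches my seed URLs, which is why I don't think this file is the problem.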


pvvpr wrote:
> 
> I think you need to check the conf/crawl-urlfilter.txt file
> 
> On Thursday 20 December 2007 04:55, jibjoice wrote:
>> Please help me solve this.
>>
>> jibjoice wrote:
>> > Where should I fix this? Why did it generate 0 records?
>> >
>> > pvvpr wrote:
>> >> Basically your indexes are empty since no URLs were generated and
>> >> fetched. See this:
>> >>
>> >>> > - Generator: 0 records selected for fetching, exiting ...
>> >>> > - Stopping at depth=0 - no more URLs to fetch.
>> >>> > - No URLs to fetch - check your seed list and URL filters.
>> >>> > - crawl finished: crawled
>> >>
>> >> When no pages are indexed, dedup throws an Exception.
>> >>
>> >> On Tuesday 18 December 2007 21:33, jibjoice wrote:
>> >>> I still can't solve this, please help me.
>> >>>
>> >>> jibjoice wrote:
>> >>> > I use nutch-0.9 and hadoop-0.12.2, and when I run the command "bin/nutch
>> >>> > crawl urls -dir crawled -depth 3" I get this error:
>> >>> >
>> >>> > - crawl started in: crawled
>> >>> > - rootUrlDir = input
>> >>> > - threads = 10
>> >>> > - depth = 3
>> >>> > - Injector: starting
>> >>> > - Injector: crawlDb: crawled/crawldb
>> >>> > - Injector: urlDir: input
>> >>> > - Injector: Converting injected urls to crawl db entries.
>> >>> > - Total input paths to process : 1
>> >>> > - Running job: job_0001
>> >>> > - map 0% reduce 0%
>> >>> > - map 100% reduce 0%
>> >>> > - map 100% reduce 100%
>> >>> > - Job complete: job_0001
>> >>> > - Counters: 6
>> >>> > - Map-Reduce Framework
>> >>> > - Map input records=3
>> >>> > - Map output records=1
>> >>> > - Map input bytes=22
>> >>> > - Map output bytes=52
>> >>> > - Reduce input records=1
>> >>> > - Reduce output records=1
>> >>> > - Injector: Merging injected urls into crawl db.
>> >>> > - Total input paths to process : 2
>> >>> > - Running job: job_0002
>> >>> > - map 0% reduce 0%
>> >>> > - map 100% reduce 0%
>> >>> > - map 100% reduce 58%
>> >>> > - map 100% reduce 100%
>> >>> > - Job complete: job_0002
>> >>> > - Counters: 6
>> >>> > - Map-Reduce Framework
>> >>> > - Map input records=3
>> >>> > - Map output records=1
>> >>> > - Map input bytes=60
>> >>> > - Map output bytes=52
>> >>> > - Reduce input records=1
>> >>> > - Reduce output records=1
>> >>> > - Injector: done
>> >>> > - Generator: Selecting best-scoring urls due for fetch.
>> >>> > - Generator: starting
>> >>> > - Generator: segment: crawled/segments/25501213164325
>> >>> > - Generator: filtering: false
>> >>> > - Generator: topN: 2147483647
>> >>> > - Total input paths to process : 2
>> >>> > - Running job: job_0003
>> >>> > - map 0% reduce 0%
>> >>> > - map 100% reduce 0%
>> >>> > - map 100% reduce 100%
>> >>> > - Job complete: job_0003
>> >>> > - Counters: 6
>> >>> > - Map-Reduce Framework
>> >>> > - Map input records=3
>> >>> > - Map output records=1
>> >>> > - Map input bytes=59
>> >>> > - Map output bytes=77
>> >>> > - Reduce input records=1
>> >>> > - Reduce output records=1
>> >>> > - Generator: 0 records selected for fetching, exiting ...
>> >>> > - Stopping at depth=0 - no more URLs to fetch.
>> >>> > - No URLs to fetch - check your seed list and URL filters.
>> >>> > - crawl finished: crawled
>> >>> >
>> >>> > But sometimes when I crawl some URLs, it fails at indexing time like this:
>> >>> >
>> >>> > - Indexer: done
>> >>> > - Dedup: starting
>> >>> > - Dedup: adding indexes in: crawled/indexes
>> >>> > - Total input paths to process : 2
>> >>> > - Running job: job_0025
>> >>> > - map 0% reduce 0%
>> >>> > - Task Id : task_0025_m_000001_0, Status : FAILED
>> >>> > task_0025_m_000001_0: - Error running child
>> >>> > task_0025_m_000001_0: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000001_0: at
>> >>> > org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000001_0: at
>> >>> > org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReade
>> >>> > r.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000001_0: at
>> >>> > org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000001_0: at
>> >>> > org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask.run
>> >>> > (MapTask.java:175)
>> >>> > task_0025_m_000001_0: at
>> >>> > org.apache.hadoop.mapred.TaskTracker$Child.main
>> >>> > (TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000000_0, Status : FAILED
>> >>> > task_0025_m_000000_0: - Error running child
>> >>> > task_0025_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000000_0: at
>> >>> > org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000000_0: at
>> >>> > org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReade
>> >>> > r.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000000_0: at
>> >>> > org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000000_0: at
>> >>> > org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask.run
>> >>> > (MapTask.java:175)
>> >>> > task_0025_m_000000_0: at
>> >>> > org.apache.hadoop.mapred.TaskTracker$Child.main
>> >>> > (TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000000_1, Status : FAILED
>> >>> > task_0025_m_000000_1: - Error running child
>> >>> > task_0025_m_000000_1: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000000_1: at
>> >>> > org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000000_1: at
>> >>> > org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReade
>> >>> > r.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000000_1: at
>> >>> > org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000000_1: at
>> >>> > org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask.run
>> >>> > (MapTask.java:175)
>> >>> > task_0025_m_000000_1: at
>> >>> > org.apache.hadoop.mapred.TaskTracker$Child.main
>> >>> > (TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000001_1, Status : FAILED
>> >>> > task_0025_m_000001_1: - Error running child
>> >>> > task_0025_m_000001_1: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000001_1: at
>> >>> > org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000001_1: at
>> >>> > org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReade
>> >>> > r.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000001_1: at
>> >>> > org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000001_1: at
>> >>> > org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask.run
>> >>> > (MapTask.java:175)
>> >>> > task_0025_m_000001_1: at
>> >>> > org.apache.hadoop.mapred.TaskTracker$Child.main
>> >>> > (TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000001_2, Status : FAILED
>> >>> > task_0025_m_000001_2: - Error running child
>> >>> > task_0025_m_000001_2: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000001_2: at
>> >>> > org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000001_2: at
>> >>> > org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReade
>> >>> > r.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000001_2: at
>> >>> > org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000001_2: at
>> >>> > org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask.run
>> >>> > (MapTask.java:175)
>> >>> > task_0025_m_000001_2: at
>> >>> > org.apache.hadoop.mapred.TaskTracker$Child.main
>> >>> > (TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000000_2, Status : FAILED
>> >>> > task_0025_m_000000_2: - Error running child
>> >>> > task_0025_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000000_2: at
>> >>> > org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000000_2: at
>> >>> > org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReade
>> >>> > r.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000000_2: at
>> >>> > org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000000_2: at
>> >>> > org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask.run
>> >>> > (MapTask.java:175)
>> >>> > task_0025_m_000000_2: at
>> >>> > org.apache.hadoop.mapred.TaskTracker$Child.main
>> >>> > (TaskTracker.java:1445)
>> >>> > - map 100% reduce 100%
>> >>> > - Task Id : task_0025_m_000001_3, Status : FAILED
>> >>> > task_0025_m_000001_3: - Error running child
>> >>> > task_0025_m_000001_3: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000001_3: at
>> >>> > org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000001_3: at
>> >>> > org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReade
>> >>> > r.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000001_3: at
>> >>> > org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000001_3: at
>> >>> > org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask.run
>> >>> > (MapTask.java:175)
>> >>> > task_0025_m_000001_3: at
>> >>> > org.apache.hadoop.mapred.TaskTracker$Child.main
>> >>> > (TaskTracker.java:1445)
>> >>> > - Task Id : task_0025_m_000000_3, Status : FAILED
>> >>> > task_0025_m_000000_3: - Error running child
>> >>> > task_0025_m_000000_3: java.lang.ArrayIndexOutOfBoundsException: -1
>> >>> > task_0025_m_000000_3: at
>> >>> > org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
>> >>> > task_0025_m_000000_3: at
>> >>> > org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReade
>> >>> > r.next(DeleteDuplicates.java:176)
>> >>> > task_0025_m_000000_3: at
>> >>> > org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>> >>> > task_0025_m_000000_3: at
>> >>> > org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>> >>> > task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask.run
>> >>> > (MapTask.java:175)
>> >>> > task_0025_m_000000_3: at
>> >>> > org.apache.hadoop.mapred.TaskTracker$Child.main
>> >>> > (TaskTracker.java:1445)
>> >>> > Exception in thread "main" java.io.IOException: Job failed!
>> >>> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>> >>> > at org.apache.nutch.indexer.DeleteDuplicates.dedup
>> >>> > (DeleteDuplicates.java:439)
>> >>> > at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
>> >>> >
>> >>> > How do I solve this?
> 

-- 
View this message in context: 
http://www.nabble.com/Nutch-crawl-problem-tp14327978p14450181.html
Sent from the Hadoop Users mailing list archive at Nabble.com.
