Thanks for the info; I'll move this to the dev list. The information I showed is from the logs/hadoop.log file. That is what's strange: the log file is basically empty apart from the start and finish of each job. No exceptions, no errors, almost no information... This is with the default log4j.properties.
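For anyone who wants to double-check a log the same way, here is a minimal sketch of a scan for warnings, errors, and stack traces. It assumes the log uses log4j-style level tokens (WARN, ERROR, FATAL) and Java-style stack-trace lines; the helper name `find_problem_lines` is hypothetical, and the only path used is the logs/hadoop.log mentioned above.

```python
import re
from pathlib import Path

# Assumption: problems show up as log4j level tokens, the word "Exception",
# or indented "at ..." stack-trace frames. Adjust if your layout differs.
PROBLEM = re.compile(r"\b(WARN|ERROR|FATAL)\b|Exception|^\s+at\s")


def find_problem_lines(lines):
    """Return (line_number, text) pairs for lines that look like warnings or errors."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PROBLEM.search(line)
    ]


if __name__ == "__main__":
    log_path = Path("logs/hadoop.log")  # path as given in this message
    if log_path.exists():
        with log_path.open(encoding="utf-8", errors="replace") as handle:
            hits = find_problem_lines(handle)
        for number, text in hits:
            print(f"{number}: {text}")
        print(f"{len(hits)} suspicious line(s) found")
    else:
        print(f"{log_path} not found")
```

An empty result only means nothing matching those patterns was written to the file; it does not show that the jobs actually did any work.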
Re-running the readdb stats show no changes...

WebTable statistics start
Statistics for WebTable:
min score: 1.0
max score: 1.0
TOTAL urls: 2894
status 0 (null): 2894
avg score: 1.0
WebTable statistics: done
min score: 1.0
retry 0: 2894
max score: 1.0
TOTAL urls: 2894
status 0 (null): 2894
avg score: 1.0

-----Original Message-----
From: Andrzej Bialecki [mailto:[email protected]]
Sent: Thursday, December 16, 2010 11:36 PM
To: [email protected]
Subject: Re: Does Nutch 2.0 in good enough shape to test?

On 12/17/10 2:08 AM, brad wrote:
> To Generate, I use the following:
> nutch generate -all -topN 100000
>
> crawl.GeneratorJob - GeneratorJob: Selecting best-scoring urls due for fetch.
> crawl.GeneratorJob - GeneratorJob: starting
> crawl.GeneratorJob - GeneratorJob: filtering: true
> crawl.GeneratorJob - GeneratorJob: topN: 100000
> crawl.GeneratorJob - GeneratorJob: done
> crawl.GeneratorJob - GeneratorJob: generated batch id: 1292541893-1499060629
>
> No other log information is provided...
> Unlike the old way which include log items like:
> Generator: starting at 2010-10-08 23:16:02

If in doubt you should check the logs/hadoop.log - if there were any exceptions they should be reported there.

> Same type of issue occurs with Fetch:
> nutch fetch -all -threads 100 -parse
>
> The log files show:
> fetcher.FetcherJob - FetcherJob: starting
> fetcher.FetcherJob - FetcherJob : timelimit set for : -1
> fetcher.FetcherJob - FetcherJob: threads: 10
> fetcher.FetcherJob - FetcherJob: parsing: false
> fetcher.FetcherJob - FetcherJob: resuming: false
> fetcher.FetcherJob - FetcherJob: fetching all
> fetcher.FetcherJob - FetcherJob: done

Again, there should be some data in the log. Also, at this point you can re-run readdb and check if the statistics is changed.

> So, the question is, is Nutch 2.0 ready to beta test? or am I doing
> something very wrong?

I guess it could be a config error - basic usage should just work...

> So what am I missing?

I don't know, we need more information. BTW, dev@ list may be more appropriate for this discussion.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

