Thanks for the info, I'll move this to the dev list.

The information I have shown is from the logs/hadoop.log file.  That is
what's strange: the log file is basically empty other than the start and
finish of the job. No exceptions, no errors, almost no information. This
is with the default log4j.properties.

Re-running the readdb stats shows no changes...

WebTable statistics start
Statistics for WebTable:
min score:      1.0
max score:      1.0
TOTAL urls:     2894
status 0 (null):        2894
avg score:      1.0
WebTable statistics: done
min score:      1.0
retry 0:        2894
max score:      1.0
TOTAL urls:     2894
status 0 (null):        2894
avg score:      1.0
 

-----Original Message-----
From: Andrzej Bialecki [mailto:[email protected]] 
Sent: Thursday, December 16, 2010 11:36 PM
To: [email protected]
Subject: Re: Does Nutch 2.0 in good enough shape to test?

On 12/17/10 2:08 AM, brad wrote:

> To Generate, I use the following:
> nutch generate -all -topN 100000
>
> crawl.GeneratorJob - GeneratorJob: Selecting best-scoring urls due for fetch.
> crawl.GeneratorJob - GeneratorJob: starting
> crawl.GeneratorJob - GeneratorJob: filtering: true
> crawl.GeneratorJob - GeneratorJob: topN: 100000
> crawl.GeneratorJob - GeneratorJob: done
> crawl.GeneratorJob - GeneratorJob: generated batch id: 1292541893-1499060629
>
> No other log information is provided...
> Unlike the old way, which included log items like:
> Generator: starting at 2010-10-08 23:16:02

If in doubt you should check the logs/hadoop.log - if there were any
exceptions they should be reported there.
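
A minimal sketch of that check, assuming a POSIX shell with grep and that
the log sits at logs/hadoop.log relative to the current directory. The
name scan_log is a hypothetical helper, not a Nutch command:

```shell
# Print numbered log lines that mention an exception or an error.
# Defaults to the log path named in this thread; pass another path as $1.
# scan_log is a hypothetical helper name, not part of Nutch.
scan_log() {
  grep -n -i -E 'exception|error' "${1:-logs/hadoop.log}"
}
```

Running scan_log with no arguments scans logs/hadoop.log. If it prints
nothing, the log contains no line matching either word.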

> Same type of issue occurs with Fetch:
> nutch fetch -all -threads 100 -parse
>
> The log files show:
> fetcher.FetcherJob - FetcherJob: starting
> fetcher.FetcherJob - FetcherJob : timelimit set for : -1
> fetcher.FetcherJob - FetcherJob: threads: 10
> fetcher.FetcherJob - FetcherJob: parsing: false
> fetcher.FetcherJob - FetcherJob: resuming: false
> fetcher.FetcherJob - FetcherJob: fetching all
> fetcher.FetcherJob - FetcherJob: done

Again, there should be some data in the log. Also, at this point you can
re-run readdb and check whether the statistics have changed.


> So the question is: is Nutch 2.0 ready to beta test, or am I doing
> something very wrong?

I guess it could be a config error - basic usage should just work...


> So what am I missing?

I don't know, we need more information. BTW, dev@ list may be more 
appropriate for this discussion.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

