What do you now set fetcher.threads.per.host to? Can you tell me what your
generate.max.per.host value is as well?

I got big improvements after setting:

<property>
 <name>fetcher.server.delay</name>
 <value>0.5</value>
 <description>The number of seconds the fetcher will delay between
  successive requests to the same server.</description>
</property>

even though I'm only generating 5 URLs per host (generate.max.per.host=5).
I'm using a proxy, and I don't know whether fetcher.server.delay also
affects requests made through it (anyone?).
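
For reference, a sketch of how the generate.max.per.host setting would look
as an override, assuming it goes in conf/nutch-site.xml like other overrides
of nutch-default.xml (the description text is my own paraphrase, not copied
from the defaults file):

<property>
 <name>generate.max.per.host</name>
 <value>5</value>
 <description>Paraphrase (assumption): upper limit on the number of
  urls per host in a single fetchlist.</description>
</property>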

Also, I still can't see any logging output from the fetchers, i.e. which URL
is being requested, in any log file anywhere. I'm not so hot with Java, but
can anyone here tell me whether:

log4j.threshhold=ALL

in conf/log4j.properties should be "threshold" with one "h", or are two
"h"s the Java way?

And is there any reason why the lines in the function below are commented
out:

 public void configure(JobConf job) {
   setConf(job);

   this.segmentName = job.get(SEGMENT_NAME_KEY);
   this.storingContent = isStoringContent(job);
   this.parsing = isParsing(job);

//    if (job.getBoolean("fetcher.verbose", false)) {
//      LOG.setLevel(Level.FINE);
//    }
 }

Is this parameter now read somewhere else?

Any enlightenment always appreciated.

-Ed

On 8/9/06, Uroš Gruber <[EMAIL PROTECTED]> wrote:

Sami Siren wrote:
>
>> I set DEBUG level logging and checked the timing during operations.
>> The MapReduce job, which is run after every page, takes 3-4
>> seconds until the next URL is fetched.
>> I have some local site and fetching 100 pages takes about 6 minutes.
>
> You are fetching a single site, yes? Then you can get more performance
> by tweaking the configuration
> of the fetcher.
>
> <property>
>  <name>fetcher.server.delay</name>
>  <value></value>
>  <description>The number of seconds the fetcher will delay between
>   successive requests to the same server.</description>
> </property>
>
> <property>
>  <name>fetcher.threads.per.host</name>
>  <value></value>
>  <description>This number is the maximum number of threads that
>    should be allowed to access a host at one time.</description>
> </property>
>
Hi,

I've managed to test Nutch speed on several machines, with different OSes
as well.
It looks like fetcher.threads.per.host makes the fetcher run faster.

What I still don't understand is this.

When the fetcher threads setting was at its default value, the fetcher was
doing a mapreduce after every URL.
But now the job is run on about 400 URLs or maybe more.

--
Uros
> --
> Sami Siren

