Hi Andrzej,

Yes, I measured and compared them (two years ago). I am actually using simplified 
code, rewritten from Nutch, with a non-synchronized instance per thread.
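For readers unfamiliar with the pattern, here is a minimal Java sketch of the two designs being compared. Everything in it is hypothetical: SimpleRegexNormalizer is not Nutch's RegexURLNormalizer, and the jsessionid rule is only an example. The normalizer keeps a reusable java.util.regex.Matcher, which is mutable and not safe for concurrent use. A single shared instance therefore needs a lock. A ThreadLocal gives each fetcher thread its own copy, so no lock is taken on the hot path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerThreadNormalizerSketch {

    // Hypothetical normalizer (NOT Nutch's RegexURLNormalizer). It holds a
    // reusable Matcher as mutable state, so one instance must not be used by
    // several threads at once without locking.
    static final class SimpleRegexNormalizer {
        // Example rule (assumption): strip a ";jsessionid=..." path parameter.
        private final Matcher matcher =
                Pattern.compile(";jsessionid=[^?#]*").matcher("");

        String normalize(String url) {
            // reset() points the shared Matcher at the new input string.
            return matcher.reset(url).replaceAll("");
        }
    }

    // Design 1: one shared instance; every call is serialized on one monitor.
    static final class SharedNormalizer {
        private final SimpleRegexNormalizer delegate = new SimpleRegexNormalizer();

        synchronized String normalize(String url) {
            return delegate.normalize(url);
        }
    }

    // Design 2: one unsynchronized instance per thread; no lock on the hot path.
    private static final ThreadLocal<SimpleRegexNormalizer> PER_THREAD =
            ThreadLocal.withInitial(SimpleRegexNormalizer::new);

    static String normalizePerThread(String url) {
        return PER_THREAD.get().normalize(url);
    }

    public static void main(String[] args) {
        String url = "http://example.com/a;jsessionid=ABC123?x=1";
        System.out.println(new SharedNormalizer().normalize(url)); // http://example.com/a?x=1
        System.out.println(normalizePerThread(url));               // http://example.com/a?x=1
    }
}
```

The trade-off is one normalizer (and its compiled pattern) per thread, which costs some memory but removes the shared monitor entirely.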

Imagine 1024 threads, each with 100 Outlinks, all trying to call a synchronized 
method: that is 102,400 concurrent calls to one synchronized method within a 
roughly 3-second window (on average, given network delays). I was even able to 
run 1024 concurrent threads without any performance impact! Also, each 
synchronization costs an additional 500-1000 CPU cycles, even when contention is 
low.
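Spelled out, and assuming the calls are spread evenly over the ~3-second window, the numbers above give the rate at which a single lock would have to be acquired:

```latex
\[
1024~\text{threads} \times 100~\text{Outlinks/thread} = 102{,}400~\text{calls},
\qquad
\frac{102{,}400~\text{calls}}{3~\text{s}} \approx 34{,}000~\text{calls/s}
\]
```

All of these calls go through one monitor, so the normalization work itself is serialized no matter how many threads are added.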

With the non-synchronized version, I can't run more than 128 threads: the CPU is 
overloaded. It runs faster.
-Fuad


> -----Original Message-----
> From: Andrzej Bialecki [mailto:a...@getopt.org]
> Sent: October-19-09 5:47 AM
> To: nutch-dev@lucene.apache.org
> Subject: Re: Niocchi - java asynchronous crawl library released
> 
> Fuad Efendi wrote:
> > Hi Andrzej,
> >
> > The real bottleneck of Nutch is RegexURLNormalizer; it is still a synchronized
> > singleton (shared by multiple threads). The same goes for similar synchronized
> > plugins, which should probably be refactored into Nutch core...
> 
> It's not a singleton, but it's true that the normalize() method is
> synchronized. Did you actually measure the impact of this
> synchronization on the crawling speed? I very much doubt it outweighs
> the impact of politeness limits.
> 
> --
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
