Hi Andrzej,

Yes, I measured and compared this (two years ago). I am actually using simplified, rewritten code based on Nutch, with a non-synchronized instance per thread.
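A minimal sketch of the instance-per-thread idea, for illustration only. This is not Nutch code: `SimpleUrlNormalizer`, its single `jsessionid` rule, and the helper `normalizeConcurrently` are hypothetical names, and `ThreadLocal` is just one possible way to give each worker thread its own instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

class PerThreadNormalizerSketch {

    /** Hypothetical stand-in for a regex-based URL normalizer (not the Nutch class). */
    static final class SimpleUrlNormalizer {
        // A compiled Pattern is immutable and safe to share between threads.
        private static final Pattern SESSION_ID =
                Pattern.compile(";jsessionid=[^?#]*", Pattern.CASE_INSENSITIVE);

        // Deliberately NOT synchronized: each instance is used by one thread only.
        String normalize(String url) {
            return SESSION_ID.matcher(url).replaceAll("");
        }
    }

    // One normalizer per worker thread, created lazily on first use.
    private static final ThreadLocal<SimpleUrlNormalizer> NORMALIZER =
            ThreadLocal.withInitial(SimpleUrlNormalizer::new);

    static String normalize(String url) {
        return NORMALIZER.get().normalize(url);
    }

    /** Normalizes the outlinks of simulated pages in parallel; returns how many came out as expected. */
    static int normalizeConcurrently(int threads, int pages, int outlinksPerPage) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int p = 0; p < pages; p++) {
                final String base = "http://example.com/p" + p + "/";
                results.add(pool.submit(() -> {
                    int ok = 0;
                    for (int i = 0; i < outlinksPerPage; i++) {
                        String url = base + i + ";jsessionid=ABC123?x=1";
                        if (normalize(url).equals(base + i + "?x=1")) {
                            ok++;
                        }
                    }
                    return ok;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // wait for each simulated page to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 64 simulated pages with 100 outlinks each, processed by 8 threads.
        System.out.println(normalizeConcurrently(8, 64, 100));
    }
}
```

Because no thread ever touches another thread's instance, `normalize()` needs no lock. The cost is one normalizer object per thread, which is small as long as the per-instance state is small.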

Imagine 1024 threads, each having 100 Outlinks and trying to call a synchronized method... a total of 102,400 concurrent calls to the synchronized method (within a frame of 3 seconds on average, because of network delays)... I was even able to have 1024 concurrent threads without any performance impact! Also, each synchronization requires additional CPU cycles (500-1000) even when concurrency is small. With the non-synchronized version, I can't have more than 128 threads: the CPU is overloaded. It runs faster.

-Fuad

> -----Original Message-----
> From: Andrzej Bialecki [mailto:a...@getopt.org]
> Sent: October-19-09 5:47 AM
> To: nutch-dev@lucene.apache.org
> Subject: Re: Niocchi - java asynchronous crawl library released
>
> Fuad Efendi wrote:
> > Hi Andrzej,
> >
> > Real bottleneck of Nutch is RegexURLNormalizer, it is still synchronized
> > singleton (shared by multiple threads). And similar synchronized plugins which
> > should be probably refactored to Nutch core...
>
> It's not a singleton, but it's true that the normalize() method is
> synchronized. Did you actually measure the impact of this
> synchronization on the crawling speed? I very much doubt it outweighs
> the impact of politeness limits.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com