Hello,
A few years ago I noticed some performance bottlenecks in Nutch; checking the source code now, they are still the same:

1. RegexURLNormalizer and similar plugins

It's a singleton, and its main method is synchronized. It would be better to have a per-thread, non-synchronized instance; but how would we make it a plugin then?

2. "Allow Redirects" for HttpClient

By allowing redirects we can avoid HttpSession-related tokens in final URLs. This may not be acceptable for a general crawl, but it would be nice to have it as a configuration option.

Fuad Efendi
==================================
http://www.linkedin.com/in/liferay
http://www.tokenizer.org
http://www.casaGURU.com
==================================
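For point 1, here is a minimal sketch of the per-thread alternative, using `java.lang.ThreadLocal` so that each fetcher thread lazily gets its own unsynchronized normalizer instead of contending for one synchronized singleton. It does not use Nutch's plugin API. The class `PerThreadNormalizer`, its methods and the `;jsessionid=` rule are hypothetical and serve only as illustration. A plugin could expose a factory of this kind instead of a shared instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex normalizer, not Nutch's RegexURLNormalizer.
// It deliberately reuses Matcher objects between calls. That per-instance
// state is what forces synchronization when a single instance is shared.
public class PerThreadNormalizer {

    private final List<Matcher> matchers = new ArrayList<>();
    private final List<String> replacements = new ArrayList<>();

    PerThreadNormalizer(String[][] rules) {
        for (String[] rule : rules) {
            // Pattern is thread-safe; the Matcher created from it is not.
            matchers.add(Pattern.compile(rule[0]).matcher(""));
            replacements.add(rule[1]);
        }
    }

    // Not synchronized: safe only because each thread owns its own instance.
    String normalize(String url) {
        String result = url;
        for (int i = 0; i < matchers.size(); i++) {
            Matcher m = matchers.get(i).reset(result);
            result = m.replaceAll(replacements.get(i));
        }
        return result;
    }

    // Hypothetical rule: strip a ";jsessionid=..." path parameter.
    private static final String[][] RULES = {
        {";jsessionid=[^?#]*", ""}
    };

    // One normalizer per thread, created on first use in that thread.
    private static final ThreadLocal<PerThreadNormalizer> LOCAL =
        ThreadLocal.withInitial(() -> new PerThreadNormalizer(RULES));

    static String normalizeUrl(String url) {
        return LOCAL.get().normalize(url);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
            normalizeUrl("http://example.com/a;jsessionid=ABC123?x=1"));
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

The trade-off is memory: each thread holds its own copy of the compiled rules and matchers. In exchange, normalization no longer serializes all fetcher threads on a single lock.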