nutch vs. LCF for web crawling

Jack Krupansky Thu, 10 Jun 2010 10:41:16 -0700

It would be nice to have a brief summary comparison of the web crawling 
features of LCF relative to nutch. I personally don't know the details of nutch 
other than a quick read of the tutorial, but I am wondering whether there are 
any features of nutch web crawling that may not be available in the LCF web 
crawl connector.


A second question is whether nutch has any performance or volume advantage over 
LCF for web crawling, at least in rough terms. Specific performance tests for 
LCF would eventually be good to have as well.

I would envision people using LCF to crawl desired web sites rather than the 
whole web, but the number of desired sites to be crawled could still be a 
moderately large number. At some point we should publish rough guidelines on 
how much web crawling LCF is intended to support.

(Answers could go in the LCF FAQ.)

Thanks.


-- Jack Krupansky
