1. Nutch follows the links within HTML pages to crawl the full link graph of a
set of web pages.
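To illustrate the idea, here is a minimal sketch of link-following as a breadth-first crawl in rounds. It assumes an in-memory dict (`outlinks`) standing in for fetching and parsing real pages. The function name `crawl` and its `rounds` parameter are hypothetical and do not reflect Nutch's actual code or configuration; the sketch only shows how following outlinks from seed URLs gradually covers the reachable graph.

```python
def crawl(seed_urls, outlinks, rounds):
    """Breadth-first crawl over a link graph, one fetch round per depth level.

    outlinks: hypothetical dict mapping a URL to the URLs its page links to
    (a stand-in for actually fetching and parsing each page).
    Returns URLs in the order they were "fetched".
    """
    visited = set(seed_urls)
    frontier = list(seed_urls)
    order = []
    for _ in range(rounds):
        next_frontier = []
        for url in frontier:
            order.append(url)  # "fetch" this page
            for link in outlinks.get(url, []):
                if link not in visited:  # queue each URL only once
                    visited.add(link)
                    next_frontier.append(link)
        if not next_frontier:  # nothing new discovered; graph exhausted
            break
        frontier = next_frontier
    return order


# Toy link graph: a -> b, c; b -> c, d; c -> a; d -> e
web = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": ["e"], "e": []}
print(crawl(["a"], web, rounds=2))  # → ['a', 'b', 'c']
```

With more rounds the crawl reaches `d` and `e` as well, which is why the number of rounds effectively bounds how far from the seeds the crawl gets.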

In addition, I think Nutch has a PageRank-like scoring function, as opposed to
Lucene/Solr, which are based on vector space model scoring.
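To make the contrast concrete, here is a simplified sketch of the two kinds of scoring. `pagerank` is textbook PageRank by power iteration, not Nutch's own scoring plugin. It gives each page a query-independent score derived only from the link graph. `cosine` is a bare vector-space similarity over raw term frequencies. It gives a query-dependent score derived only from text. Lucene's real scoring adds IDF weighting and other factors, so treat this as an illustration of the model, not of Lucene's formula. Both function names are hypothetical.

```python
import math
from collections import Counter


def pagerank(outlinks, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (query-independent link score)."""
    pages = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not outlinks.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in outlinks.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


def cosine(query, doc):
    """Cosine similarity of raw term-frequency vectors (query-dependent text score)."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(links))  # "c" ranks highest: both "a" and "b" link to it
print(cosine("nutch crawler", "nutch is a web crawler"))
```

The key difference is that a page's PageRank is the same whatever the user searches for, while the cosine score changes with every query.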

koji
--
http://soleami.com/blog/mahout-and-machine-learning-training-course-is-here.html