Jack Tang wrote:
Hi All

Is the Nutch crawler a breadth-first one? It seems a lot of URLs are lost
when I try breadth-first crawling with the depth set to 3.
Any comments?

Yes, and yes - there is a possibility that some URLs are lost if they require maintaining a single session. If you encounter such sites, a depth-first crawler would be better.
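
To illustrate the difference in visiting order, here is a minimal sketch in Python. It is not Nutch code and uses no Nutch API: the LINKS graph, the function names, and the convention that the seed counts as depth 1 are all assumptions made up for this example. The breadth-first version fetches a whole level before moving on, so the pages of one site are spread across rounds; the depth-first version follows one chain of links to the end first, which is closer to what a session-bound site expects.

```python
from collections import deque

# Hypothetical link graph (page -> outlinks); illustrative only, not Nutch data.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a1", "/a2"],
    "/b": ["/b1"],
    "/a1": [],
    "/a2": [],
    "/b1": [],
}


def crawl_breadth_first(seed, max_depth):
    """Visit pages level by level; the seed is assumed to be depth 1."""
    seen = {seed}
    order = []
    queue = deque([(seed, 1)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # outlinks past the depth limit are never fetched
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


def crawl_depth_first(seed, max_depth):
    """Follow each chain of links to the depth limit before backtracking."""
    seen = set()
    order = []

    def visit(url, depth):
        seen.add(url)
        order.append(url)
        if depth == max_depth:
            return
        for link in LINKS.get(url, []):
            if link not in seen:
                visit(link, depth + 1)

    visit(seed, 1)
    return order


if __name__ == "__main__":
    print(crawl_breadth_first("/", 3))
    print(crawl_depth_first("/", 3))
```

With a depth of 3 both functions reach the same six pages of this toy graph, only in a different order. With a depth of 2 both stop after "/", "/a" and "/b", so anything linked only from deeper pages is not fetched at all.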

It's not too difficult to build one, using the tools already present in Nutch. Contributions are welcome... ;-)

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
