Michael Nebel wrote:
Hi Jack,

I simply don't know if there's a better alternative to "InetAddress.getLocalHost()". But since Nutch is optimized for distribution and scaling, you can crawl from different servers (by the way, this works great :-). So I would expect the hostname to be part of any unique ID.
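
To illustrate the point about the hostname being part of a unique ID, here is a minimal sketch. The class name CrawlerId, its methods, and the ID layout (hostname, timestamp, counter) are made up for this example and are not taken from the Nutch code; the only real API used for the hostname is InetAddress.getLocalHost().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Nutch: builds IDs that stay unique
// across several crawl servers by prefixing them with the local hostname.
class CrawlerId {
    // Per-JVM counter so two IDs created in the same millisecond still differ.
    private static final AtomicLong COUNTER = new AtomicLong();

    // Returns the local hostname, or a fixed placeholder if it cannot be resolved.
    static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host"; // assumption: a placeholder is acceptable here
        }
    }

    // Assumed ID layout for illustration: <hostname>-<millis>-<counter>
    static String nextId() {
        return localHostName() + "-" + System.currentTimeMillis()
                + "-" + COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        System.out.println(nextId());
    }
}
```

Note that if the hostname cannot be resolved on several machines at once, they would all fall back to the same placeholder, so a real setup would probably want a configurable fallback.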

Let me pipe in with a useful tip: if you use Nutch for Internet crawling, I would highly recommend installing a dedicated caching DNS server (e.g. the one from the djbdns package) on the same network segment, and configuring all Nutch machines to use that server, especially the crawlers.


In my experience this saves a lot of outside traffic (only the first lookup of a name results in external traffic), speeds up crawling (the caching DNS is much quicker and more bandwidth-conscious than normal DNS servers), and prevents some timeouts (when the connection is too clogged, or the remote DNS server has just collapsed under the load :-) ).
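
The "only the first lookup results in external traffic" effect can be sketched in a few lines. This is only an in-process illustration of the caching idea, not a replacement for a dedicated caching DNS server and not anything from Nutch or djbdns; the class CachingResolver and the fake upstream function are assumptions made for the example, and unlike a real DNS cache it never expires entries (no TTL handling).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration: answers repeated lookups from a local map and
// counts how often the (slow, external) upstream resolver is actually asked.
class CachingResolver {
    private final Function<String, String> upstream;   // stands in for the remote DNS
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger upstreamLookups = new AtomicInteger();

    CachingResolver(Function<String, String> upstream) {
        this.upstream = upstream;
    }

    // Only a cache miss reaches the upstream resolver.
    String resolve(String host) {
        return cache.computeIfAbsent(host, h -> {
            upstreamLookups.incrementAndGet();
            return upstream.apply(h);
        });
    }

    int upstreamLookups() {
        return upstreamLookups.get();
    }

    public static void main(String[] args) {
        // Fake upstream that always returns a documentation address.
        CachingResolver resolver = new CachingResolver(h -> "192.0.2.1");
        for (int i = 0; i < 1000; i++) {
            resolver.resolve("example.org");
        }
        // 1000 lookups of the same name, but only one went upstream.
        System.out.println("upstream lookups: " + resolver.upstreamLookups());
    }
}
```

A crawler fetching many pages from the same hosts repeats the same lookups constantly, which is why a shared cache on the local segment pays off so quickly.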

--
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


