Hi Jack,
I simply don't know whether there's a better alternative to "InetAddress.getLocalHost()". But since Nutch is optimized for distribution and scaling, you can crawl from different servers (by the way, this works great :-). So I would expect the hostname to be part of any unique id.
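To make the idea concrete, here is a minimal sketch of an id that includes the hostname. The class name CrawlerId, the fallback string "unknown-host", and the id layout (host, timestamp, counter) are assumptions of mine for illustration, not anything taken from the Nutch code:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Nutch: builds ids of the form
// <host>-<millis>-<counter>.
final class CrawlerId {
    // Per-JVM counter so two ids created in the same millisecond differ.
    private static final AtomicLong COUNTER = new AtomicLong();

    // Returns the local hostname, or a placeholder if it cannot be resolved.
    static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host"; // assumed fallback, choose your own policy
        }
    }

    // Builds an id from an explicit host part.
    static String next(String host) {
        return host + "-" + System.currentTimeMillis() + "-" + COUNTER.incrementAndGet();
    }

    // Builds an id using this machine's hostname.
    static String next() {
        return next(localHostName());
    }
}
```

The hostname separates ids from different servers, and the counter separates ids created on the same server within one JVM. Two JVMs on the same host could still collide in principle, so a real setup might want a process-specific part as well.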
Let me chime in with a useful tip: if you use Nutch for Internet crawling, I would highly recommend installing a dedicated caching DNS server (e.g. the one from the djbdns package) on the same network segment, and configuring all Nutch machines to use that server, especially the crawlers.
In my experience this saves a lot of outside traffic (only the first lookup results in external traffic), speeds up the crawling (the caching DNS server is much quicker and more bandwidth-conscious than normal DNS servers), and prevents some timeouts (when the connection is too clogged, or the remote DNS server has just collapsed under the load :-) ).
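The "only the first lookup goes outside" effect can be illustrated with a toy in-process cache. To be clear, this is not how djbdns works and it is not a replacement for a real caching DNS server: it has no TTL handling and no negative caching. The names CachingResolver and Lookup are made up for the illustration:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy illustration only: remembers resolved addresses forever (no TTL).
final class CachingResolver {
    // The "outside" lookup, e.g. InetAddress::getByName.
    interface Lookup {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final Lookup upstream;
    private final Map<String, InetAddress> cache = new ConcurrentHashMap<>();
    private final AtomicInteger upstreamCalls = new AtomicInteger();

    CachingResolver(Lookup upstream) {
        this.upstream = upstream;
    }

    // Answers from the cache when possible; otherwise asks upstream once
    // and stores the result. Two threads racing on the same new host may
    // both ask upstream; that is harmless for this sketch.
    InetAddress resolve(String host) throws UnknownHostException {
        InetAddress cached = cache.get(host);
        if (cached != null) {
            return cached;
        }
        upstreamCalls.incrementAndGet();
        InetAddress fresh = upstream.resolve(host);
        cache.put(host, fresh);
        return fresh;
    }

    // Number of lookups that actually went to the upstream resolver.
    int upstreamCalls() {
        return upstreamCalls.get();
    }
}
```

With new CachingResolver(InetAddress::getByName), repeated fetches from the same site would cost one real lookup instead of one per page. A dedicated caching server gives the same benefit to every machine on the segment at once, and respects record lifetimes, which is why I recommend that over doing it in the application.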
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
