Hi Jack,
I simply don't know whether there is a better alternative to "InetAddress.getLocalHost()". But since Nutch is optimized for distribution and scaling, you can crawl from different servers (by the way, this works great :-). So I would expect the hostname to be part of any unique ID.
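For illustration only, here is a minimal sketch of one way to keep the hostname in the hash while tolerating a host whose name does not resolve. This is not what Nutch does: the class SyncMarker, the helper hostLabel() and the "unknown-host" fallback label are hypothetical, and UID is assumed to be java.rmi.server.UID.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.rmi.server.UID;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SyncMarker {

    // Hypothetical helper: returns a host label for the hash and never throws.
    static String hostLabel() {
        try {
            return InetAddress.getLocalHost().toString();
        } catch (UnknownHostException e) {
            // The local hostname does not resolve. Fall back to a fixed label
            // (an assumption of this sketch) instead of aborting the writer.
            return "unknown-host";
        }
    }

    // Builds a 16-byte marker from the MD5 hash of "uid@host".
    static byte[] newSyncMarker() {
        try {
            MessageDigest digester = MessageDigest.getInstance("MD5");
            String seed = new UID() + "@" + hostLabel();
            digester.update(seed.getBytes(StandardCharsets.UTF_8));
            return digester.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sync = newSyncMarker();
        System.out.println("sync marker length: " + sync.length);
    }
}
```

The trade-off: a UID is only guaranteed unique on the host that generated it, so with the fallback label two machines with broken name resolution lose the extra per-host component. Fixing the name resolution remains the cleaner solution.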
Regards
Michael
Jack Tang wrote:
Hi Michael
I solved it.
Still, one question remains: is InetAddress.getLocalHost() the best choice in digester.update((new UID()+"@"+InetAddress.getLocalHost()).getBytes()); ?
/Jack
On 4/29/05, Michael Nebel <[EMAIL PROTECTED]> wrote:
Hi Jack,
To me this looks like a problem with how the resolver library under Linux resolves your hostname. How is your network configured? Can you try the fully qualified domain name of your server instead of just "prodfl04" (something like "prodfl04.THIS.IS-THE-DOMAIN.COM")? This should work. If not, perhaps you can test the name resolution in a command shell (using "# ping prodfl04").
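The same check can also be run from inside the JVM, which shows what Java itself sees rather than what ping sees. A minimal sketch, assuming nothing about Nutch; the class name HostCheck and its two helper methods are made up for this example:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCheck {

    // Resolves a name through the JVM's resolver and reports the outcome as text.
    static String describe(String name) {
        try {
            InetAddress addr = InetAddress.getByName(name);
            return name + " resolves to " + addr.getHostAddress();
        } catch (UnknownHostException e) {
            return name + " does not resolve: " + e.getMessage();
        }
    }

    // Reports whether InetAddress.getLocalHost() succeeds on this machine.
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "local host is " + local.getHostName() + " / " + local.getHostAddress();
        } catch (UnknownHostException e) {
            return "getLocalHost() failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
        // Any names given on the command line are checked as well.
        for (String name : args) {
            System.out.println(describe(name));
        }
    }
}
```

It could be run as "java HostCheck prodfl04 prodfl04.THIS.IS-THE-DOMAIN.COM". If getLocalHost() fails here too, the problem is in the machine's name resolution and not in Nutch; a missing entry for the hostname in /etc/hosts is a common cause on Linux.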
regards
Michael
Jack Tang wrote:
Hi All
When I migrated Nutch from Windows to Linux, I got some errors. See the log below:
-------------------------------------------------------------------------------------
Apr 29, 2005 3:30:00 PM org.apache.nutch.web.CrawlJobAdapter execute
INFO: Job:CrawlJobs.CrawlJob executing @[Fri Apr 29 15:30:00 CST 2005]
050429 153000 %nutch: -local /opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/seed.txt -dir /opt/tomcat/tomcat-nutch/tomcat-nutch-5.0.19/bin/nutch-tmp/nutchcrawl-20050429153000 -depth 10 -showThreadID
050429 153000 parsing file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/nutch-default.xml
050429 153000 parsing file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/crawl-tool.xml
050429 153000 parsing file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/nutch-site.xml
050429 153000 crawl started in: /opt/tomcat/tomcat-nutch/tomcat-nutch-5.0.19/bin/nutch-tmp/nutchcrawl-20050429153000
050429 153000 rootUrlFile = /opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/seed.txt
050429 153000 threads = 10
050429 153000 depth = 10
050429 153000 Exceptions in crawl process: java.net.UnknownHostException: prodfl04: prodfl04
java.lang.RuntimeException: java.net.UnknownHostException: prodfl04: prodfl04
    at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:67)
    at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:88)
    at org.apache.nutch.db.WebDBWriter.<init>(WebDBWriter.java:1507)
    at org.apache.nutch.db.WebDBWriter.createWebDB(WebDBWriter.java:1438)
    at org.apache.nutch.tools.WebDBAdminTool.main(WebDBAdminTool.java:172)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:133)
    at org.apache.nutch.web.CrawlJobAdapter.execute(CrawlJobAdapter.java:66)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
Caused by: java.net.UnknownHostException: prodfl04: prodfl04
    at java.net.InetAddress.getLocalHost(InetAddress.java:1191)
    at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:64)
    ... 8 more
------------------------------------------------------------
My Linux box's hostname is prodfl04. The code that throws the exception is here (SequenceFile.java):
-----------------------------------------------------------
private final byte[] sync; // 16 random bytes
{
  try { // use hash of uid + host
    MessageDigest digester = MessageDigest.getInstance("MD5");
    digester.update((new UID()+"@"+InetAddress.getLocalHost()).getBytes());
    sync = digester.digest();
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
}
--------------------------------------------------------------
Can someone explain why? Even when I run the application as root, the exception occurs again and again. What should I watch out for on a Linux box when deploying Nutch?
Regards /Jack
--- Michael Nebel Internet: http://www.netluchs.de/
