Hi Jack,
To me this looks like a problem with how the resolver library on Linux resolves your hostname. How is your network configured? Can you try using the fully qualified domain name of your server instead of just "prodfl04" (i.e. something like "prodfl04.THIS.IS-THE-DOMAIN.COM")? This should work. If not, perhaps you can test this from a command shell (using "# ping prodfl04").
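You can also run the failing lookup directly, without Tomcat and Nutch in between. The sketch below is a minimal diagnostic. The class name HostCheck is made up for this example. It calls InetAddress.getLocalHost(), which is the call that fails in your stack trace, and then tries to resolve any names you pass on the command line, for example the short name and the fully qualified one:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic class: not part of Nutch.
public class HostCheck {

    /** Tries to resolve the given name and returns a one-line report. */
    static String resolve(String name) {
        try {
            InetAddress addr = InetAddress.getByName(name);
            return name + " -> " + addr.getHostAddress();
        } catch (UnknownHostException e) {
            return name + " -> UNRESOLVABLE (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        // Same call as in Nutch's SequenceFile.Writer initializer; it throws
        // UnknownHostException if the machine's own hostname cannot be resolved.
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("getLocalHost() OK: " + local);
        } catch (UnknownHostException e) {
            System.out.println("getLocalHost() FAILED: " + e.getMessage());
        }
        // Resolve each name given on the command line, e.g. short name and FQDN.
        for (String name : args) {
            System.out.println(resolve(name));
        }
    }
}
```

Run it as "java HostCheck prodfl04 prodfl04.THIS.IS-THE-DOMAIN.COM", substituting your real domain. Suppose getLocalHost() fails but the fully qualified name resolves. Then the short hostname is probably not known to the resolver on that box. It may be missing from /etc/hosts, or the DNS search domain may not be set.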
regards
Michael
Jack Tang wrote:
Hi All
When I migrated Nutch from Windows to Linux, some errors appeared. See the log below:
-------------------------------------------------------------------------------------
Apr 29, 2005 3:30:00 PM org.apache.nutch.web.CrawlJobAdapter execute
INFO: Job:CrawlJobs.CrawlJob executing @[Fri Apr 29 15:30:00 CST 2005]
050429 153000 %nutch: -local /opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/seed.txt -dir /opt/tomcat/tomcat-nutch/tomcat-nutch-5.0.19/bin/nutch-tmp/nutchcrawl-20050429153000 -depth 10 -showThreadID
050429 153000 parsing file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/nutch-default.xml
050429 153000 parsing file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/crawl-tool.xml
050429 153000 parsing file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/nutch-site.xml
050429 153000 crawl started in: /opt/tomcat/tomcat-nutch/tomcat-nutch-5.0.19/bin/nutch-tmp/nutchcrawl-20050429153000
050429 153000 rootUrlFile = /opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/seed.txt
050429 153000 threads = 10
050429 153000 depth = 10
050429 153000 Exceptions in crawl process: java.net.UnknownHostException: prodfl04: prodfl04
java.lang.RuntimeException: java.net.UnknownHostException: prodfl04: prodfl04
    at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:67)
    at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:88)
    at org.apache.nutch.db.WebDBWriter.<init>(WebDBWriter.java:1507)
    at org.apache.nutch.db.WebDBWriter.createWebDB(WebDBWriter.java:1438)
    at org.apache.nutch.tools.WebDBAdminTool.main(WebDBAdminTool.java:172)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:133)
    at org.apache.nutch.web.CrawlJobAdapter.execute(CrawlJobAdapter.java:66)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
Caused by: java.net.UnknownHostException: prodfl04: prodfl04
    at java.net.InetAddress.getLocalHost(InetAddress.java:1191)
    at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:64)
    ... 8 more
------------------------------------------------------------
My Linux box's hostname is prodfl04, and the code that throws the exception is here (SequenceFile.java):
-----------------------------------------------------------
private final byte[] sync;                 // 16 random bytes
{
    try {                                  // use hash of uid + host
        MessageDigest digester = MessageDigest.getInstance("MD5");
        digester.update((new UID()+"@"+InetAddress.getLocalHost()).getBytes());
        sync = digester.digest();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
--------------------------------------------------------------
Can someone explain why? Even when I run the application as root, the exception occurs again and again. What should I take care of on a Linux box when deploying Nutch?
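The real fix is to make the hostname resolvable. As a stopgap, the initializer above could be made to tolerate an unresolvable hostname. The following is only a sketch and not Nutch code. The class SyncMarker, its methods, and the fallback string "unknown-host" are all hypothetical. It keeps the original idea of an MD5 digest over "uid@host" but no longer lets a failed host lookup abort the writer:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.rmi.server.UID;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical workaround class: not part of Nutch.
public class SyncMarker {

    /** Returns the local host description, or a fixed placeholder if the
     *  machine's own hostname cannot be resolved. */
    static String localHostOrFallback() {
        try {
            return InetAddress.getLocalHost().toString();
        } catch (UnknownHostException e) {
            // Assumed fallback: the UID part still varies between calls,
            // so the digest stays different for each writer on this host.
            return "unknown-host";
        }
    }

    /** Computes 16 bytes as in the SequenceFile snippet: MD5 of "uid@host". */
    static byte[] newSync() {
        try {
            MessageDigest digester = MessageDigest.getInstance("MD5");
            digester.update((new UID() + "@" + localHostOrFallback()).getBytes());
            return digester.digest(); // MD5 digests are always 16 bytes
        } catch (NoSuchAlgorithmException e) {
            // Should not happen: standard Java platforms ship an MD5 provider.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sync length = " + newSync().length);
    }
}
```

This only hides the symptom. Other code that relies on the local hostname may still fail until the resolver configuration is corrected.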
Regards /Jack
-- 
Michael Nebel
Internet: http://www.netluchs.de/
