Hi Jack,

To me this looks like a problem with how the resolver library under Linux resolves your hostname. How is your network configured? Can you try using the fully qualified domain name of your server instead of just "prodfl04" (i.e. something like "prodfl04.THIS.IS-THE-DOMAIN.COM")? This should work. If not, perhaps you can try the name from a command shell (using "# ping prodfl04").
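To see what the JVM itself makes of these names, here is a minimal sketch that assumes only the JDK. The class ResolveCheck and its methods are hypothetical helpers and not part of Nutch. It calls InetAddress.getLocalHost(), which is the call that fails in the stack trace, and then resolves any names passed on the command line, such as the short name and the FQDN.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper (not part of Nutch): reports whether the JVM
// can resolve the local host name and any names given as arguments.
public class ResolveCheck {

    // Same call that SequenceFile makes; returns a message instead of throwing.
    static String checkLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "OK local host: " + local;
        } catch (UnknownHostException e) {
            return "FAILED local host: " + e.getMessage();
        }
    }

    // Resolves a short name or a fully qualified domain name; never throws.
    static String checkName(String name) {
        try {
            return "OK " + name + " -> " + InetAddress.getByName(name).getHostAddress();
        } catch (UnknownHostException e) {
            return "FAILED " + name + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(checkLocalHost());
        for (String name : args) {
            System.out.println(checkName(name));
        }
    }
}
```

For example, run `java ResolveCheck prodfl04 prodfl04.THIS.IS-THE-DOMAIN.COM`, with the placeholder domain replaced by your real one, and compare the output with what `ping` reports.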

Regards

        Michael



Jack Tang wrote:

Hi all,

When I migrated Nutch from Windows to Linux, some errors occurred.
See the log below:
-------------------------------------------------------------------------------------
Apr 29, 2005 3:30:00 PM org.apache.nutch.web.CrawlJobAdapter execute
INFO: Job:CrawlJobs.CrawlJob executing @[Fri Apr 29 15:30:00 CST 2005]
050429 153000 %nutch: -local /opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/seed.txt -dir /opt/tomcat/tomcat-nutch/tomcat-nutch-5.0.19/bin/nutch-tmp/nutchcrawl-20050429153000 -depth 10 -showThreadID
050429 153000 parsing file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/nutch-default.xml
050429 153000 parsing file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/crawl-tool.xml
050429 153000 parsing file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/nutch-site.xml
050429 153000 crawl started in: /opt/tomcat/tomcat-nutch/tomcat-nutch-5.0.19/bin/nutch-tmp/nutchcrawl-20050429153000
050429 153000 rootUrlFile = /opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/seed.txt
050429 153000 threads = 10
050429 153000 depth = 10
050429 153000 Exceptions in crawl process:
java.net.UnknownHostException: prodfl04: prodfl04
java.lang.RuntimeException: java.net.UnknownHostException: prodfl04: prodfl04
        at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:67)
        at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:88)
        at org.apache.nutch.db.WebDBWriter.<init>(WebDBWriter.java:1507)
        at org.apache.nutch.db.WebDBWriter.createWebDB(WebDBWriter.java:1438)
        at org.apache.nutch.tools.WebDBAdminTool.main(WebDBAdminTool.java:172)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:133)
        at org.apache.nutch.web.CrawlJobAdapter.execute(CrawlJobAdapter.java:66)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
Caused by: java.net.UnknownHostException: prodfl04: prodfl04
        at java.net.InetAddress.getLocalHost(InetAddress.java:1191)
        at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:64)
        ... 8 more
------------------------------------------------------------
The hostname of my Linux box is prodfl04.
Here is the code that throws the exception (SequenceFile.java):

-----------------------------------------------------------
    private final byte[] sync;                    // 16 random bytes
    {
      try {                                       // use hash of uid + host
        MessageDigest digester = MessageDigest.getInstance("MD5");
        digester.update((new UID()+"@"+InetAddress.getLocalHost()).getBytes());
        sync = digester.digest();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
--------------------------------------------------------------
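To reproduce the failure outside Nutch and Tomcat, the initializer above can be lifted into a small standalone program. This is a sketch that assumes only the JDK. The class SyncDemo and the method syncFor are hypothetical names, and it assumes the UID in the Nutch snippet is java.rmi.server.UID. The program computes the same 16-byte MD5 marker from a fresh UID plus the local host, and prints the UnknownHostException instead of wrapping it in a RuntimeException.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.rmi.server.UID; // assumed to be the UID class used by SequenceFile
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical standalone version of the SequenceFile sync-marker initializer.
public class SyncDemo {

    // MD5 of "<uid>@<host>", mirroring the quoted instance initializer.
    static byte[] syncFor(String host) {
        try {
            MessageDigest digester = MessageDigest.getInstance("MD5");
            digester.update((new UID() + "@" + host).getBytes());
            return digester.digest(); // MD5 always yields 16 bytes
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try {
            // String concatenation uses InetAddress.toString(), i.e. "name/address".
            byte[] sync = syncFor(InetAddress.getLocalHost().toString());
            System.out.println("sync marker computed: " + sync.length + " bytes");
        } catch (UnknownHostException e) {
            // The failure from the stack trace: the local host name does not resolve.
            System.out.println("getLocalHost() failed: " + e.getMessage());
        }
    }
}
```

If this program prints the getLocalHost() failure on prodfl04, the problem lies in the machine's name resolution rather than in Nutch or Tomcat.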
Can someone explain why? Even when I run the application as root, the
exception occurs again and again. What should I take care of on a Linux
box when deploying Nutch?

Regards
/Jack


--
Michael Nebel
Internet: http://www.netluchs.de/


