Hi Jack,

I simply don't know whether there's a better alternative to "InetAddress.getLocalHost()". But since nutch is optimized for distribution and scaling, you can crawl from different servers (by the way: this works great :-). So I would expect the hostname to be part of any unique id.
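
If the hostname is only there to help make the id unique, one option is to keep it when it resolves and fall back to a fixed placeholder when it does not. The following is a minimal sketch under that assumption, not the actual nutch code: the class name SyncMarker, the helper hostPart() and the placeholder string "unknown-host" are all made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.rmi.server.UID;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of nutch.
class SyncMarker {

    // Host part of the id: the resolved local host if available,
    // otherwise a placeholder (assumption: the UID alone is unique enough then).
    static String hostPart() {
        try {
            return InetAddress.getLocalHost().toString();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }

    // 16 bytes: MD5 hash of uid + "@" + host, as in the quoted SequenceFile code.
    static byte[] newSync() {
        try {
            MessageDigest digester = MessageDigest.getInstance("MD5");
            String id = new UID() + "@" + hostPart();
            digester.update(id.getBytes(StandardCharsets.UTF_8));
            return digester.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sync marker length: " + newSync().length);
    }
}
```

The trade-off: with the placeholder, two machines whose hostnames both fail to resolve rely on the UID alone to tell their ids apart.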

Regards

        Michael



Jack Tang wrote:
Hi Michael

I solved it.
One question remains: is InetAddress.getLocalHost() the best choice in digester.update((new UID()+"@"+InetAddress.getLocalHost()).getBytes());
?


/Jack

On 4/29/05, Michael Nebel <[EMAIL PROTECTED]> wrote:

Hi Jack,

To me this looks like a problem with how the resolver library under Linux
resolves your hostnames. How is your network configured? Can you try to
use the fully qualified domain name of your server instead of just
"prodfl04" (meaning something like "prodfl04.THIS.IS-THE-DOMAIN.COM")?
This should work. If not, perhaps you can try this from a
command shell (using "# ping prodfl04").
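
The same check can also be done from Java, which shows what the JVM itself sees. This is only a rough sketch: the class name ResolveCheck and its method are made up for illustration, and the default host "localhost" is just a placeholder for the name you want to test.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of nutch.
class ResolveCheck {

    // True if the JVM can resolve the given name to an address.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the name to test as the first argument, e.g. prodfl04.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " resolves: " + resolves(host));
    }
}
```

If the short name does not resolve but the fully qualified one does, the problem is in the name resolution setup of the box and not in nutch.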

regards

      Michael

Jack Tang wrote:


Hi All

When I migrated nutch from Windows to Linux, some errors came up.
See the log below:
-------------------------------------------------------------------------------------
Apr 29, 2005 3:30:00 PM org.apache.nutch.web.CrawlJobAdapter execute
INFO: Job:CrawlJobs.CrawlJob executing @[Fri Apr 29 15:30:00 CST 2005]
050429 153000 %nutch: -local
/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/seed.txt -dir
/opt/tomcat/tomcat-nutch/tomcat-nutch-5.0.19/bin/nutch-tmp/nutchcrawl-20050429153000
-depth 10 -showThreadID
050429 153000 parsing
file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/nutch-default.xml
050429 153000 parsing
file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/crawl-tool.xml
050429 153000 parsing
file:/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/nutch-site.xml
050429 153000 crawl started in:
/opt/tomcat/tomcat-nutch/tomcat-nutch-5.0.19/bin/nutch-tmp/nutchcrawl-20050429153000
050429 153000 rootUrlFile =
/opt/tomcat/tomcat-nutch/nutch-ccs/WEB-INF/classes/seed.txt
050429 153000 threads = 10
050429 153000 depth = 10
050429 153000 Exceptions in crawl process:
java.net.UnknownHostException: prodfl04: prodfl04
java.lang.RuntimeException: java.net.UnknownHostException: prodfl04: prodfl04
       at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:67)
       at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:88)
       at org.apache.nutch.db.WebDBWriter.<init>(WebDBWriter.java:1507)
       at org.apache.nutch.db.WebDBWriter.createWebDB(WebDBWriter.java:1438)
       at org.apache.nutch.tools.WebDBAdminTool.main(WebDBAdminTool.java:172)
       at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:133)
       at org.apache.nutch.web.CrawlJobAdapter.execute(CrawlJobAdapter.java:66)
       at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
       at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
Caused by: java.net.UnknownHostException: prodfl04: prodfl04
       at java.net.InetAddress.getLocalHost(InetAddress.java:1191)
       at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:64)
       ... 8 more
------------------------------------------------------------
My Linux box's hostname is prodfl04.
The code that throws the exception is here (SequenceFile.java):

-----------------------------------------------------------
   private final byte[] sync;                    // 16 random bytes
   {
     try {                                       // use hash of uid + host
       MessageDigest digester = MessageDigest.getInstance("MD5");
       digester.update((new UID()+"@"+InetAddress.getLocalHost()).getBytes());
       sync = digester.digest();
     } catch (Exception e) {
       throw new RuntimeException(e);
     }
   }
--------------------------------------------------------------
Can someone explain why? Even when I run the application as root, the
exception occurs again and again. What should I take care of on a Linux
box when deploying nutch?

Regards
/Jack

---
Michael Nebel
Internet: http://www.netluchs.de/


