Hi,

I am trying to run the Nutch crawler for the first time and I am getting an exception, but I can't find the cause. Here are the details.

The urls file contains:

http://facweb.iitkgp.ernet.in/

The conf/crawl-urlfilter.txt contains:

+^http://([a-z0-9]*\.)*iitkgp.ernet.in/

The command I ran was:

bin/nutch crawl urls -dir crawl.my -depth 10

The relevant part of the log, including the exception, is:

060321 164802 logging at INFO
060321 164802 fetching http://facweb.iitkgp.ernet.in/
060321 164802 http.proxy.host = 10.5.17.147
060321 164802 http.proxy.port = 8080
060321 164802 http.timeout = 100000
060321 164802 http.content.limit = 65536
060321 164802 http.agent = NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
060321 164802 fetcher.server.delay = 1000
060321 164802 http.max.delays = 100
060321 164802 fetching http://facweb.iitkgp.ernet.in/robots.txt
060321 164802 fetched 1060 bytes from http://facweb.iitkgp.ernet.in/robots.txt
060321 164812 fetch of http://facweb.iitkgp.ernet.in/ failed with: java.lang.Exception: org.apache.nutch.protocol.http.HttpException: java.net.UnknownHostException: facweb.iitkgp.ernet.in: facweb.iitkgp.ernet.in
060321 164813 status: segment 20060321164801, 0 pages, 1 errors, 0 bytes, 11175 ms
060321 164813 status: 0.0 pages/s, 0.0 kb/s, NaN bytes/page
060321 164814 Updating /home/anindyac/crawl/nutch-0.7.1/crawl.my/db
060321 164814 Updating for /home/anindyac/crawl/nutch-0.7.1/crawl.my/segments/20060321164801
060321 164814 Processing document 0

As you can see, the fetch is failing. I also checked the log of the proxy (at 10.5.17.147:8080). It contains:

[2006-03-21 16:49:14] 10.5.17.146 unknown Web GET http://lucene.apache.org/robots.txt 404 Not Found
[2006-03-21 16:51:23] 10.5.17.146 unknown Web GET http://facweb.iitkgp.ernet.in/robots.txt
[2006-03-21 16:51:23] 10.5.17.146 unknown Web GET http://facweb.iitkgp.ernet.in/robots.txt Object not found!

Does that mean that a site cannot be crawled with Nutch if it does not have a robots.txt file? Note that the same thing happened when I tried to crawl the Nutch website.

Please tell me what to do in order to get Nutch going.

Thanks and Regards,
Anindya
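
P.S. Since the failure is a java.net.UnknownHostException, I also put together a small stand-alone test of my own (a minimal sketch, not part of Nutch) to check whether the host name resolves from the crawl machine at all, given that only the proxy seems able to reach it. The class name and the default host argument are just mine:

// ResolveCheck.java - quick check whether this machine's JVM can resolve a
// host name directly; UnknownHostException here would mean DNS lookup fails
// locally, independent of robots.txt or the proxy.
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    public static void main(String[] args) {
        String host = (args.length > 0) ? args[0] : "facweb.iitkgp.ernet.in";
        try {
            InetAddress addr = InetAddress.getByName(host);
            System.out.println(host + " resolves to " + addr.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println(host + " does not resolve from this machine: " + e);
        }
    }
}

I would compile and run it with:

javac ResolveCheck.java
java ResolveCheck facweb.iitkgp.ernet.in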