Hello, I have been looking at Hadoop for a while now and have been trying to get 0.4.0 working with Nutch to do a small distributed crawl. The problem is that whenever the fetch task nears completion, the job fails.
I am running 2 datanodes and 5 tasktracker nodes. One of the tasktracker nodes has lots of entries in its log like these:

2006-07-25 00:02:56,690 INFO mapred.TaskRunner (ReduceTaskRunner.java:copyOutput(240)) - task_0001_r_000024_2 done copying task_0001_m_000001_0 output from fox10.nameprotect.com.
2006-07-25 00:02:59,656 WARN mapred.TaskRunner (ReduceTaskRunner.java:copyOutput(246)) - task_0001_r_000038_2 failed to copy task_0001_m_000001_0 output from fox10.nameprotect.com.
2006-07-25 00:02:59,657 WARN mapred.TaskRunner (ReduceTaskRunner.java:run(210)) - task_0001_r_000038_2 copy failed: task_0001_m_000001_0 from fox10.nameprotect.com
2006-07-25 00:02:59,657 WARN mapred.TaskRunner (ReduceTaskRunner.java:run(212)) - java.net.ConnectException: Connection timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.Socket.connect(Socket.java:507)
    at java.net.Socket.connect(Socket.java:457)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:365)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:477)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:214)
    at sun.net.www.http.HttpClient.New(HttpClient.java:287)
    at sun.net.www.http.HttpClient.New(HttpClient.java:299)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:784)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:736)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:661)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:905)
    at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:108)
    at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:237)
    at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:207)

So, looking at fox10, I see this in its log:

2006-07-25 00:03:15,821 WARN mapred.TaskTracker - Unknown child with bad map output: task_0001_m_000001_0. Ignored.
2006-07-25 00:03:15,990 WARN mapred.TaskTracker - Http server (getMapOutput.jsp): java.io.FileNotFoundException: /index2/nutch/filesystem/mapreduce/local/task_0001_m_000001_0/part-98.out
    at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:121)
    at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:47)
    at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:229)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:158)
    at org.apache.hadoop.mapred.getMapOutput_jsp._jspService(getMapOutput_jsp.java:64)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
2006-07-25 00:03:15,990 WARN mapred.TaskTracker - Unknown child with bad map output: task_0001_m_000001_0. Ignored.
2006-07-25 00:03:16,486 WARN mapred.TaskRunner - task_0001_r_000080_2 failed to copy task_0001_m_000001_0 output from fox10.nameprotect.com.
2006-07-25 00:03:16,486 WARN mapred.TaskRunner - task_0001_r_000080_2 copy failed: task_0001_m_000001_0 from fox10.nameprotect.com
2006-07-25 00:03:16,489 WARN mapred.TaskRunner - java.net.ConnectException: Connection timed out
    [same ConnectException stack trace as above]

Funny thing is, the tasktrackers are all still up and running. The datanodes seem fine, and so does the jobtracker. A small insert into the crawldb works, as does the generate job that made the segments I'm trying to fetch. But something goes wrong with the fetch job. Does anyone have any ideas what could be wrong?
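To narrow it down myself, I'm going to try separating the two symptoms: the reducers timing out on the TCP connect, and fox10 no longer having the map output file. For the first, a rough probe like the sketch below (plain Java, nothing Hadoop-specific), run from one of the reduce nodes, should show whether fox10's tasktracker HTTP port is reachable at all outside of Hadoop. Note that 50060 is just my assumption of the stock tasktracker.http.port default; adjust if your setup overrides it.

import java.net.InetSocketAddress;
import java.net.Socket;

// Reachability probe: run from a reduce node to see whether the
// ConnectException is network-level (firewall, dead listener) rather
// than something inside Hadoop. Host and port are assumptions for my
// setup: fox10 is the tasktracker logging the errors, and 50060 is
// the stock tasktracker.http.port default.
public class ProbeTaskTracker {
    public static void main(String[] args) throws Exception {
        String host = "fox10.nameprotect.com";
        int port = 50060;
        Socket socket = new Socket();
        try {
            // 10-second connect timeout
            socket.connect(new InetSocketAddress(host, port), 10000);
            System.out.println("TCP connect to " + host + ":" + port + " OK");
        } finally {
            socket.close();
        }
    }
}

If the bare connect succeeds from every reduce node, the timeouts presumably only show up under the load of the copy phase, which would point at the tasktracker's Jetty listener rather than at the network or a firewall.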
Hadoop-site.xml for your reference:

<configuration>

<property>
  <name>mapred.map.tasks</name>
  <value>25</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>25</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>5</value>
  <description>The maximum number of tasks that will be run simultaneously
  by a task tracker.</description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/nutch/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/index2/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/index2/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/index2/nutch/filesystem/mapreduce/local</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
  <description>Java opts for the task tracker child processes. Subsumes
  'mapred.child.heap.size' (If a mapred.child.heap.size value is found in
  a configuration, its maximum heap size will be used and a warning
  emitted that heap.size has been deprecated). Also, the following
  symbols, if present, will be interpolated: @taskid@ is replaced by
  current TaskID; and @port@ will be replaced by
  mapred.task.tracker.report.port + 1 (A second child will fail with a
  port-in-use if mapred.tasktracker.tasks.maximum is greater than one).
  Any other occurrences of '@' will go unchanged. For example, to enable
  verbose gc logging to a file named for the taskid in /tmp and to set
  the heap maximum to be a gigabyte, pass a 'value' of:
  -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
  </description>
</property>

<!-- i/o properties -->

<property>
  <name>io.sort.factor</name>
  <value>100</value>
  <description>The number of streams to merge at once while sorting
  files. This determines the number of open file handles.</description>
</property>

<property>
  <name>io.sort.mb</name>
  <value>500</value>
  <description>The total amount of buffer memory to use while sorting
  files, in megabytes. By default, gives each merge stream 1MB, which
  should minimize seeks.</description>
</property>

</configuration>

Thanks for the great work on Hadoop!
Greg
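P.S. For the second symptom, I'll also check on fox10 itself whether the file named in the FileNotFoundException really is gone from mapred.local.dir, i.e. whether the map output got cleaned up while reducers were still requesting it. A trivial sketch, with the path copied from the exception above:

import java.io.File;

// Existence check for the map output the reducers keep asking fox10 for.
// The path is the one from the FileNotFoundException in fox10's log.
public class CheckMapOutput {
    public static void main(String[] args) {
        File out = new File("/index2/nutch/filesystem/mapreduce/local/"
                + "task_0001_m_000001_0/part-98.out");
        System.out.println(out + " exists: " + out.exists()
                + (out.exists() ? ", " + out.length() + " bytes" : ""));
    }
}

If the file really is missing, the question becomes why the tasktracker discarded task_0001_m_000001_0's output while reduces still wanted it.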
