Your DataNode is overloaded. Try profiling it, and check the heap size of your NameNode and your DataNodes.
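One quick way to see how slowly the job is progressing is to compare the timestamps of the "Current supersteps number" lines in the client log quoted below. The following is only a minimal sketch, assuming Java 8 or later; the class name `SuperstepDurations` and its helper are hypothetical and not part of Hama. The timestamps mark when the client reported each superstep, so the results only approximate the real superstep times.

```java
import java.time.Duration;
import java.time.LocalTime;

public class SuperstepDurations {
    // Timestamps copied from the "Current supersteps number: 0..5" lines
    // of the quoted client log (all on the same day, 12/10/11).
    public static final String[] STAMPS = {
        "10:05:21", "12:01:47", "13:48:33", "15:26:48", "17:05:12", "18:45:12"
    };

    // Whole minutes elapsed between consecutive superstep reports.
    public static long[] minutesBetween(String[] stamps) {
        long[] out = new long[stamps.length - 1];
        for (int i = 1; i < stamps.length; i++) {
            LocalTime prev = LocalTime.parse(stamps[i - 1]);
            LocalTime cur = LocalTime.parse(stamps[i]);
            out[i - 1] = Duration.between(prev, cur).toMinutes();
        }
        return out;
    }

    public static void main(String[] args) {
        long[] mins = minutesBetween(STAMPS);
        for (int i = 0; i < mins.length; i++) {
            System.out.println("superstep " + i + " -> " + (i + 1) + ": " + mins[i] + " min");
        }
    }
}
```

With these timestamps, each superstep takes between about 98 and 116 minutes, which fits a cluster whose I/O or memory is under heavy pressure.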

2012/10/16 Yuesheng Hu <[email protected]>

> Hi, Thomas
>
> When I test K-Means with the cache enabled, a "Filesystem closed" exception
> is raised once the input size reaches about 6 GB. Our cluster has 10 nodes
> (1 master, 9 slaves), 5 tasks/node, and 1000 MB RAM per task. I think the
> cluster is powerful enough to handle this input size, but the job failed.
> The log is:
>
> 12/10/11 10:05:17 INFO bsp.FileInputFormat: Total input paths to process : 45
> 12/10/11 10:05:18 INFO bsp.BSPJobClient: Running job: job_201210111001_0003
> 12/10/11 10:05:21 INFO bsp.BSPJobClient: Current supersteps number: 0
> 12/10/11 12:01:47 INFO bsp.BSPJobClient: Current supersteps number: 1
> 12/10/11 13:48:33 INFO bsp.BSPJobClient: Current supersteps number: 2
> 12/10/11 15:26:48 INFO bsp.BSPJobClient: Current supersteps number: 3
> 12/10/11 17:05:12 INFO bsp.BSPJobClient: Current supersteps number: 4
> 12/10/11 18:45:12 INFO bsp.BSPJobClient: Current supersteps number: 5
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO bsp.BSPPeerImpl: Moving to local cache files: INITIALLY IT WAS: null
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At datanode09/192.168.1.219:61001
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 ERROR sync.ZooKeeperSyncClientImpl: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/job_201210111001_0003/peers
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: Starting SocketReader
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC Server Responder: starting
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO message.HadoopMessageManagerImpl: BSPPeer address:datanode09 port:61001
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC Server listener on 61001: starting
> attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> attempt_201210111001_0003_000004_0: 12/10/11 18:45:47 INFO ml.KMeansBSP: Finished! Writing the assignments...
> attempt_201210111001_0003_000004_0: 12/10/11 18:46:29 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> attempt_201210111001_0003_000004_0: java.io.IOException: Filesystem closed
> attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
> attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
> attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2213)
> attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2152)
> attempt_201210111001_0003_000004_0: at java.io.DataInputStream.readInt(DataInputStream.java:370)
> attempt_201210111001_0003_000004_0: at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1953)
> attempt_201210111001_0003_000004_0: at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1983)
> attempt_201210111001_0003_000004_0: at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2120)
> attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.SequenceFileRecordReader.next(SequenceFileRecordReader.java:85)
> attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.TrackedRecordReader.moveToNext(TrackedRecordReader.java:63)
> attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.TrackedRecordReader.next(TrackedRecordReader.java:49)
> attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.BSPPeerImpl.readNext(BSPPeerImpl.java:630)
> attempt_201210111001_0003_000004_0: at org.apache.hama.ml.KMeansBSP.recalculateAssignmentsAndWrite(KMeansBSP.java:269)
> attempt_201210111001_0003_000004_0: at org.apache.hama.ml.KMeansBSP.bsp(KMeansBSP.java:142)
> attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
> attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
> attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
> 12/10/11 18:45:54 INFO bsp.BSPJobClient: Job failed.
>
> What happened?
>
