You probably want this. https://issues.apache.org/jira/browse/HDFS-457
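For reference, the HDFS-457 line of work lets a datanode tolerate a limited number of failed volumes instead of shutting down outright. A sketch of the relevant hdfs-site.xml, assuming a release that includes this feature — the exact property name (dfs.datanode.failed.volumes.tolerated) landed in follow-up work to that ticket, so check the docs for your version:

```xml
<!-- hdfs-site.xml (sketch; property assumes a post-HDFS-457 release) -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <!-- Number of volumes allowed to fail before the datanode shuts
       itself down; the default of 0 reproduces the behavior seen
       in this thread. -->
  <value>1</value>
</property>
```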
Koji

> As we can see here, the datanode shut down. Should one disk entering a
> read-only state really shut down the entire datanode? The datanode
> happily restarts once the disk is unmounted. Just wondering.

On 8/7/09 12:11 PM, "Edward Capriolo" <edlinuxg...@gmail.com> wrote:

> I have a Hadoop 18.3 cluster. Today I got two Nagios alerts. Actually,
> I was excited by one of them because I never had a way to test
> check_hpacucli. Worked first time. NIIIIIIICCEEEEEEEE. (Borat voice.)
>
> -------
> Notification Type: PROBLEM
>
> Service: check_remote_datanode
> Host: nyhadoopdata10.ops.jointhegrid.com
> Address: 10.12.9.20
> State: CRITICAL
>
> Date/Time: Fri Aug 7 18:24:58 GMT 2009
>
> Additional Info:
>
> Connection refused
> -------
>
> Notification Type: PROBLEM
>
> Service: check_hpacucli
> Host: nyhadoopdata10.ops.jointhegrid.com
> Address: 10.12.9.20
> State: CRITICAL
>
> Date/Time: Fri Aug 7 18:23:18 GMT 2009
>
> Additional Info:
>
> CRITICAL Smart Array P400 in Unknown Slot OK/OK/-
> (LD 1: OK [(1I:1:1 OK)] LD 2: OK [(1I:1:2 OK)] LD 3: OK [(1I:1:3 OK)]
> LD 4: OK [(1I:1:4 OK)] LD 5: OK [(2I:1:5 OK)] LD 6: OK [(2I:1:6 OK)]
> LD 7: Failed [(2I:1:7 Failed)] LD 8: OK [(2I:1:8 OK)])
>
> /usr/sbin/hpacucli ctrl all show config detail
>
> physicaldrive 2I:1:7
>   Port: 2I
>   Box: 1
>   Bay: 7
>   Status: Failed
>   Drive Type: Data Drive
>   Interface Type: SAS
>   Size: 1TB
>   Rotational Speed: 7200
>   Firmware Revision: HPD1
>   Serial Number: XXXXXXXXXXXXXXXXXXXXXXXX
>   Model: HP DB1000BABFF
>   PHY Count: 2
>   PHY Transfer Rate: 3.0GBPS, Unknown
>
> The drive was labeled as failed. It was in a read-only state, and
> sections of the drive would produce an I/O error while being read.
>
> The datanode logged:
>
> 2009-08-07 14:20:03,153 WARN org.apache.hadoop.dfs.DataNode: DataNode is shutting down. directory is not writable: /mnt/disk6/dfs/data/current
> 2009-08-07 14:20:03,244 INFO org.apache.hadoop.dfs.DataBlockScanner: Exiting DataBlockScanner thread.
> 2009-08-07 14:20:03,430 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-2520177395705282298_448274 received exception java.io.IOException: Read-only file system
> 2009-08-07 14:20:03,430 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.12.9.20:50010, storageID=DS-520493036-10.12.9.20-50010-1239749144772, infoPort=50075, ipcPort=50020):DataXceiver:
> java.io.IOException: Read-only file system
>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>     at java.io.File.createNewFile(File.java:883)
>     at org.apache.hadoop.dfs.FSDataset$FSVolume.createTmpFile(FSDataset.java:388)
>     at org.apache.hadoop.dfs.FSDataset$FSVolume.createTmpFile(FSDataset.java:359)
>     at org.apache.hadoop.dfs.FSDataset.createTmpFile(FSDataset.java:1050)
>     at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:983)
>     at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2382)
>     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1234)
>     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1092)
>     at java.lang.Thread.run(Thread.java:619)
>
> 2009-08-07 14:20:32,974 INFO org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.12.9.20:50010, storageID=DS-520493036-10.12.9.20-50010-1239749144772, infoPort=50075, ipcPort=50020):Finishing DataNode in: FSDataset{dirpath='/mnt/disk0/dfs/data/current,/mnt/disk1/dfs/data/current,/mnt/disk2/dfs/data/current,/mnt/disk3/dfs/data/current,/mnt/disk4/dfs/data/current,/mnt/disk5/dfs/data/current,/mnt/disk6/dfs/data/current,/mnt/disk7/dfs/data/current'}
> 2009-08-07 14:20:32,974 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=50075]
> 2009-08-07 14:20:32,977 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:50075
> 2009-08-07 14:20:33,155 INFO org.mortbay.util.Container: Stopped HttpContext[/static,/static]
> 2009-08-07 14:20:33,293 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
> 2009-08-07 14:20:33,293 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.webapplicationhand...@7444f787
> 2009-08-07 14:20:33,432 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
> 2009-08-07 14:20:33,432 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.ser...@b035079
> 2009-08-07 14:20:33,432 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
> 2009-08-07 14:20:33,432 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> 2009-08-07 14:20:33,432 INFO org.apache.hadoop.dfs.DataNode: Waiting for threadgroup to exit, active threads is 0
> 2009-08-07 14:20:33,432 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: exiting
> 2009-08-07 14:20:33,432 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 50020
> 2009-08-07 14:20:33,432 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: exiting
> 2009-08-07 14:20:33,432 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: exiting
> 2009-08-07 14:20:33,434 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at nyhadoopdata10/10.12.9.20
> ************************************************************/
>
> As we can see here, the datanode shut down. Should one disk entering a
> read-only state really shut down the entire datanode? The datanode
> happily restarts once the disk is unmounted. Just wondering.
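Since the workaround in this thread was to spot and unmount the read-only volume, one cheap monitoring angle is to scan the kernel's mount table for volumes that have been remounted read-only. A minimal sketch (mount points like /mnt/disk6 are from this cluster's layout; adapt the check to your own):

```shell
# Print mount points whose option list contains the "ro" flag.
# In /proc/mounts, field 2 is the mount point and field 4 the
# comma-separated mount options; match "ro" only as a whole option
# so "errors=remount-ro" does not trigger a false positive.
awk '$4 ~ /(^|,)ro(,|$)/ { print $2 }' /proc/mounts
```

On the node above this should have flagged /mnt/disk6 before the datanode died, and it is easy to wrap as another Nagios check alongside check_hpacucli.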