[ https://issues.apache.org/jira/browse/HBASE-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776855#action_12776855 ]
stack commented on HBASE-1973: ------------------------------ Weird thing is that it seems like the create eventually succeeds.. after we've given up on it if I look at the NN: {code} [st...@aa0-000-12 logs]$ grep '/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617' hadoop-stack-namenode-aa0-000-12.u.powerset.com.log 2009-11-12 05:09:38,618 DEBUG org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.create: file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 for DFSClient_276778250 at 208.76.44.141 2009-11-12 05:09:38,618 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617, holder=DFSClient_276778250, clientMachine=208.76.44.141, createParent=true, replication=3, overwrite=true, append=false 2009-11-12 05:09:38,619 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.addFile: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 is added to the file system 2009-11-12 05:09:38,619 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: add /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 to namespace for DFSClient_276778250 2009-11-12 05:09:38,620 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=stack,powerset,engineering ip=/208.76.44.141 cmd=create src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 dst=null perm=stack:supergroup:rw-r--r-- 2009-11-12 05:09:38,630 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.addBlock: file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 for DFSClient_276778250 2009-11-12 05:09:38,630 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock: file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 for DFSClient_276778250 2009-11-12 05:09:38,631 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.addFile: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 with blk_1786802275426334496_1076 block is added to the in-memory file system 2009-11-12 05:09:38,631 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617. blk_1786802275426334496_1076{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[208.76.44.141:51010|RBW], ReplicaUnderConstruction[208.76.44.139:51010|RBW], ReplicaUnderConstruction[208.76.44.140:51010|RBW]]} 2009-11-12 05:09:38,646 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.abandonBlock: blk_1786802275426334496_1076 of file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:09:38,646 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.abandonBlock: blk_1786802275426334496_1076of file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:09:38,646 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.addFile: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 with blk_1786802275426334496_1076 block is added to the file system 2009-11-12 05:09:44,655 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.addBlock: file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 for DFSClient_276778250 2009-11-12 05:09:44,655 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock: file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 for DFSClient_276778250 2009-11-12 05:09:44,656 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.addFile: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 with blk_5838454320666652488_1076 block is added to the in-memory file system 2009-11-12 05:09:44,656 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617. blk_5838454320666652488_1076{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[208.76.44.141:51010|RBW], ReplicaUnderConstruction[208.76.44.140:51010|RBW], ReplicaUnderConstruction[208.76.44.139:51010|RBW]]} 2009-11-12 05:09:44,662 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.abandonBlock: blk_5838454320666652488_1076 of file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:09:44,662 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.abandonBlock: blk_5838454320666652488_1076of file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:09:44,662 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.addFile: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 with blk_5838454320666652488_1076 block is added to the file system 2009-11-12 05:09:50,675 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.addBlock: file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 for DFSClient_276778250 2009-11-12 05:09:50,675 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock: file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 for DFSClient_276778250 2009-11-12 05:09:50,676 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.addFile: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 with blk_8615264034947520229_1076 block is added to the in-memory file system 2009-11-12 05:09:50,676 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617. blk_8615264034947520229_1076{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[208.76.44.141:51010|RBW], ReplicaUnderConstruction[208.76.44.139:51010|RBW], ReplicaUnderConstruction[208.76.44.140:51010|RBW]]} 2009-11-12 05:09:50,683 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.abandonBlock: blk_8615264034947520229_1076 of file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:09:50,683 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.abandonBlock: blk_8615264034947520229_1076of file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:09:50,683 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.addFile: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 with blk_8615264034947520229_1076 block is added to the file system 2009-11-12 05:09:56,685 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.addBlock: file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 for DFSClient_276778250 2009-11-12 05:09:56,685 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock: file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 for DFSClient_276778250 2009-11-12 05:09:56,686 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.addFile: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 with blk_-5632723632070749366_1077 block is added to the in-memory file system 2009-11-12 05:09:56,686 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617. blk_-5632723632070749366_1077{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[208.76.44.141:51010|RBW], ReplicaUnderConstruction[208.76.44.140:51010|RBW], ReplicaUnderConstruction[208.76.44.139:51010|RBW]]} 2009-11-12 05:09:56,691 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.abandonBlock: blk_-5632723632070749366_1077 of file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:09:56,691 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.abandonBlock: blk_-5632723632070749366_1077of file /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:09:56,691 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.addFile: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 with blk_-5632723632070749366_1077 block is added to the file system Its failing above opening the file. Below is open by the master doing its split.... 2009-11-12 05:10:10,924 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=stack,powerset,engineering ip=/208.76.44.139 cmd=open src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 dst=null perm=null 2009-11-12 05:10:10,925 DEBUG org.apache.hadoop.hdfs.StateChange: *DIR* Namenode.delete: src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617, recursive=true 2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.delete: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.delete: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 is removed 2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.server.namenode.LeaseManager: LeaseManager.findLease: prefix=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.server.namenode.LeaseManager: LeaseManager.removeLeaseWithPrefixPath: entry=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617=[Lease. Holder: DFSClient_276778250, pendingcreates: 1] 2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* Namesystem.delete: /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 is removed 2009-11-12 05:10:10,927 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=stack,powerset,engineering ip=/208.76.44.139 cmd=delete src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617 dst=null perm=null {code} The above open on master actually throws an EOFE. File length is seen as zero. We don't do the available on the backing stream to get the length hdfs sees. > Second RS goes down because "Bad connect ack with firstBadLink as..." > --------------------------------------------------------------------- > > Key: HBASE-1973 > URL: https://issues.apache.org/jira/browse/HBASE-1973 > Project: Hadoop HBase > Issue Type: Bug > Reporter: stack > > Placeholder for issue where a second RS goes down when I kill another RS and > its DN. I asked over on hdfs-user why we're stuck on 140. Why can't it move > on to another DN creating a block. > {code} > 2009-11-12 05:09:38,624 [regionserver/208.76.44.141:60020.logRoller] INFO > org.apache.hadoop.hbase.regionserver.wal.HLog: Roll > /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002405346, > entries=52747, calcsize=63757973, filesize=58588977. New hlog > /hbase/.logs/aa0-000-14.u.powerset.com,60020,12 > 58002404935/hlog.dat.1258002578617 > 2009-11-12 05:09:38,628 [regionserver/208.76.44.141:60020.logRoller] DEBUG > org.apache.hadoop.hbase.regionserver.wal.HLog: Found 0 hlogs to remove out > of total 1; oldest outstanding seqnum is 1 from region .META.,,1 > 2009-11-12 05:09:38,628 [IPC Server handler 16 on 60020] INFO > org.apache.hadoop.hbase.regionserver.wal.HLog: edit=0, > write=TestTable/TestTable,,1258002466308/612228 > 2009-11-12 05:09:38,630 [regionserver/208.76.44.141:60020.logSyncer] INFO > org.apache.hadoop.hbase.regionserver.wal.HLog: sync > 2009-11-12 05:09:38,646 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient: > Exception in createBlockOutputStream java.io.IOException: Bad connect ack > with firstBadLink as 208.76.44.140:51010 > 2009-11-12 05:09:38,646 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient: > Abandoning block blk_1786802275426334496_1076 > 2009-11-12 05:09:44,662 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient: > Exception in createBlockOutputStream java.io.IOException: Bad connect ack > with firstBadLink as 208.76.44.140:51010 > 2009-11-12 05:09:44,662 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient: > Abandoning block blk_5838454320666652488_1076 > 2009-11-12 05:09:46,226 [pool-1-thread-1] DEBUG > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: > Total=5.5656967MB (5836056), Free=672.77185MB (705452392), Max=678.3375MB > (711288448), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0, > Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted > /Run=NaN > 2009-11-12 05:09:50,683 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient: > Exception in createBlockOutputStream java.io.IOException: Bad connect ack > with firstBadLink as 208.76.44.140:51010 > 2009-11-12 05:09:50,684 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient: > Abandoning block blk_8615264034947520229_1076 > 2009-11-12 05:09:56,691 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient: > Exception in createBlockOutputStream java.io.IOException: Bad connect ack > with firstBadLink as 208.76.44.140:51010 > 2009-11-12 05:09:56,692 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient: > Abandoning block blk_-5632723632070749366_1077 > 2009-11-12 05:10:02,696 [Thread-61] WARN org.apache.hadoop.hdfs.DFSClient: > DataStreamer Exception: java.io.IOException: Unable to create new block. > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:3100) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2681) > 2009-11-12 05:10:02,697 [Thread-61] WARN org.apache.hadoop.hdfs.DFSClient: > Could not get block locations. Source file > "/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617" > - Aborting... > 2009-11-12 05:10:02,699 [regionserver/208.76.44.141:60020.logSyncer] FATAL > org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting > close of hlog > java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010 > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.createBlockOutputStream(DFSClient.java:3160) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:3080) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2681) > 2009-11-12 05:10:02,701 [regionserver/208.76.44.141:60020.logSyncer] ERROR > org.apache.hadoop.hbase.regionserver.wal.HLog: Error while syncing, > requesting close of hlog > java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010 > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.createBlockOutputStream(DFSClient.java:3160) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:3080) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2681) > 2009-11-12 05:10:02,703 [IPC Server handler 20 on 60020] FATAL > org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting > close of hlog > java.io.IOException: IOException flush:java.io.IOException: Bad connect ack > with firstBadLink as 208.76.44.140:51010 > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.hflush(DFSClient.java:3527) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3473) > at > org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97) > at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:829) > at > org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:751) > 2009-11-12 05:10:02,703 [regionserver/208.76.44.141:60020.logSyncer] INFO > org.apache.hadoop.hbase.regionserver.wal.HLog: > regionserver/208.76.44.141:60020.logSyncer exiting > 2009-11-12 05:10:02,706 [IPC Server handler 10 on 60020] FATAL > org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting > close of hlog > java.io.IOException: IOException flush:java.io.IOException: Bad connect ack > with firstBadLink as 208.76.44.140:51010 > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.hflush(DFSClient.java:3527) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3473) > at > org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97) > at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:829) > at > org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:751) > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.