Hi All, I need to apologize about this question... I found that a shell-script snippet in our operation tool was deleting the /<hbase-root>/.logs folder. So this is JUST a MISUNDERSTANDING.
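In case anyone else trips over the same thing, here is a minimal sketch of the kind of guard our cleanup step was missing. It is illustrative only: the class, the method, and the hard-coded /user/SPN-hbase root (our hbase.rootdir, per the logs below) are my own naming, not part of any existing tool. The idea is simply that a scripted recursive delete should refuse to touch anything under the HBase root.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SafeCleanup {
  // Illustrative only: hbase.rootdir on our cluster is /user/SPN-hbase (see the logs below)
  private static final Path HBASE_ROOT = new Path("/user/SPN-hbase");

  /** Delete a path recursively, but refuse to touch anything under the HBase root. */
  public static boolean deleteOutsideHBaseRoot(FileSystem fs, Path target) throws IOException {
    String root = fs.makeQualified(HBASE_ROOT).toUri().getPath();
    String path = fs.makeQualified(target).toUri().getPath();
    if (path.equals(root) || path.startsWith(root + "/")) {
      System.err.println("Refusing to delete " + path + ": it is under hbase.rootdir " + root);
      return false;
    }
    return fs.delete(target, true);
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    System.out.println("deleted: " + deleteOutsideHBaseRoot(fs, new Path(args[0])));
  }
}
{code}

The prefix check is deliberately simple; the point is just that the tool should know where hbase.rootdir lives before it deletes anything.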
Sorry for taking your time on this... and thanks for the help from Ted & Rabbit's Foot.

Best regards

takeshi


2014-02-19 16:57 GMT+08:00 takeshi <[email protected]>:
> Hi Rabbit's Foot,
>
> I don't think this is a Kerberos setting issue, because every time before I
> run my reproduction tests I first run the YCSB tests with workload[a-f] to
> confirm the HBase cluster is healthy, and only then run the reproduction
> tests.
>
> Currently, I see two types of error messages from the region servers during
> this reproduction test.
>
> One is...
> {code:title=regionserver running on aws-scottm-aws-1}
> 243 [2014-02-18 01:19:10,142][Thread-11][WARN ][org.apache.hadoop.hdfs.DFSClient]: DataStreamer Exception
> 244 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.internal%2C8120%2C1392703714776.1392703718212 File does not exist. Holder DFSClient_NONMAPREDUCE_-2013292460_25 does not have any open files.
> {code}
>
> Another is...
> {code:title=regionserver running on aws-scottm-aws-2}
> 491 [2014-02-18 02:09:44,813][regionserver8120.logRoller][FATAL][org.apache.hadoop.hbase.regionserver.HRegionServer]: ABORTING region server aws-scottm-tmh6-2.ap-southeast-1.compute.internal,8120,1392703717908: IOE in log roller
> 492 java.io.IOException: Exception in createWriter
> 493         at org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:66)
> 494         at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:694)
> 495         at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:631)
> 496         at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
> 497         at java.lang.Thread.run(Thread.java:662)
> 498 Caused by: java.io.IOException: cannot get log writer
> 499         at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:750)
> 500         at org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:60)
> 501         ... 4 more
> 502 Caused by: java.io.IOException: java.io.FileNotFoundException: Parent directory doesn't exist: /user/SPN-hbase/.logs/aws-scottm-tmh6-2.ap-southeast-1.compute.internal,8120,1392703717908
> {code}
>
> But I think the root cause of these two types of error messages could be the
> same: the "/<hbase-root>/.logs" directory was removed accidentally at some
> point in time...
>
> Currently, I think this issue is related to the issue that the following
> discussion thread was talking about:
>
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201305.mbox/%[email protected]%3E
>
>
> Best regards
>
> takeshi
>
>
> 2014-02-19 14:31 GMT+08:00 Rabbit's Foot <[email protected]>:
>
>> Hi
>>
>> According to the log message:
>>
>> 1343 2014-02-18 01:19:10,132 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB (auth:KERBEROS) cause:org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.inter
>>
>> it seems to be a problem with the Kerberos settings. Please check these
>> settings:
>>
>> dfs.namenode.keytab.file
>> dfs.namenode.kerberos.principal
>>
>> and check a few things on the HBase side:
>>
>> 1. Check whether the "hbase.rootdir" value set in hbase-site.xml actually exists under HDFS.
>> 2. Using the hbase shell, check whether the "create table" command can be executed correctly.
>>
>> Best regards
>>
>> rabbit's foot
>>
>>
>> 2014-02-19 13:47 GMT+08:00 takeshi <[email protected]>:
>>
>> > Hi, all
>> >
>> > I think I need to further describe the test environment I used for
>> > reproducing this issue.
>> >
>> > | hostname | ip | roles |
>> > | aws-scottm-tmh6-1 | 10.147.1.168 | activeNN, ZKFC, JN, ZK, activeJT, MRZK, NN, TT, HM, RS |
>> > | aws-scottm-tmh6-2 | 10.138.150.16 | standbyNN, ZKFC, JN, standbyJT, MRZK, NN, TT, HM, RS |
>> > | aws-scottm-tmh6-3 | 10.134.214.132 | JN, NN, TT, RS |
>> >
>> > P.S. I currently do not think that this issue is related to the MR daemons.
>> >
>> > I will find somewhere else to put the log files for people interested in
>> > looking into this.
>> >
>> >
>> > Best regards
>> >
>> > takeshi
>> >
>> >
>> > 2014-02-19 13:34 GMT+08:00 takeshi <[email protected]>:
>> >
>> > > Hi, Ted
>> > >
>> > > Yes, there is a regionserver running on 10.147.1.168; the following is
>> > > the log snippet for it.
>> > > But this region server actually did nothing around 2014-02-18 01:17:31...
>> > > and it first logged an error around 01:18:53.
>> > > {code}
>> > > 218 [2014-02-18 01:08:40,123][REPL IPC Server handler 0 on 8120][INFO ][org.apache.hadoop.ipc.HBaseServer]: REPL IPC Server handler 0 on 8120: starting
>> > > 219 [2014-02-18 01:08:40,124][REPL IPC Server handler 1 on 8120][INFO ][org.apache.hadoop.ipc.HBaseServer]: REPL IPC Server handler 1 on 8120: starting
>> > > 220 [2014-02-18 01:08:40,126][REPL IPC Server handler 2 on 8120][INFO ][org.apache.hadoop.ipc.HBaseServer]: REPL IPC Server handler 2 on 8120: starting
>> > > 221 [2014-02-18 01:08:40,128][regionserver8120][INFO ][org.apache.hadoop.hbase.regionserver.HRegionServer]: Serving as aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776, RPC listening on aws-scottm-tmh6-1.ap-southeast-1.compute.internal/10.147.1.168:8120, sessionid=0x14443974ce70009
>> > > 222 [2014-02-18 01:08:40,130][regionserver8120][INFO ][org.apache.hadoop.hbase.regionserver.HRegionServer]: Registered RegionServer MXBean
>> > > 223 [2014-02-18 01:08:40,142][SplitLogWorker-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776][INFO ][org.apache.hadoop.hbase.regionserver.SplitLogWorker]: SplitLogWorker aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776 starting
>> > > 224 [2014-02-18 01:18:52,027][PRI IPC Server handler 0 on 8120][INFO ][org.apache.hadoop.hbase.regionserver.HRegionServer]: Received request to open region: usertable,,1392704274325.e8be76322f79998d58750d87f268e0ae.
>> > > 225 [2014-02-18 01:18:52,551][RS_OPEN_REGION-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776-0][INFO ][org.apache.hadoop.hbase.regionserver.HRegion]: Setting up tabledescriptor config now ...
>> > > 226 [2014-02-18 01:18:52,584][RS_OPEN_REGION-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776-0][INFO ][org.apache.hadoop.hbase.coprocessor.CoprocessorHost]: System coprocessor org.apache.hadoop.hbase.security.token.TokenProvider was loaded successfully with priority (536870911).
>> > > 227 [2014-02-18 01:18:52,591][RS_OPEN_REGION-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776-0][INFO ][org.apache.hadoop.hbase.coprocessor.CoprocessorHost]: System coprocessor org.apache.hadoop.hbase.security.access.AccessController was loaded successfully with priority (536870912).
>> > > 228 [2014-02-18 01:18:52,946][StoreOpenerThread-usertable,,1392704274325.e8be76322f79998d58750d87f268e0ae.-1][INFO ][org.apache.hadoop.hbase.regionserver.Store]: time to purge deletes set to 0ms in store family
>> > > 229 [2014-02-18 01:18:52,974][StoreFileOpenerThread-family-1][ERROR][org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics]: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
>> > > 230 [2014-02-18 01:18:52,983][StoreFileOpenerThread-family-1][INFO ][org.apache.hadoop.hbase.util.ChecksumType]: Checksum using org.apache.hadoop.util.PureJavaCrc32
>> > > 231 [2014-02-18 01:18:52,983][StoreFileOpenerThread-family-1][INFO ][org.apache.hadoop.hbase.util.ChecksumType]: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
>> > > 232 [2014-02-18 01:18:53,305][RS_OPEN_REGION-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776-0][INFO ][org.apache.hadoop.hbase.regionserver.HRegion]: Onlined usertable,,1392704274325.e8be76322f79998d58750d87f268e0ae.; next sequenceid=343
>> > > ...
>> > > 241 [2014-02-18 01:18:53,644][PostOpenDeployTasks:e8be76322f79998d58750d87f268e0ae][INFO ][org.apache.hadoop.hbase.catalog.MetaEditor]: Updated row usertable,,1392704274325.e8be76322f79998d58750d87f268e0ae. with server=aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776
>> > > 242 [2014-02-18 01:18:53,644][PostOpenDeployTasks:e8be76322f79998d58750d87f268e0ae][INFO ][org.apache.hadoop.hbase.regionserver.HRegionServer]: Done with post open deploy task for region=usertable,,1392704274325.e8be76322f79998d58750d87f268e0ae., daughter=false
>> > > 243 [2014-02-18 01:19:10,142][Thread-11][WARN ][org.apache.hadoop.hdfs.DFSClient]: DataStreamer Exception
>> > > 244 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.internal%2C8120%2C1392703714776.1392703718212 File does not exist. Holder DFSClient_NONMAPREDUCE_-2013292460_25 does not have any open files.
>> > > 245         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2419)
>> > > 246         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2410)
>> > > 247         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2203)
>> > > 248         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:480)
>> > > 249         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>> > > 250         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>> > > 251         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> > > 252         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>> > > 253         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
>> > > 254         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
>> > > 255         at java.security.AccessController.doPrivileged(Native Method)
>> > > 256         at javax.security.auth.Subject.doAs(Subject.java:396)
>> > > 257         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > > 258         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
>> > > 259
>> > > 260         at org.apache.hadoop.ipc.Client.call(Client.java:1225)
>> > > 261         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>> > > 262         at $Proxy11.addBlock(Unknown Source)
>> > > 263         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
>> > > 264         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > > 265         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > > 266         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > 267         at java.lang.reflect.Method.invoke(Method.java:597)
>> > > 268         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>> > > 269         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>> > > 270         at $Proxy14.addBlock(Unknown Source)
>> > > 271         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1176)
>> > > 272         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1029)
>> > > 273         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
>> > > {code}
>> > >
>> > > Best regards
>> > >
>> > > takeshi
>> > >
>> > >
>> > > 2014-02-19 11:26 GMT+08:00 Ted Yu <[email protected]>:
>> > >
>> > >> Was a region server running on 10.147.1.168?
>> > >>
>> > >> If so, can you pastebin the region server log around 2014-02-18 01:17:31?
>> > >>
>> > >> Thanks
>> > >>
>> > >> On Tue, Feb 18, 2014 at 9:17 PM, takeshi <[email protected]> wrote:
>> > >>
>> > >> > Hi, all
>> > >> >
>> > >> > We also encountered this issue on our POC cluster, and it looks like the
>> > >> > issue you were talking about. We have not found the root cause yet either,
>> > >> > but would like to share some findings here that may help others' study.
>> > >> >
>> > >> > Environment:
>> > >> > We are using cdh-4.2.1 (hadoop-2.0.0, hbase-0.94.2) as our base distro,
>> > >> > with some bugfixes ported from other Apache versions, running on 14
>> > >> > machines with CentOS-5.3.
>> > >> >
>> > >> > The error messages we found for this issue:
>> > >> >
>> > >> > The HBase master reports a "Parent directory doesn't exist" error
>> > >> > {code:title=hbase-master.log}
>> > >> > 334 [2014-02-18 01:20:30,550][master-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8100,1392703701863.archivedHFileCleaner][WARN ][org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache]: Snapshot directory: hdfs://cluster1/user/SPN-hbase/.snapshot doesn't exist
>> > >> > 335 [2014-02-18 01:20:36,597][IPC Server handler 47 on 8100][ERROR][org.apache.hadoop.hbase.master.HMaster]: Region server ^@^@aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776 reported a fatal error:
>> > >> > 336 ABORTING region server aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776: IOE in log roller
>> > >> > 337 Cause:
>> > >> > 338 java.io.IOException: Exception in createWriter
>> > >> > 339         at org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:66)
>> > >> > 340         at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:694)
>> > >> > 341         at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:631)
>> > >> > 342         at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
>> > >> > 343         at java.lang.Thread.run(Thread.java:662)
>> > >> > 344 Caused by: java.io.IOException: cannot get log writer
>> > >> > 345         at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:750)
>> > >> > 346         at org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:60)
>> > >> > 347         ... 4 more
>> > >> > 348 Caused by: java.io.IOException: java.io.FileNotFoundException: Parent directory doesn't exist: /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776
>> > >> > 349         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1726)
>> > >> > 350         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1848)
>> > >> > 351         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1770)
>> > >> > 352         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1747)
>> > >> > 353         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:418)
>> > >> > 354         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
>> > >> > 355         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
>> > >> > 356         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> > >> > 357         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>> > >> > 358         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
>> > >> > 359         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
>> > >> > 360         at java.security.AccessController.doPrivileged(Native Method)
>> > >> > 361         at javax.security.auth.Subject.doAs(Subject.java:396)
>> > >> > 362         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > >> > 363         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
>> > >> > 364
>> > >> > 365         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:173)
>> > >> > 366         at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:747)
>> > >> > 367         ... 5 more
>> > >> > {code}
>> > >> >
>> > >> > The namenode did mkdirs for "/<hbase-root>/.logs/<regionserver>" on a
>> > >> > request from principal "hbase/<fqdn>@LAB", but later deleted
>> > >> > "/<hbase-root>/.logs" on a request from "hdfs/<fqdn>@LAB"!
>> > >> > {code:title=namenode.log (with hdfs audit enabled)}
>> > >> > 624 2014-02-18 01:08:38,203 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hbase/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB (auth:KERBEROS) ip=/10.147.1.168 cmd=mkdirs src=/user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776 dst=null perm=hbase:supergroup:rwxr-xr-x
>> > >> > ...
>> > >> > 1068 2014-02-18 01:17:31,189 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hdfs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB (auth:KERBEROS) ip=/10.147.1.168 cmd=delete src=/user/SPN-hbase/.logs dst=null perm=null
>> > >> > ...
>> > >> > 1343 2014-02-18 01:19:10,132 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB (auth:KERBEROS) cause:org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.internal%2C8120%2C1392703714776.1392703718212 File does not exist. Holder DFSClient_NONMAPREDUCE_-2013292460_25 does not have any open files.
>> > >> > 1344 2014-02-18 01:19:10,132 INFO org.apache.hadoop.ipc.Server: IPC Server handler 83 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.147.1.168:35703: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.internal%2C8120%2C1392703714776.1392703718212 File does not exist. Holder DFSClient_NONMAPREDUCE_-2013292460_25 does not have any open files.
>> > >> > 1345 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.internal%2C8120%2C1392703714776.1392703718212 File does not exist. Holder DFSClient_NONMAPREDUCE_-2013292460_25 does not have any open files.
>> > >> > 1346         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2419)
>> > >> > 1347         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2410)
>> > >> > 1348         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2203)
>> > >> > 1349         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:480)
>> > >> > 1350         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>> > >> > 1351         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>> > >> > 1352         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> > >> > 1353         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>> > >> > 1354         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
>> > >> > 1355         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
>> > >> > 1356         at java.security.AccessController.doPrivileged(Native Method)
>> > >> > 1357         at javax.security.auth.Subject.doAs(Subject.java:396)
>> > >> > 1358         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > >> > 1359         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
>> > >> > {code}
>> > >> >
>> > >> > Way to reproduce:
>> > >> > Currently, we have a way to reproduce this issue: run the ycsb
>> > >> > workload[a-f] against HBase while, on the other side, simply executing
>> > >> > "service <hadoop,hbase-daemons> start". We keep running this 10 times in
>> > >> > a loop on both sides for one round, and it can usually be reproduced in
>> > >> > the first or second round. But I am not sure whether it is related to
>> > >> > the issue discussed at the beginning of this thread.
>> > >> >
>> > >> > Follow-ups:
>> > >> > Now I am following up on the odd message that appeared in namenode.log
>> > >> > {code}
>> > >> > 2014-02-18 01:17:31,189 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=hdfs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB (auth:KERBEROS) ip=/10.147.1.168 cmd=delete src=/user/SPN-hbase/.logs dst=null perm=null
>> > >> > {code}
>> > >> > to see what more I can dig out...
>> > >> >
>> > >> > I'll keep updating this thread with any findings, and I welcome any
>> > >> > opinions. Thanks.
>> > >> >
>> > >> >
>> > >> > Best regards
>> > >> >
>> > >> > takeshi
>> > >> >
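For anyone who wants a quick way to confirm the same symptom on their own cluster, here is a minimal sketch of the check Rabbit's Foot suggested (verify that the WAL directory under hbase.rootdir still exists on HDFS), done through the Hadoop FileSystem API instead of the hbase shell. The principal, keytab path, and the hard-coded /user/SPN-hbase root are illustrative assumptions, not values taken from the configs above.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class CheckWalDir {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    UserGroupInformation.setConfiguration(conf);
    // Illustrative Kerberos login; replace <fqdn> and the keytab path with real values
    UserGroupInformation.loginUserFromKeytab("hbase/<fqdn>@LAB", "/etc/hbase/conf/hbase.keytab");

    FileSystem fs = FileSystem.get(conf);
    Path walDir = new Path("/user/SPN-hbase/.logs"); // <hbase.rootdir>/.logs
    if (fs.exists(walDir)) {
      System.out.println("WAL directory is present: " + walDir);
    } else {
      // If this directory disappears, region servers abort on the next log roll
      // with "Parent directory doesn't exist", as in the logs above.
      System.err.println("WAL directory is missing: " + walDir);
    }
  }
}
{code}

Running a check like this right after the delete shows up in the HDFS audit log would flag the problem well before the next log roll aborts the region servers.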
