[
https://issues.apache.org/jira/browse/HBASE-20503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457190#comment-16457190
]
stack commented on HBASE-20503:
-------------------------------
It's a version of hadoop3 [~Apache9]. Also, this hang has happened only once that I
know of (we recovered the cluster and the assign of the 30k regions went through
nicely and promptly....)
bq. Stream broken every ten minutes, it is likely that we fail to renew lease
on the given file so namenode will close the file after the lease timeout...
As I see it, the stream 'breaks' (nothing in the log on why...), the subsequent sync
fails because the stream is broken, dumps the exception 'java.io.IOException: stream
already broken', and then goes to roll the log. We try to recover the lease on the
current log, fail... and then nothing more. It seems like we are stuck making another
attempt at lease recovery? Do I have that right?
Side Question: Why are we trying to recover the lease at all? Why don't we just dump
the bad log and let the recovery process recover the lease if it needs to? Is this
done in-line, holding up writes?
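To make the question concrete, here is a rough sketch of what "don't do it in-line"
could look like (hypothetical names, my assumption rather than the current AsyncFSWAL
code): hand the recoverLease call for the broken writer to a background thread and let
the new writer carry on taking writes.
{code}
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch, not the current AsyncFSWAL code: recover the broken writer's
// lease off the WAL consume path so a slow or failing recovery cannot hold up writes.
final class BackgroundLeaseRecovery {
  private static final Logger LOG = LoggerFactory.getLogger(BackgroundLeaseRecovery.class);
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  void recoverAsync(final DistributedFileSystem dfs, final Path brokenWal) {
    pool.execute(() -> {
      try {
        // true = NN closed the file; false = recovery started and the NN will finish it.
        boolean closed = dfs.recoverLease(brokenWal);
        LOG.info("Background lease recovery on {} returned {}", brokenWal, closed);
      } catch (IOException e) {
        // Best-effort here; WAL split will recover the lease again if it has to.
        LOG.warn("Background lease recovery failed on " + brokenWal, e);
      }
    });
  }
}
{code}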
For the first stream break, we have this in the NN log, which shows two attempts at
recovery and then the NN completing the block. Is the AsyncFSWAL client stuck making a
third attempt at recovery?
{code}
2018-04-25 10:10:53,965 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocate blk_1096417740_22686205, replicas=10.17.208.15:20002,
10.17.212.38:20002, 10.17.212.23:20002 for
/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
2018-04-25 10:15:16,692 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: [Lease.
Holder: DFSClient_NONMAPREDUCE_1739935309_1, pending creates: 3],
src=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
from client DFSClient_NONMAPREDUCE_1739935309_1
2018-04-25 10:15:16,692 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_NONMAPREDUCE_1739935309_1, pending creates: 3],
src=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
2018-04-25 10:15:16,692 WARN org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.internalReleaseLease: File
/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
has not been closed. Lease recovery is in progress. RecoveryId = 22686363 for
block blk_1096417740_22686205
2018-04-25 10:15:20,710 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: [Lease.
Holder: DFSClient_NONMAPREDUCE_1739935309_1, pending creates: 3],
src=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
from client DFSClient_NONMAPREDUCE_1739935309_1
2018-04-25 10:15:20,710 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_NONMAPREDUCE_1739935309_1, pending creates: 3],
src=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
2018-04-25 10:15:29,787 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(oldBlock=BP-245617012-10.17.XXX-XX-XXXX556508602:blk_1096417740_22686205,
file=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta,
newgenerationstamp=22686363, newlength=7935860,
newtargets=[10.17.208.15:20002, 10.17.212.38:20002]) successful
{code}
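If it would help to narrow down whether the client really is stuck re-attempting
recovery, the HDFS client API does distinguish "recovery started" from "recovery
finished": recoverLease() returns a boolean, and DistributedFileSystem also exposes
isFileClosed(). A probe along the lines below (illustration only, assuming those two
calls; not what the WAL code does today) would tell us whether the NN has already
closed the file, as the commitBlockSynchronization above suggests.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Illustration only: after the first recoverLease() call, poll isFileClosed() until
// the NN reports the file closed (e.g. once commitBlockSynchronization has run)
// instead of issuing recoverLease() over and over.
final class LeaseRecoveryProbe {
  private LeaseRecoveryProbe() {}

  static boolean waitForLeaseRecovery(DistributedFileSystem dfs, Path wal, long timeoutMs)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    if (dfs.recoverLease(wal)) {
      return true; // lease released and file closed immediately
    }
    while (System.currentTimeMillis() < deadline) {
      if (dfs.isFileClosed(wal)) {
        return true; // NN finished recovery and closed the file
      }
      Thread.sleep(1000); // poll rather than trigger another recovery round
    }
    return false;
  }
}
{code}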
... but looking at random WAL writings, they all have the below pattern... we don't
seem to report successful lease recovery... maybe it's not logged at INFO level...
need to look.
{code}
2018-04-25 10:24:58,099 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocate blk_1096418011_22686541, replicas=10.17.208.15:20002,
10.17.216.51:20002, 10.17.216.38:20002 for
/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
2018-04-25 10:33:28,352 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: [Lease.
Holder: DFSClient_NONMAPREDUCE_1739935309_1, pending creates: 3],
src=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
from client DFSClient_NONMAPREDUCE_1739935309_1
2018-04-25 10:33:28,353 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_NONMAPREDUCE_1739935309_1, pending creates: 3],
src=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
2018-04-25 10:33:28,353 WARN org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.internalReleaseLease: File
/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
has not been closed. Lease recovery is in progress. RecoveryId = 22686663 for
block blk_1096418011_22686541
2018-04-25 10:33:31,403 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(oldBlock=BP-245617012-10.17.XXX-XX-XXXX556508602:blk_1096418011_22686541,
file=/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta,
newgenerationstamp=22686663, newlength=684698, newtargets=[10.17.208.15:20002,
10.17.216.51:20002]) successful
2018-04-25 10:33:32,357 INFO org.apache.hadoop.ipc.Server: IPC Server handler 26 on 8020, call
Call#23171 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.recoverLease
from 10.17.208.15:42005: java.io.FileNotFoundException: File does not exist:
/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
{code}
> [AsyncFSWAL] Failed to get sync result after 300000 ms for txid=160912, WAL
> system stuck?
> -----------------------------------------------------------------------------------------
>
> Key: HBASE-20503
> URL: https://issues.apache.org/jira/browse/HBASE-20503
> Project: HBase
> Issue Type: Bug
> Components: wal
> Reporter: stack
> Priority: Critical
> Fix For: 2.0.1
>
>
> Scale test. Startup w/ 30k regions over ~250 nodes. This RS is furiously trying to
> open regions assigned by the Master. Importantly, it is carrying hbase:meta. Twenty
> minutes in, meta goes dead after an exception comes up out of AsyncFSWAL. The
> process had been restarted, so I couldn't get a thread dump.
> Suspicious is that we archive a WAL and then get an FNFE because we go to access
> the WAL in its old location. [~Apache9] mind taking a look? Does this FNFE while
> rolling kill the WAL sub-system? Thanks.
> DFS is complaining on file open for a few files getting blocks from remote, dead
> DNs, e.g. {{2018-04-25 10:05:21,506 WARN
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing
> remote block reader.
> java.net.ConnectException: Connection refused}}
> AsyncFSWAL is complaining: "AbstractFSWAL: Slow sync cost: 103 ms".
> About ten minutes in, we get this:
> {code}
> 2018-04-25 10:15:16,532 WARN
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL: sync failed
> java.io.IOException: stream already broken
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.sync(AsyncProtobufLogWriter.java:134)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:364)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume(AsyncFSWAL.java:547)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-04-25 10:15:16,680 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Rolled WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
> with entries=10819, filesize=7.57 MB; new WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676516535.meta
> 2018-04-25 10:15:16,680 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Archiving
> hdfs://ns1/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524675848653.meta
> to
> hdfs://ns1/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524675848653.meta
> 2018-04-25 10:15:16,686 WARN
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter: Failed to
> write trailer, non-fatal, continuing...
> java.io.IOException: stream already broken
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.lambda$writeWALTrailerAndMagic$3(AsyncProtobufLogWriter.java:210)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.write(AsyncProtobufLogWriter.java:166)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeWALTrailerAndMagic(AsyncProtobufLogWriter.java:201)
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.writeWALTrailer(AbstractProtobufLogWriter.java:233)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:143)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.lambda$executeClose$8(AsyncFSWAL.java:742)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-04-25 10:15:16,687 WARN
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter: normal close
> failed, try recover
> java.io.IOException: stream already broken
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.endBlock(FanOutOneBlockAsyncDFSOutput.java:521)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.close(FanOutOneBlockAsyncDFSOutput.java:565)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:144)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.lambda$executeClose$8(AsyncFSWAL.java:742)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> We want to roll because we failed a sync, we fail writing the trailer because the
> stream is broken, we fail the close for the same reason, and then we block waiting
> on lease recovery (Should we be trying to recover the lease inline? We fail to
> recover the lease after 4 seconds; is that ok?).
> We seem to recover fine though. We continue. I see some meta flushes.
> Twenty minutes in, we have a similar type of event for the meta and rsgroup WALs
> that are on this server. What is interesting is the FileNotFound: we seem to be
> recovering the lease on a file we just archived.
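> For illustration only (a hypothetical guard, not the actual close path): the
> close-old-writer task could treat a FileNotFoundException from recoverLease() on
> the old path as "already archived, nothing left to recover", since the roller can
> move the file to oldWALs before the close task gets to it; the logs below show
> exactly that interleaving.
> {code}
> import java.io.FileNotFoundException;
> import java.io.IOException;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
>
> // Hypothetical guard: returns true if the lease is recovered or the file is gone.
> final class RecoverOldWriter {
>   private RecoverOldWriter() {}
>
>   static boolean recoverOrSkip(DistributedFileSystem dfs, Path oldWal) throws IOException {
>     try {
>       // Either completes recovery (true) or kicks it off on the NN (false).
>       return dfs.recoverLease(oldWal);
>     } catch (FileNotFoundException fnfe) {
>       // The roller already archived this WAL to oldWALs and the NN committed the
>       // block under the new name, so there is nothing left to recover here.
>       return true;
>     }
>   }
> }
> {code}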
> {code}
> 2018-04-25 10:24:17,558 WARN
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL: sync failed
> java.io.IOException: stream already broken
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.sync(AsyncProtobufLogWriter.java:134)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:364)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume(AsyncFSWAL.java:547)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-04-25 10:24:17,579 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Onlined 3aa9f94ed073b55c35147eae127b3ebe; next sequenceid=306801
> 2018-04-25 10:24:18,015 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Rolled WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676516535.meta
> with entries=21496, filesize=12.68 MB; new WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677057560.meta
> 2018-04-25 10:24:18,016 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Archiving
> hdfs://ns1/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
> to
> hdfs://ns1/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676253923.meta
> 2018-04-25 10:24:18,049 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Rolled WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524676516724
> with entries=168, filesize=68.67 KB; new WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677058022
> 2018-04-25 10:24:18,050 WARN
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter: Failed to
> write trailer, non-fatal, continuing...
> java.io.IOException: stream already broken
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.lambda$writeWALTrailerAndMagic$3(AsyncProtobufLogWriter.java:210)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.write(AsyncProtobufLogWriter.java:166)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeWALTrailerAndMagic(AsyncProtobufLogWriter.java:201)
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.writeWALTrailer(AbstractProtobufLogWriter.java:233)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:143)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.lambda$executeClose$8(AsyncFSWAL.java:742)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-04-25 10:24:18,051 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
> for
> IntegrationTestBigLinkedList_20180418091745,\x1C\x84\xA4z\x07\xF5an,1524068594035.4cad45e6d4430b1a39538a045f2c3909.
> 2018-04-25 10:24:18,051 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
> for
> IntegrationTestBigLinkedList_20180418091745,0\x11\xD3\xC0\x1E\x8F\x8F\x08,1524068594035.3aa9f94ed073b55c35147eae127b3ebe.
> 2018-04-25 10:24:18,050 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Archiving
> hdfs://ns1/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524676516724
> to
> hdfs://ns1/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524676516724
> 2018-04-25 10:24:18,051 WARN
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter: normal close
> failed, try recover
> java.io.IOException: stream already broken
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.endBlock(FanOutOneBlockAsyncDFSOutput.java:521)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.close(FanOutOneBlockAsyncDFSOutput.java:565)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:144)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.lambda$executeClose$8(AsyncFSWAL.java:742)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-04-25 10:24:18,053 INFO org.apache.hadoop.hbase.util.FSHDFSUtils:
> Recover lease on dfs file
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524676516724
> 2018-04-25 10:24:18,058 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Failed
> to recover lease, attempt=0 on
> file=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524676516724
> after 5ms
> 2018-04-25 10:24:20,434 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase metrics
> system...
> 2018-04-25 10:24:20,436 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> stopped.
> 2018-04-25 10:24:20,940 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
> loaded properties from hadoop-metrics2.properties
> 2018-04-25 10:24:20,941 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot
> period at 10 second(s).
> 2018-04-25 10:24:20,941 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> started
> 2018-04-25 10:24:22,156 WARN
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL: close old writer failed
> java.io.FileNotFoundException: File does not exist:
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524676516724
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2490)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:809)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:737)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> {code}
> Here is NN log for the above rsgroup file
> {code}
> 2018-04-25 10:24:18,030 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> allocate blk_1096417978_22686485, replicas=10.17.208.15:20002,
> 10.17.218.44:20002, 10.17.218.35:20002 for
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677058022
> 2018-04-25 10:24:58,220 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: [Lease.
> Holder: DFSClient_NONMAPREDUCE_1739935309_1, pending creates: 3],
> src=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677058022
> from client DFSClient_NONMAPREDUCE_1739935309_1
> 2018-04-25 10:24:58,221 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
> Holder: DFSClient_NONMAPREDUCE_1739935309_1, pending creates: 3],
> src=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677058022
> 2018-04-25 10:24:58,221 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.internalReleaseLease: File
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677058022
> has not been closed. Lease recovery is in progress. RecoveryId = 22686546
> for block blk_1096417978_22686485
> 2018-04-25 10:25:00,419 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(oldBlock=BP-245617012-10.17.XXX-XX-XXXX556508602:blk_1096417978_22686485,
> file=/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677058022,
> newgenerationstamp=22686546, newlength=1066, newtargets=[10.17.208.15:20002,
> 10.17.218.44:20002]) successful
> 2018-04-25 10:25:02,257 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 128 on 8020, call Call#23009 Retry#0
> org.apache.hadoop.hdfs.protocol.ClientProtocol.recoverLease from
> 10.17.208.15:35769: java.io.FileNotFoundException: File does not exist:
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677058022
> {code}
> Five minutes later same thing happens for the meta file:
> {code}
> 2018-04-25 10:33:28,347 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Rolled WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
> with entries=941, filesize=668.65 KB; new WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677608280.meta
> 2018-04-25 10:33:28,347 WARN
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter: Failed to
> write trailer, non-fatal, continuing...
> java.io.IOException: stream already broken
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.lambda$writeWALTrailerAndMagic$3(AsyncProtobufLogWriter.java:210)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.write(AsyncProtobufLogWriter.java:166)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeWALTrailerAndMagic(AsyncProtobufLogWriter.java:201)
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.writeWALTrailer(AbstractProtobufLogWriter.java:233)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:143)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.lambda$executeClose$8(AsyncFSWAL.java:742)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-04-25 10:33:28,347 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Archiving
> hdfs://ns1/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676516535.meta
> to
> hdfs://ns1/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524676516535.meta
> 2018-04-25 10:33:28,348 WARN
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter: normal close
> failed, try recover
> java.io.IOException: stream already broken
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.endBlock(FanOutOneBlockAsyncDFSOutput.java:521)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.close(FanOutOneBlockAsyncDFSOutput.java:565)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:144)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.lambda$executeClose$8(AsyncFSWAL.java:742)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-04-25 10:33:28,349 INFO org.apache.hadoop.hbase.util.FSHDFSUtils:
> Recover lease on dfs file
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
> 2018-04-25 10:33:28,353 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Failed
> to recover lease, attempt=0 on
> file=/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
> after 4ms
> 2018-04-25 10:33:28,355 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Archiving
> hdfs://ns1/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677057560.meta
> to
> hdfs://ns1/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677057560.meta
> 2018-04-25 10:33:28,361 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Archiving
> hdfs://ns1/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
> to
> hdfs://ns1/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
> 2018-04-25 10:33:28,405 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Rolled WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677098139
> with entries=3, filesize=1.49 KB; new WAL
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677608367
> 2018-04-25 10:33:28,405 INFO
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Archiving
> hdfs://ns1/hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677098139
> to
> hdfs://ns1/hbase/oldWALs/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.vc0205.halxg.cloudera.com%2C22101%2C1524675808073.regiongroup-0.1524677098139
> 2018-04-25 10:33:30,931 INFO org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> totalSize=328.08 MB, freeSize=2.84 GB, max=3.16 GB, blockCount=2835,
> accesses=21635, hits=21635, hitRatio=100.00%, , cachingAccesses=21617,
> cachingHits=21617, cachingHitsRatio=100.00%, evictions=179, evicted=0,
> evictedPerRun=0.0
> 2018-04-25 10:33:32,359 WARN
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL: close old writer failed
> java.io.FileNotFoundException: File does not exist:
> /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2490)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:809)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:737)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> Now we seem to have a sick WAL.
> There is this lone exception after 5 minutes, with no accompanying attempt at a
> roll:
> {code}
> 2018-04-25 10:39:12,741 WARN
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL: sync failed
> java.io.IOException: stream already broken
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush0(FanOutOneBlockAsyncDFSOutput.java:424)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.flush(FanOutOneBlockAsyncDFSOutput.java:513)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.sync(AsyncProtobufLogWriter.java:134)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:364)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume(AsyncFSWAL.java:547)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> meta seems to have stalled taking edits, seemingly because it cannot flush (write
> its flush marker...). Ten minutes later I get this:
> {code}
> 2018-04-25 10:49:13,051 ERROR
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flush failed for
> region hbase:meta,,1
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync
> result after 300000 ms for txid=160912, WAL system stuck?
> at
> org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:136)
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:718)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:596)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2549)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2495)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2353)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2327)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2218)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:497)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:466)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:69)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:252)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Lots of logging about wanting to flush but being unable to, and these:
> 2018-04-25 10:44:12,776 WARN org.apache.hadoop.hbase.ipc.RpcServer:
> (responseTooSlow):
> {"call":"Multi(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$MultiRequest)","starttimems":1524677952741,"responsesize":1558,"method":"Multi","param":"region=
> hbase:meta,,1, for 1 actions and 1st row
> key=BUNDLE-REDACTED\\xBF\\x08\\x1E\"\\xE9X.\\x1A,1524000668742.5add42a952519e671face3500922057f.","processingtimems":300034,"client":"10.17.208.13:44534","queuetimems":0,"class":"HRegionServer"}
> At the end is one of these:
> {code}
> 2018-04-25 10:33:32,359 WARN org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL: close old writer failed
> java.io.FileNotFoundException: File does not exist: /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2490)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:809)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:737)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>         at org.apache.hadoop.hdfs.DFSClient.recoverLease(DFSClient.java:873)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:290)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:287)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(DistributedFileSystem.java:302)
>         at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverLease(FSHDFSUtils.java:283)
>         at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:216)
>         at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:163)
>         at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.recoverAndClose(FanOutOneBlockAsyncDFSOutput.java:555)
>         at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:147)
>         at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.lambda$executeClose$8(AsyncFSWAL.java:742)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /hbase/WALs/vc0205.halxg.cloudera.com,22101,1524675808073/vc0205.halxg.cloudera.com%2C22101%2C1524675808073.meta.1524677098059.meta
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2490)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:809)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:737)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy24.recoverLease(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.recoverLease(ClientNamenodeProtocolTranslatorPB.java:683)
>         at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>         at com.sun.proxy.$Proxy25.recoverLease(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
>         at com.sun.proxy.$Proxy26.recoverLease(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.recoverLease(DFSClient.java:871)
>         ... 13 more
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)