[
https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632990#comment-16632990
]
Hudson commented on HBASE-21196:
--------------------------------
Results for branch master
[build #517 on
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/517/]: (x)
*{color:red}-1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://builds.apache.org/job/HBase%20Nightly/job/master/517//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2)
report|https://builds.apache.org/job/HBase%20Nightly/job/master/517//JDK8_Nightly_Build_Report_(Hadoop2)/]
(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3)
report|https://builds.apache.org/job/HBase%20Nightly/job/master/517//JDK8_Nightly_Build_Report_(Hadoop3)/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> HTableMultiplexer clears the meta cache after every put operation
> -----------------------------------------------------------------
>
> Key: HBASE-21196
> URL: https://issues.apache.org/jira/browse/HBASE-21196
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Affects Versions: 3.0.0, 1.3.3, 2.2.0
> Reporter: Nihal Jain
> Assignee: Nihal Jain
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21196.master.001.patch,
> HBASE-21196.master.001.patch, HBASE-21196.master.002.patch,
> HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations that go through the
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName,
> MultiResponse, int)}} API with the table name set to null reset the meta cache of
> the corresponding server after every call. The put operation of
> {{HTableMultiplexer}} is one such operation (it might not be the only one). This can
> severely impact performance, because the cache becomes empty after every
> {{HTableMultiplexer}} put, so every new operation directed to that server must
> first go to ZooKeeper to get the meta table address and then read meta to get
> the location of the table region.
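> Below is a minimal sketch of a client loop that triggers this behavior. The
> table name {{testHTableMultiplexer_1}}, the column family {{cf}} and the queue
> size are illustrative assumptions; it requires a running cluster with that
> table already created.
> {code:java}
> Configuration conf = HBaseConfiguration.create();
> HTableMultiplexer multiplexer = new HTableMultiplexer(conf, 1000); // assumed queue size
> TableName table = TableName.valueOf("testHTableMultiplexer_1"); // assumed table
> for (int i = 0; i < 1000; i++) {
>   Put put = new Put(Bytes.toBytes(i));
>   put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(i)); // assumed family/qualifier
>   // The multiplexer's flush path reports results with tableName == null, so even a
>   // successful response clears the cached region locations for the target server.
>   multiplexer.put(table, put);
> }
> {code}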
> From the logs below, one can see that the cached region locations are cleared
> after every put. As a side effect, before every put the client has to contact
> ZooKeeper for the meta table location and then read meta to get the region
> locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283):
> Removed all cached region locations that map to
> root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5]
> client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for
> root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE
> [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811]
> ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get"
> request_param: true priority: 0 timeout: 60000 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811]
> ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size:
> 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811]
> ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size:
> 137 connection: 127.0.0.1:42338 param: region=
> testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f.,
> row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result {
> associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0
> totalTime: 0
> 2018-09-13 22:21:15,516 TRACE
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811]
> io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0,
> count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get,
> callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan
> table=hbase:meta,
> startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179):
> Advancing internal small scanner to startKey at
> 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up
> meta region location in ZK,
> connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see
> that the strings "Removed all cached region locations that map" and "Looking
> up meta region location in ZK" appear for every put.
> *Analysis:*
> The problem occurs because the {{cleanServerCache}} method always clears
> the server cache when the table name is null and the exception is null. See
> [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918]
> {code:java}
> private void cleanServerCache(ServerName server, Throwable regionException) {
>   if (tableName == null && ClientExceptionsUtil.isMetaClearingException(regionException)) {
>     // For multi-actions, we don't have a table name, but we want to make sure to clear the
>     // cache in case there were location-related exceptions. We don't to clear the cache
>     // for every possible exception that comes through, however.
>     asyncProcess.connection.clearCaches(server);
>   }
> }
> {code}
> The problem is that
> [ClientExceptionsUtil.isMetaClearingException(regionException)|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L51]
> assumes that the caller takes care of the null-exception check before calling
> the method, i.e. it returns true if the passed exception is null, which may not
> be a correct assumption.
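> One possible guard, sketched below only to illustrate the idea (not necessarily
> the committed patch), is to skip clearing the cache entirely when no exception
> occurred:
> {code:java}
> private void cleanServerCache(ServerName server, Throwable regionException) {
>   // Only clear the cache when a location-related exception actually occurred; a null
>   // regionException (e.g. a successful HTableMultiplexer put) should not wipe the
>   // cached region locations for the server.
>   if (tableName == null && regionException != null
>       && ClientExceptionsUtil.isMetaClearingException(regionException)) {
>     asyncProcess.connection.clearCaches(server);
>   }
> }
> {code}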
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)