[ https://issues.apache.org/jira/browse/HBASE-19639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405105#comment-16405105 ]

Anoop Sam John commented on HBASE-19639:
----------------------------------------

bq. I don't follow. 4x is the default. You suggesting we do 2x?
No boss. Yes, it is 4x by default. I was just pointing out that you are seeing 
the Exception because flushes seem to be slower than the write speed. 4x seems 
good enough.
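(To make the numbers concrete, a rough sketch of the per-region math using the 
default values; the property names are from hbase-default.xml, and the Java is 
illustrative, not the actual checkResources() code:)
{code}
// Per-region blocking threshold, as enforced in HRegion.checkResources().
// Defaults: hbase.hregion.memstore.flush.size = 134217728 (128MB),
//           hbase.hregion.memstore.block.multiplier = 4.
long flushSize = 134217728L;
int blockMultiplier = 4;
long blockingMemStoreSize = flushSize * blockMultiplier;
// => 536870912, the exact blockingMemStoreSize in the stack traces below.
{code}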
bq. we should be able to test in isolation?
Yes, I believe we can.
bq. This is default. I'm trying to run w/ all defaults. I will change default if 
we come up w/ something better ("...95% of 
hbase.regionserver.global.memstore.size..."). We don't block writes now. 
Instead we throw the RegionTooBusyException. We don't put anything in the logs. 
Let me fix this. Would be good to note when the Region is struggling... or at 
least let me see if a metric I could use.
I meant to ask what the global memstore size is, in GBs. This defaults to 40% 
of Xmx. We throw RegionTooBusyException from a region when we try to write to a 
memstore whose size is >= 4x the flush size. But when the global memstore size 
(the sum of all memstores) goes above its barrier, we block writes; I don't see 
us throwing an Exception in that case.
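(Spelling out the two barriers with illustrative numbers, and assuming the 24G 
mentioned in the description below is the heap, i.e. Xmx:)
{code}
long flushSize = 134217728L;                          // 128MB default
long perRegionBarrier = 4 * flushSize;                // 512MB: RegionTooBusyException above this
double globalFraction = 0.4;                          // hbase.regionserver.global.memstore.size default
long heap = 24L * 1024 * 1024 * 1024;                 // 24G heap, per the description below
long globalBarrier = (long) (globalFraction * heap);  // ~9.6G: writes are blocked here, no Exception
{code}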
bq. Hmm.. This last run of mine had it at 8x. Let me set it to the default (4x).
That looks too high, boss. IMO it should be less than 4x, because at the level 
of each region we have 4x as the max size, above which we throw the Exception. 
A global memstore barrier of 8x of (regions count * per-region flush size) 
means we theoretically allow every region to grow up to 8x of its flush size. 
To be clear, for my tests I size it as:
regions count * per-region flush size * 2 = 40% of Xmx
Ya, when writes get faster, we badly need faster flushes.
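(Worked through with the same illustrative numbers, that sizing rule solves out 
to a region count per server roughly like this:)
{code}
// Sizing rule: regions * flushSize * 2 = 0.4 * Xmx. Solving for regions,
// with a 128MB flush size and a 24G heap (illustrative values only):
long flushSize = 134217728L;
long globalBarrier = (long) (0.4 * 24L * 1024 * 1024 * 1024);  // ~9.6G
long regions = globalBarrier / (2 * flushSize);                // ~38 regions per server
{code}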


> ITBLL can't go big because RegionTooBusyException... Above memstore limit
> -------------------------------------------------------------------------
>
>                 Key: HBASE-19639
>                 URL: https://issues.apache.org/jira/browse/HBASE-19639
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> Running ITBLLs, the basic link generator keeps failing because I run into 
> exceptions like below:
> {code}
> 2017-12-26 19:23:45,284 INFO [main] 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator: 
> Persisting current.length=1000000, count=1000000, id=Job: 
> job_1513025868268_0062 Task: attempt_1513025868268_0062_m_000006_2, 
> current=\x8B\xDB25\xA7*\x9A\xF5\xDEx\x83\xDF\xDC?\x94\x92, i=1000000
> 2017-12-26 19:24:18,982 INFO [htable-pool3-t6] 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, 
> table=IntegrationTestBigLinkedList, attempt=10/11 failed=524ops, last 
> exception: org.apache.hadoop.hbase.RegionTooBusyException: 
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b.,
>  server=ve0538.halxg.cloudera.com,16020,1514343549993, 
> memstoreSize=538084641, blockingMemStoreSize=536870912
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>  on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, 
> retrying after=10050ms, replay=524ops
> 2017-12-26 19:24:29,061 INFO [htable-pool3-t6] 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, 
> table=IntegrationTestBigLinkedList, attempt=11/11 failed=524ops, last 
> exception: org.apache.hadoop.hbase.RegionTooBusyException: 
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b.,
>  server=ve0538.halxg.cloudera.com,16020,1514343549993, 
> memstoreSize=538084641, blockingMemStoreSize=536870912
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>  on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, 
> retrying after=10033ms, replay=524ops
> 2017-12-26 19:24:37,183 INFO [ReadOnlyZKClient] 
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient: 0x015051a0 no activities 
> for 60000 ms, close active connection. Will reconnect next time when there 
> are new requests.
> 2017-12-26 19:24:39,122 WARN [htable-pool3-t6] 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, 
> table=IntegrationTestBigLinkedList, attempt=12/11 failed=524ops, last 
> exception: org.apache.hadoop.hbase.RegionTooBusyException: 
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b.,
>  server=ve0538.halxg.cloudera.com,16020,1514343549993, 
> memstoreSize=538084641, blockingMemStoreSize=536870912
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> ...
> {code}
> Fails the task over and over, with server-killing monkeys running.
> 24G, which should be more than enough.
> Had just finished a big compaction.
> What's shutting us out? Why is the flush taking so long? We're stuck at the 
> limit, so the job fails.


