[ 
https://issues.apache.org/jira/browse/HBASE-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217714#comment-15217714
 ] 

Nicolas Liochon commented on HBASE-15436:
-----------------------------------------

bq. There should be a cap size above which we should block the writes. We 
should not take more than this limit. Maybe something like 1.5 times the 
flush size.
We definitely want to take more than this limit, but maybe not as much as 
we're taking today (or maybe we want to be clearer on what these settings 
mean).
There is a limit, given by the number of tasks executed in parallel 
(hbase.client.max.total.tasks). If I understand correctly, this setting is now 
per client (and not per htable).
Ideally these parameters should be hidden from the user (i.e. the defaults are 
OK for a standard client without too many memory constraints). 
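To illustrate what these settings are, here is a minimal sketch of how a 1.x 
client can tune the buffer and the parallelism today; the table name and the 
concrete values (4 MB buffer, 100 tasks) are placeholders, not recommendations:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Cap on the number of tasks executed in parallel, shared by the whole client.
    conf.setInt("hbase.client.max.total.tasks", 100);

    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      // Per-mutator write buffer size (placeholder value: 4 MB).
      BufferedMutatorParams params =
          new BufferedMutatorParams(TableName.valueOf("test_table"))
              .writeBufferSize(4L * 1024 * 1024);
      try (BufferedMutator mutator = connection.getBufferedMutator(params)) {
        Put put = new Put(Bytes.toBytes("row-1"));
        put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        mutator.mutate(put);
        mutator.flush();  // blocks until the buffered mutations are sent (or fail)
      }
    }
  }
}
{code}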

bq. How long should we wait? Should we come out faster? 
IIRC, a long time ago the buffer was attached to the Table object, so the 
policy (or at least the objective :-)) when one of the puts had failed (i.e. 
reached the max retry number) was simple: all the operations currently in the 
buffer were considered failed as well, even if we had not even tried to send 
them. As a consequence the buffer was empty after the failure of a single put. 
It was then up to the client to continue or not. Maybe we should do the same 
with the BufferedMutator, for all cases, close or not? I haven't looked at the 
BufferedMutator code, but I can have a look if you wish [~anoop.hbase]. 
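For reference, the hook where such a policy surfaces for the caller today is 
the ExceptionListener on the mutator. A minimal sketch (not the actual 
BufferedMutatorImpl behaviour, just how a client can see the failed rows and 
decide whether to continue or not):

{code:java}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;

public class FailFastListenerSketch {
  // Listener invoked once retries are exhausted for some buffered mutations.
  static final BufferedMutator.ExceptionListener LISTENER =
      new BufferedMutator.ExceptionListener() {
        @Override
        public void onException(RetriesExhaustedWithDetailsException e,
            BufferedMutator mutator) throws RetriesExhaustedWithDetailsException {
          for (int i = 0; i < e.getNumExceptions(); i++) {
            // e.getRow(i) is a mutation that failed after the max retry number.
            System.err.println("Failed mutation on row: " + e.getRow(i));
          }
          // Rethrow so that mutate()/flush() propagates the failure to the
          // caller, who then decides whether to continue or close the mutator.
          throw e;
        }
      };

  static BufferedMutatorParams paramsFor(String table) {
    // The listener is attached through BufferedMutatorParams.
    return new BufferedMutatorParams(TableName.valueOf(table)).listener(LISTENER);
  }
}
{code}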

bq. What if we were doing a multi Get to the META table to know the region 
location for N mutations at a time?
It seems like a good idea. There are many possible optimisations in how we use 
meta, and this is one of them.
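To make the idea concrete: today a client can only ask for the location of one 
row at a time through RegionLocator, so locating N mutations means N lookups 
(modulo the location cache). A sketch of the current per-row pattern that a 
batched meta lookup would replace:

{code:java}
import java.util.List;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class LocationLookupSketch {
  // One meta lookup (or cache hit) per row; a multi Get to META could
  // resolve all of them in a single round trip instead.
  static void locate(Connection connection, TableName table, List<byte[]> rows)
      throws Exception {
    try (RegionLocator locator = connection.getRegionLocator(table)) {
      for (byte[] row : rows) {
        HRegionLocation location = locator.getRegionLocation(row);
        System.out.println(Bytes.toStringBinary(row) + " -> "
            + location.getServerName());
      }
    }
  }
}
{code}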




> BufferedMutatorImpl.flush() appears to get stuck
> ------------------------------------------------
>
>                 Key: HBASE-15436
>                 URL: https://issues.apache.org/jira/browse/HBASE-15436
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.0.2
>            Reporter: Sangjin Lee
>         Attachments: hbaseException.log, threaddump.log
>
>
> We noticed an instance where the thread that was executing a flush 
> ({{BufferedMutatorImpl.flush()}}) got stuck when the (local one-node) cluster 
> shut down and was unable to get out of that stuck state.
> The setup is a single node HBase cluster, and apparently the cluster went 
> away when the client was executing flush. The flush eventually logged a 
> failure after 30+ minutes of retrying. That is understandable.
> What is unexpected is that the thread is stuck in this state (i.e. in the 
> {{flush()}} call). I would have expected the {{flush()}} call to return after 
> the complete failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
