[ 
https://issues.apache.org/jira/browse/HBASE-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192411#comment-15192411
 ] 

Anoop Sam John commented on HBASE-15436:
----------------------------------------

{code}
"pool-14-thread-1" prio=10 tid=0x00007f4215268000 nid=0x46e6 waiting on condition [0x00007f41fe75d000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000eeb5a010> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
        at org.apache.hadoop.hbase.util.BoundedCompletionService.take(BoundedCompletionService.java:75)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:190)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1109)
        at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369)
        at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
{code}

When I say the flush is continuing with each Mutation, you say it does not 
look that way, since the thread doing the flush op seems to be doing nothing. 
But the thread doing the flush op works in a loop, and each iteration in turn 
triggers a Meta table scan. You can see that the scan op is handed to another 
thread in a pool, and the original flush thread waits for that scan thread to 
complete. This is clearly visible in the trace above.
So this thread waits for the result, and that result is an Exception 
(SocketTimeout), which it sees only after some minutes. Then the flush thread 
comes back to life, continues the loop, and again goes into this wait state.
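
To make the pattern concrete, here is a minimal sketch of it in plain Java. This is not the HBase client code: the class {{FlushWaitSketch}}, the methods {{locateRegion}} and {{flush}}, and the short timeout are all hypothetical stand-ins, and a standard {{ExecutorCompletionService}} is used in place of {{BoundedCompletionService}}. It only shows how a flush loop can look idle while it is parked in {{take()}}, waiting for a pool thread whose lookup ends in a timeout.

```java
import java.net.SocketTimeoutException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FlushWaitSketch {

    // Hypothetical stand-in for the meta scan run on a pool thread:
    // it blocks for the timeout period and then fails.
    static String locateRegion(long timeoutMs) throws Exception {
        Thread.sleep(timeoutMs);
        throw new SocketTimeoutException("meta scan timed out");
    }

    // Hypothetical stand-in for the flush loop. Returns how many lookups failed.
    static int flush(List<String> mutations, final long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletionService<String> scans = new ExecutorCompletionService<>(pool);
        int failed = 0;
        try {
            for (String mutation : mutations) {
                // The lookup for this mutation is handed to the pool thread.
                scans.submit(() -> locateRegion(timeoutMs));
                // The flush thread parks here until the pool thread finishes.
                Future<String> result = scans.take();
                try {
                    result.get();
                } catch (ExecutionException e) {
                    // The timeout surfaces only now; the loop then moves on
                    // to the next mutation and waits all over again.
                    failed++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return failed;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int failed = flush(Arrays.asList("put-1", "put-2", "put-3"), 200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(failed + " lookups failed after about " + elapsedMs + " ms");
    }
}
```

With a timeout of minutes instead of 200 ms, the total wait grows with the number of buffered mutations, and a thread dump taken at any point during it would show the flush thread parked in {{take()}}.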

> BufferedMutatorImpl.flush() appears to get stuck
> ------------------------------------------------
>
>                 Key: HBASE-15436
>                 URL: https://issues.apache.org/jira/browse/HBASE-15436
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.0.2
>            Reporter: Sangjin Lee
>         Attachments: hbaseException.log, threaddump.log
>
>
> We noticed an instance where the thread that was executing a flush 
> ({{BufferedMutatorImpl.flush()}}) got stuck when the (local one-node) cluster 
> shut down and was unable to get out of that stuck state.
> The setup is a single node HBase cluster, and apparently the cluster went 
> away when the client was executing flush. The flush eventually logged a 
> failure after 30+ minutes of retrying. That is understandable.
> What is unexpected is that thread is stuck in this state (i.e. in the 
> {{flush()}} call). I would have expected the {{flush()}} call to return after 
> the complete failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
