[ https://issues.apache.org/jira/browse/HBASE-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192411#comment-15192411 ]
Anoop Sam John commented on HBASE-15436:
----------------------------------------
{code}
"pool-14-thread-1" prio=10 tid=0x00007f4215268000 nid=0x46e6 waiting on condition [0x00007f41fe75d000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000eeb5a010> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
    at org.apache.hadoop.hbase.util.BoundedCompletionService.take(BoundedCompletionService.java:75)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:190)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
    at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1109)
    at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369)
    at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
{code}
When I said the flush keeps going through each Mutation, you replied that it
does not look that way, because the thread doing the flush appears to be doing
nothing. The point is that the flush thread works in a loop, and each iteration
of that loop triggers a scan of the meta table. As the trace shows, the scan is
handed to another thread in a pool, and the original flush thread waits for
that scan thread to complete. You can see this clearly in the trace above.
So this thread waits for the result, and that result is an exception
(SocketTimeout) which it only sees after several minutes. The flush thread then
wakes up, continues the loop, and goes straight back into the same wait.
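
To make the pattern concrete, here is a minimal, self-contained sketch of what I
mean. It is not the HBase client code: the class name, {{slowFailingLookup}}
and {{flushLoop}} are hypothetical stand-ins, and the timeout is shortened to
milliseconds. It only shows how a caller that blocks on {{take()}} once per
item looks idle in a thread dump while the total wait adds up.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FlushWaitSketch {

    // Hypothetical stand-in for the meta scan: blocks for the timeout,
    // then fails the way an unreachable server would.
    static String slowFailingLookup(long timeoutMs) throws Exception {
        Thread.sleep(timeoutMs);
        throw new SocketTimeoutException("simulated timeout after " + timeoutMs + " ms");
    }

    // Hypothetical stand-in for the flush loop: one lookup per mutation,
    // each handed to a pool thread while the caller parks in take().
    static int flushLoop(int mutations, long timeoutMs) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletionService<String> completions = new ExecutorCompletionService<>(pool);
        int failures = 0;
        try {
            for (int i = 0; i < mutations; i++) {
                completions.submit(() -> slowFailingLookup(timeoutMs));
                // The caller is WAITING (parking) here until the pool thread finishes.
                Future<String> done = completions.take();
                try {
                    done.get();
                } catch (ExecutionException e) {
                    // The timeout surfaces only now; the loop moves on to the next mutation.
                    failures++;
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        int failures = flushLoop(3, 200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("failures=" + failures + ", elapsedMs=" + elapsedMs);
    }
}
```

With a timeout of minutes instead of milliseconds and many buffered mutations,
the total time spent in this loop grows with the number of mutations, so the
caller looks stuck even though it is making slow progress.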
> BufferedMutatorImpl.flush() appears to get stuck
> ------------------------------------------------
>
> Key: HBASE-15436
> URL: https://issues.apache.org/jira/browse/HBASE-15436
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.0.2
> Reporter: Sangjin Lee
> Attachments: hbaseException.log, threaddump.log
>
>
> We noticed an instance where the thread that was executing a flush
> ({{BufferedMutatorImpl.flush()}}) got stuck when the (local one-node) cluster
> shut down and was unable to get out of that stuck state.
> The setup is a single node HBase cluster, and apparently the cluster went
> away when the client was executing flush. The flush eventually logged a
> failure after 30+ minutes of retrying. That is understandable.
> What is unexpected is that thread is stuck in this state (i.e. in the
> {{flush()}} call). I would have expected the {{flush()}} call to return after
> the complete failure.