[
https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567738#comment-13567738
]
Jonathan Hsieh commented on HBASE-7711:
---------------------------------------
I'm +1 for this patch modulo the nit below, but not convinced this solves all
the problems.
The other exception trace adds row name to exn. Please add this detail to
exception message?
{code}
try {
if (!existingLatch.await(this.rowLockWaitDuration,
TimeUnit.MILLISECONDS)) {
throw new IOException("Timed out on getting lock for row="
+ Bytes.toStringBinary(row));
}
} catch (InterruptedException ie) {
// Empty
}
{code}
> rowlock release problem with thread interruptions in batchMutate
> ----------------------------------------------------------------
>
> Key: HBASE-7711
> URL: https://issues.apache.org/jira/browse/HBASE-7711
> Project: HBase
> Issue Type: Bug
> Reporter: Jonathan Hsieh
> Assignee: Ted Yu
> Fix For: 0.96.0
>
> Attachments: 7711.txt, 7711-v2.txt
>
>
> An earlier version of snapshots would thread interrupt operations. In longer
> term testing we ran into an exception stack trace that indicated that a
> rowlock was taken an never released.
> {code}
> 2013-01-26 01:54:56,417 ERROR
> org.apache.hadoop.hbase.procedure.ProcedureMember: Propagating foreign
> exception to subprocedure pe-1
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via
> timer-java.util.Timer@1cea3151:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
> org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed!
> Source:Timeout caused Foreign E
> xception Start:1359194035004, End:1359194095004, diff:60000, max:60000 ms
> at
> org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184)
> at
> org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:321)
> at
> org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.watchForAbortedProcedures(ZKProcedureMemberRpcs.java:150)
> at
> org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$200(ZKProcedureMemberRpcs.java:56)
> at
> org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:112)
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> Caused by:
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
> org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed!
> Source:Timeout caused Foreign Exception Start:1359194035004,
> End:1359194095004, diff:60000, max:60000 ms
> at
> org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:71)
> at java.util.TimerThread.mainLoop(Timer.java:512)
> at java.util.TimerThread.run(Timer.java:462)
> 2013-01-26 01:54:56,648 WARN org.apache.hadoop.hbase.regionserver.HRegion:
> Failed getting lock in batch put, row=0001558252
> java.io.IOException: Timed out on getting lock for row=0001558252
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalObtainRowLock(HRegion.java:3239)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:3315)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2150)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2021)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3511)
> at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
> ....
> .. every snapshot attempt that used this region for the next two days
> encountered this problem.
> {code}
> Snapshots will now bypass this problem with the fix in HBASE-7703. However,
> we should make sure hbase regionserver operations are safe when interrupted.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira