[
https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860625#comment-13860625
]
Jimmy Xiang commented on HBASE-8912:
------------------------------------
For the fix-races patch, I understand the change to
HRegionServer#removeFromRegionsInTransition. For the OpenRegionHandler change,
we already call this.rsServices.removeFromRegionsInTransition(this.regionInfo)
in the finally block, so I was wondering how the change helps. It should help
if the master tries to assign the region to the same host again, which is very
common in unit tests. However, if we remove the region from the
regions-in-transition list before we change the znode, another openRegion call
that reaches this server in that window could see the wrong znode state. This
is unlikely to happen (the master assigns the same region using the same znode
version). However, the finally block could then remove the wrong entry from
the transition list (the new transition is removed instead of the old one).
Will this cause any issue?
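
To make the concern about the finally block concrete, here is a minimal sketch of the difference between an unconditional removal and a removal that only succeeds for the attempt that owns the entry. It is not the HBase code: TransitionTracker, begin, removeUnconditionally and removeIfOwner are hypothetical names, and the per-attempt token is an assumption made purely for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a region server's regions-in-transition
// bookkeeping. None of these names come from the HBase source.
class TransitionTracker {
    // Maps a region name to a token identifying the open attempt that owns the entry.
    private final ConcurrentMap<String, Object> inTransition = new ConcurrentHashMap<>();

    // Registers a new open attempt. Returns its token, or null if the
    // region is already in transition.
    Object begin(String region) {
        Object token = new Object();
        return inTransition.putIfAbsent(region, token) == null ? token : null;
    }

    // Unconditional removal: drops whatever entry is present, including
    // one registered by a newer attempt.
    void removeUnconditionally(String region) {
        inTransition.remove(region);
    }

    // Conditional removal: drops the entry only if it still belongs to
    // the attempt holding this token. Returns true if something was removed.
    boolean removeIfOwner(String region, Object token) {
        return inTransition.remove(region, token);
    }

    boolean isInTransition(String region) {
        return inTransition.containsKey(region);
    }
}
```

In this sketch, if the first attempt removes its entry early, a second attempt registers, and the first attempt's finally block then calls removeUnconditionally, the second attempt's entry is lost. With removeIfOwner the late call is a no-op. Whether the real handler can carry such a token is a separate question.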
> [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to
> OFFLINE
> ----------------------------------------------------------------------------------
>
> Key: HBASE-8912
> URL: https://issues.apache.org/jira/browse/HBASE-8912
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Priority: Critical
> Fix For: 0.94.16
>
> Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt,
> HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt,
> org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt
>
>
> AM throws this exception which subsequently causes the master to abort:
> {code}
> java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
>     at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
>     at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:662)
> {code}
> This exception trace is from the failing test TestMetaReaderEditor which is
> failing pretty frequently, but looking at the test code, I think this is not
> a test-only issue, but affects the main code path.
> https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/