[
https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861925#comment-13861925
]
Jean-Marc Spaggiari commented on HBASE-8912:
--------------------------------------------
After the first restart, 36 regions were stuck in transition :( But no
server crashed.
What I did:
- Restored the default balancer to make sure as many regions as possible would move.
- Stopped and started HBase.
- Ran the balancer from the shell.
Everything was back up after a second restart.
I get many errors like this one:
{code}
2014-01-03 16:03:03,958 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received FAILED_OPEN for region b75cb9067c3c4456d6198c9237c143b3 from server
node4.domain.com,60020,1388782921790 but region was in the state
page,rf.idua.www\x1Fhttp\x1F-1\x1F/fr/brand/fr/audi_fleet_solutions/contact/contact_transport_personnes.html\x1Fnull,1379103792232.b75cb9067c3c4456d6198c9237c143b3.
state=CLOSED, ts=1388782983373, server=node4.domain.com,60020,1388782921790
and not in OFFLINE, PENDING_OPEN or OPENING
{code}
After investigating, I found that snappy was missing on one server. I fixed
that and restarted: all seems to be fine. So I restored my customized balancer,
restarted, and balanced.
There are still some warnings in the logs:
{code}
2014-01-03 16:21:52,864 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region db8e67acde26bf340da481d3c1b934cd from server
node4.domain.com,60020,1388784051197 but region was in the state
page,moc.tenretnigruoboc.www\x1Fhttp\x1F-1\x1F/cobourg-and-the-web\x1Fnull,1379103844627.db8e67acde26bf340da481d3c1b934cd.
state=OPEN, ts=1388784100392, server=node4.distparser.com,60020,1388784051197
and not in expected OFFLINE, PENDING_OPEN or OPENING states
{code}
But this time all the regions are assigned correctly.
I did that one more time (change balancer, stop, start, balance; change
balancer, stop, start, balance). I set the log level to WARN.
{code}
2014-01-03 16:28:51,142 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region 17bee313797fc1ce982c0e31fdb6620c from server
node8.domain.com,60020,1388784498327 but region was in the state
page,rf.ofniecnarf.www\x1Fhttp\x1F-1\x1F/vote/comment/27996/1/vote/zero_vote/c99b0992e5a9cd6bf3a4cfc91769ceeb\x1Fnull,1379104524006.17bee313797fc1ce982c0e31fdb6620c.
state=OPEN, ts=1388784531048, server=node8.distparser.com,60020,1388784498327
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:52,135 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region 6dc6290df1855b319f60bf89faa3da41 from server
node8.domain.com,60020,1388784498327 but region was in the state
page_crc,\x00\x00\x00\x00\xD7\xD9\x97\x8Bvideo.k-wreview.ca,1378042601904.6dc6290df1855b319f60bf89faa3da41.
state=OPEN, ts=1388784531793, server=node8.distparser.com,60020,1388784498327
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:52,712 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region ec4f96b6cedd935aeba279b15d5337af from server
node8.domain.com,60020,1388784498327 but region was in the state
work_proposed,\x98\xBF\xAF\x90\x00\x00\x00\x00http://feedproxy.google.com/~r/WheatWeeds/~3/Of24fZKcpco/the-eighth-day-of-christmas.html,1378975430143.ec4f96b6cedd935aeba279b15d5337af.
state=OPEN, ts=1388784532540, server=node8.distparser.com,60020,1388784498327
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:52,747 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region 4f823b5de664556a89cbd86aa41cd0b0 from server
node8.distparser.com,60020,1388784498327 but region was in the state
work_proposed,\x8D4K\xEA\x00\x00\x00\x00http://twitter.com/home?status=CartoonStock%3A++http%3A%2F%2Fwww%2Ecartoonstock%2Ecom%2Fdirectory%2Fc%2Fcream%5Ftea%5Fgifts%2Easp,1378681682935.4f823b5de664556a89cbd86aa41cd0b0.
state=OPEN, ts=1388784532552, server=node8.distparser.com,60020,1388784498327
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:53,244 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region da0bd0a6b7187f731fb34d4ac14ca279 from server
node8.domain.com,60020,1388784498327 but region was in the state
work_proposed,\xB2\xE6\xB6\xBB\x00\x00\x00\x00http://www.canpages.ca/page/QC/notre-dame-des-prairies/concept-beton-design/4550984.html,1378737981443.da0bd0a6b7187f731fb34d4ac14ca279.
state=OPEN, ts=1388784533203, server=node8.distparser.com,60020,1388784498327
and not in expected OFFLINE, PENDING_OPEN or OPENING states
{code}
But everything finally got assigned without any restart required, and pretty
quickly.
Logs from the last run:
{code}
2014-01-03 16:32:20,252 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow): {"processingtimems":10969,"call":"balance(), rpc version=1,
client version=29,
methodsFingerPrint=1886733559","client":"192.168.23.7:54614","starttimems":1388784729247,"queuetimems":0,"class":"HMaster","responsesize":0,"method":"balance"}
2014-01-03 16:32:21,278 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region 043d45cada6185d86e743754957e579a from server
node1.distparser.com,60020,1388784692832 but region was in the state
page,moc.yubreffotseb.www\x1Fhttp\x1F-1\x1F/camera-bags-cases-straps-camera-bags-cases-c-282_888_580.html\x1Fzenid=ji3nr2ps8rnbaa7joc0lv4qln2,1388782516646.043d45cada6185d86e743754957e579a.
state=OPEN, ts=1388784735731, server=node1.distparser.com,60020,1388784692832
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:32:21,713 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region e097f829b6eafb70b30c254fd4af662c from server
node1.distparser.com,60020,1388784692832 but region was in the state
page,ac.usneeuq.ssenisub\x1Fhttp\x1F-1\x1F/grad_studies/PHD/about_us/queens_leaders_forum/about_us/about_us/grad_studies/PHD/student_career_services/queens_leaders_forum/recruiting/news/recruiting/about_us/accreditations.php\x1Fnull,1383168138496.e097f829b6eafb70b30c254fd4af662c.
state=OPEN, ts=1388784736528, server=node1.distparser.com,60020,1388784692832
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:32:26,862 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region c94705f4a7c23a0a05a01bfe9d7755bc from server
node1.distparser.com,60020,1388784692832 but region was in the state
entry,christian_labelle,1377000858428.c94705f4a7c23a0a05a01bfe9d7755bc.
state=OPEN, ts=1388784740724, server=node1.distparser.com,60020,1388784692832
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:32:34,516 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region a8d60db86bd03cfbfba0bae0bb3cb564 from server
node1.distparser.com,60020,1388784692832 but region was in the state
work_proposed,W{8\xB2\x00\x00\x00\x00http://www.prairiesouth.ca/williamgrayson/calendar-mainmenu-26/day.listevents/2013/10/29/23.html,1383415634227.a8d60db86bd03cfbfba0bae0bb3cb564.
state=OPEN, ts=1388784754264, server=node1.distparser.com,60020,1388784692832
and not in expected OFFLINE, PENDING_OPEN or OPENING states
{code}
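To see at a glance which transitions these warnings complain about, the master log can be tallied with a short script. This is only a sketch: the function name `tally_warnings` and the regular expression are illustrative assumptions based on the message format quoted above, not anything shipped with HBase.

```python
import re
from collections import Counter

# Matches the AssignmentManager warning quoted above. Whitespace is matched
# loosely (\s+) because this mail wraps each log entry over several lines.
# The pattern is an assumption derived from the quoted messages only.
WARNING = re.compile(
    r"Received\s+(?P<event>[A-Z_]+)\s+for\s+region\s+(?P<region>[0-9a-f]{32})\s+"
    r"from\s+server\s+(?P<server>\S+)\s+but\s+region\s+was\s+in\s+the\s+state\s+"
    r".*?\sstate=(?P<state>[A-Z_]+),",
    re.DOTALL,
)


def tally_warnings(log_text):
    """Count warnings by (event received, state the master had recorded)."""
    return Counter((m["event"], m["state"]) for m in WARNING.finditer(log_text))
```

Run on the excerpts in this comment, it would group the first block under FAILED_OPEN received while CLOSED, and all the later ones under OPENED received while already OPEN.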
So overall, it's WAY more stable! I have not been able to get anything stuck or
crashed with the 2 patches applied. I will keep them ;) Big +1 from me. Thanks
for fixing that. I think it might be easy to fix the last few remaining
warnings...
> [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to
> OFFLINE
> ----------------------------------------------------------------------------------
>
> Key: HBASE-8912
> URL: https://issues.apache.org/jira/browse/HBASE-8912
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Lars Hofhansl
> Priority: Critical
> Fix For: 0.94.16
>
> Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt,
> HBASE-8912.patch, HBase-0.94 #1036 test - testRetrying [Jenkins].html,
> log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt
>
>
> AM throws this exception which subsequently causes the master to abort:
> {code}
> java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
>     at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
>     at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:662)
> {code}
> This exception trace is from the failing test TestMetaReaderEditor which is
> failing pretty frequently, but looking at the test code, I think this is not
> a test-only issue, but affects the main code path.
> https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/