[ https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525360#comment-16525360 ]
stack commented on HBASE-20792:
-------------------------------

Should the subject have info:server rather than info:servername? (Man, that is confusing.... info:server AND info:sn --- info:sn came in w/ zkless assignment, HBASE-11059...).

[~elserj] You pulling back HBASE-20708? It seemed like a nice improvement, but I was wary of pulling in such an extensive change without long-running tests. You thinking different?

On last host, it's been a while since I looked at it. IIRC, it was used for retaining assignment across restarts. I also remember its usage/definition being nebulous.

Patch looks good to me to commit, but not for branch-2.0.

> info:servername and info:sn inconsistent for OPEN region
> --------------------------------------------------------
>
>                 Key: HBASE-20792
>                 URL: https://issues.apache.org/jira/browse/HBASE-20792
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 3.0.0, 2.1.0, 2.0.2, 2.2.0
>
>         Attachments: HBASE-20792.patch, TestRegionMoveAndAbandon.java, hbase-hbase-master-ctr-e138-1518143905142-380753-01-000004.hwx.site.log
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708.
> After a rolling restart of a cluster, we'll see situations where a collection
> of regions will simply not be assigned out to the RS. I was able to reproduce
> this by mimicking the restart patterns our tests do internally (ignore whether
> this is the best way to restart nodes for now :)). The general pattern is
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
> test column=table:state, timestamp=1529871718998, value=\x08\x00
> test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 0297f680df6dc0166a44f9536346268e, NAME => 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY => '', ENDKEY => ''}
> test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:seqnumDuringOpen, timestamp=1529967103390, value=\x00\x00\x00\x00\x00\x00\x00*
> test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:server, timestamp=1529967103390, value=ctr-e138-1518143905142-378097-02-000012.hwx.site:16020
> test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
> test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:sn, timestamp=1529967096482, value=ctr-e138-1518143905142-378097-02-000006.hwx.site,16020,1529966755170
> test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better.
> However, the interesting bit is that {{info:server}} and {{info:sn}} are
> inconsistent (which, according to the javadoc, should not be possible for an
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd
> attempt, so I'm hopeful it's not a bear to repro.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
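For anyone wanting to spot the inconsistency described in the quoted issue mechanically, here is a minimal sketch in Python. It is not HBase code: the dict-based row and the helper names `server_name_from_location` and `is_consistent` are hypothetical, and the only assumption about formats is what the meta dump above shows, namely that info:server is `host:port` and info:sn is `host,port,startcode`.

```python
def server_name_from_location(server: str, startcode: int) -> str:
    """Combine info:server ('host:port') and info:serverstartcode
    into the 'host,port,startcode' form that info:sn uses in the dump."""
    host, port = server.rsplit(":", 1)
    return f"{host},{port},{startcode}"


def is_consistent(row: dict) -> bool:
    """Return False when an OPEN region's info:sn disagrees with
    info:server + info:serverstartcode. Other states are not checked."""
    if row.get("info:state") != "OPEN":
        return True
    expected = server_name_from_location(
        row["info:server"], int(row["info:serverstartcode"])
    )
    return expected == row["info:sn"]


# Values copied from the meta dump in the issue description.
row = {
    "info:server": "ctr-e138-1518143905142-378097-02-000012.hwx.site:16020",
    "info:serverstartcode": "1529966776248",
    "info:sn": "ctr-e138-1518143905142-378097-02-000006.hwx.site,16020,1529966755170",
    "info:state": "OPEN",
}

print(is_consistent(row))  # False: host 000012 vs 000006, and the start codes differ
```

How the rows get out of hbase:meta (shell scan, client API, and so on) is left open here; the sketch only covers the comparison itself.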