[
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185095#comment-13185095
]
Zhihong Yu commented on HBASE-5155:
-----------------------------------
In AssignmentManager.java, setEnabledTable():
{code}
+ LOG.error("Unable to ensure that the table will be"
+ + " enabled because of a ZooKeeper issue");
{code}
Please include tableName in the log.
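Something along these lines would do. This is only a sketch: the class and the helper enableFailureMessage() are made up for illustration, and the real code would pass the string straight to LOG.error() in setEnabledTable().

```java
public class EnableTableLogMessage {
    // Hypothetical helper: builds the error text with the table name included,
    // so the log shows which table could not be marked enabled.
    public static String enableFailureMessage(String tableName) {
        return "Unable to ensure that the table " + tableName
            + " will be enabled because of a ZooKeeper issue";
    }

    public static void main(String[] args) {
        // In AssignmentManager this string would be handed to LOG.error(...).
        System.out.println(enableFailureMessage("usertable"));
    }
}
```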
In bulkAssignUserRegions():
{code}
+ List<HRegionInfo> regionsList = java.util.Arrays.asList(regions);
+ for (HRegionInfo regionInfo : regionsList) {
{code}
Can we iterate directly over the regions array?
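To illustrate the point, the enhanced for loop works on arrays as well as on Iterables, so the Arrays.asList() wrapper adds nothing. A minimal sketch, assuming plain String region names as a stand-in for HRegionInfo; the class and method names are hypothetical:

```java
public class IterateRegionsArray {
    // Loops over the array itself; no intermediate List is created.
    public static int countNonNull(String[] regions) {
        int count = 0;
        for (String region : regions) {
            if (region != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] regions = {"region-a", "region-b", "region-c"};
        System.out.println(countNonNull(regions));
    }
}
```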
In ZKTable.java:
{code}
- if (!isEnabledOrDisablingTable(tableName)) {
+ if (isEnabledOrDisablingTable(tableName)) {
LOG.warn("Moving table " + tableName + " state to disabling but was " +
"not first in enabled state: " + this.cache.get(tableName));
{code}
Why was the above change necessary? The warning no longer matches the check.
I see a long line:
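To show what I mean by the check and the warning agreeing, here is a rough sketch. It is not the ZKTable code: the class, the enum and warningForDisabling() are invented for illustration, and it assumes a table missing from the cache counts as neither enabled nor disabling. The warning text says the table was not in the enabled state, so it should be emitted under the negated check:

```java
import java.util.HashMap;
import java.util.Map;

public class DisablingStateCheck {
    public enum TableState { ENABLED, DISABLING, DISABLED, ENABLING }

    // Hypothetical in-memory stand-in for the table state cache.
    private final Map<String, TableState> cache = new HashMap<>();

    public void setState(String tableName, TableState state) {
        cache.put(tableName, state);
    }

    // Assumption for this sketch: an absent table is neither enabled nor disabling.
    public boolean isEnabledOrDisablingTable(String tableName) {
        TableState state = cache.get(tableName);
        return state == TableState.ENABLED || state == TableState.DISABLING;
    }

    // Returns the warning text when the table was not enabled or disabling,
    // or null when the transition to disabling is the expected one.
    public String warningForDisabling(String tableName) {
        if (!isEnabledOrDisablingTable(tableName)) {
            return "Moving table " + tableName + " state to disabling but was "
                + "not first in enabled state: " + cache.get(tableName);
        }
        return null;
    }
}
```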
{code}
TEST_UTIL.createTable(TABLENAME, FAMILYNAME);
+ assertTrue(m.assignmentManager.getZKTable().isEnabledTable(Bytes.toString(TABLENAME)));
{code}
Overall, this patch looks very good.
Thanks for plugging a hole w.r.t. the cache in ZKTable.
> ServerShutDownHandler And Disable/Delete should not happen parallely leading
> to recreation of regions that were deleted
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-5155
> URL: https://issues.apache.org/jira/browse/HBASE-5155
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.4
> Reporter: ramkrishna.s.vasudevan
> Priority: Blocker
> Attachments: HBASE-5155_latest.patch
>
>
> ServerShutDownHandler and the disable/delete table handlers race. This issue
> is not caused by TM.
> -> A regionserver goes down. In our cluster the regionserver holds a lot of
> regions.
> -> A region R1 has two daughters D1 and D2.
> -> The ServerShutdownHandler gets called, scans META and gets all the
> user regions.
> -> In parallel, a table is disabled. (No problem in this step.)
> -> Delete table is done.
> -> The table and its regions are deleted, including R1, D1 and D2. (So META
> is cleaned.)
> -> Now ServerShutdownHandler starts to process the dead regions:
> {code}
> if (hri.isOffline() && hri.isSplit()) {
> LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
> "; checking daughter presence");
> fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixupDaughters, since the daughters D1 and D2 are missing for R1
> {code}
> if (isDaughterMissing(catalogTracker, daughter)) {
> LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
> MetaEditor.addDaughter(catalogTracker, daughter, null);
> // TODO: Log WARN if the regiondir does not exist in the fs. If its not
> // there then something wonky about the split -- things will keep going
> // but could be missing references to parent region.
> // And assign it.
> assignmentManager.assign(daughter, true);
> {code}
> we call assign on the daughters.
> After this we continue with the code below.
> {code}
> if (processDeadRegion(e.getKey(), e.getValue(),
> this.services.getAssignmentManager(),
> this.server.getCatalogTracker())) {
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> When the SSH scanned META, it had R1, D1 and D2.
> So as part of the above code, D1 and D2, which were already assigned by
> fixupDaughters, are assigned again by
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a ZooKeeper issue due to a bad version and kills the master.
> The important part here is that the regions that were deleted are recreated,
> which I think is more critical.