[
https://issues.apache.org/jira/browse/PHOENIX-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15521594#comment-15521594
]
James Taylor commented on PHOENIX-3326:
---------------------------------------
We can't put the cell we're using for the lock on the SYSTEM.CATALOG for the
reasons already mentioned here:
https://issues.apache.org/jira/browse/PHOENIX-3326?focusedCommentId=15519226&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15519226
I definitely wouldn't want to introduce a new dependency. I think we can leave
the RC as it is and fix in 4.9. It's not causing any harm.
How about if we did the mutex using a new coprocessor method on our
MetaDataEndpoint coprocessor which is installed on SYSTEM.CATALOG? We could
likely even do that without involving zk. Maybe a row lock on the row in the
SYSTEM.CATALOG representing the SYSTEM.CATALOG? We could make this change in
the 4.x branches.
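
To make the row lock idea concrete, here is a minimal in-memory sketch. It is not Phoenix or HBase code: RowLockMutexSketch and all of its methods are hypothetical names, and a plain Set stands in for the region server's row locks. It also ignores lock lifetime and region server failure. The only point it illustrates is that lock state kept in server memory is never part of the table data, so a snapshot cannot capture it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a server-side endpoint; not the MetaDataEndpoint API.
public class RowLockMutexSketch {
    // Table data: this is all a snapshot would capture.
    private final Map<String, String> catalogRows = new ConcurrentHashMap<>();
    // Row locks held in memory only; never written to the table.
    private final Set<String> heldRowLocks = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller until the lock is released.
    public boolean tryAcquireUpgradeLock(String rowKey) {
        return heldRowLocks.add(rowKey);
    }

    public void releaseUpgradeLock(String rowKey) {
        heldRowLocks.remove(rowKey);
    }

    public void putCatalogRow(String rowKey, String value) {
        catalogRows.put(rowKey, value);
    }

    // Copies table data only; lock state is deliberately not included.
    public Map<String, String> snapshotCatalog() {
        return new HashMap<>(catalogRows);
    }

    public static void main(String[] args) {
        RowLockMutexSketch server = new RowLockMutexSketch();
        String row = "SYSTEM.CATALOG";
        server.putCatalogRow(row, "table metadata");
        System.out.println("first client acquires: " + server.tryAcquireUpgradeLock(row));
        System.out.println("second client acquires: " + server.tryAcquireUpgradeLock(row));
        System.out.println("snapshot rows: " + server.snapshotCatalog().keySet());
        server.releaseUpgradeLock(row);
        System.out.println("acquire after release: " + server.tryAcquireUpgradeLock(row));
    }
}
```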
> Restoring SYSTEM.CATALOG from snapshot causes clients to run into
> UpgradeInProgressException
> --------------------------------------------------------------------------------------------
>
> Key: PHOENIX-3326
> URL: https://issues.apache.org/jira/browse/PHOENIX-3326
> Project: Phoenix
> Issue Type: Bug
> Reporter: Samarth Jain
> Assignee: Samarth Jain
> Attachments: PHOENIX-3326_4.8-HBase-0.98.patch,
> PHOENIX-3326_4.8-HBase-0.98_v2.patch, PHOENIX-3326_wip.patch
>
>
> We create a snapshot of the SYSTEM.CATALOG table only after the client is
> able to successfully acquire a distributed mutex of sorts. This means the
> snapshot also ends up containing the row that serves as the mutex. Now when
> restoring the table from snapshot, this row is still present, which causes
> clients to throw UpgradeInProgressException.
> I can think of a couple of ways to fix this:
> 1) Do the checkAndPut for the UPGRADE_MUTEX after creating the snapshot. I am
> not too sure, though, how HBase handles concurrent snapshot requests. Do
> clients get an exception? Also we potentially could end up creating more
> snapshots than we really need to.
> 2) Do the checkAndPut for the UPGRADE_MUTEX in a different table (possibly
> SYSTEM.SEQUENCE). This way the restored snapshot won't have the row. We would
> need to delete the row from SYSTEM.SEQUENCE after the upgrade is done
> (successfully or unsuccessfully).
> [~jamestaylor] - WDYT?
> FYI, [~lhofhansl] - this is probably a blocker for 4.8.1
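
The failure described in the issue, and the effect of option 2, can be sketched with a small in-memory model. This is only an illustration under stated assumptions: SnapshotMutexSketch, its Table class and the helper method are hypothetical names, a HashMap stands in for an HBase table, and putIfAbsent stands in for checkAndPut. No real HBase or Phoenix API is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; not HBase or Phoenix code.
public class SnapshotMutexSketch {
    static final String MUTEX_ROW = "UPGRADE_MUTEX";

    // Stand-in for a table that supports checkAndPut, snapshot and restore.
    static final class Table {
        private Map<String, String> rows = new HashMap<>();

        // Succeeds only if the row does not exist yet.
        synchronized boolean checkAndPut(String row, String value) {
            return rows.putIfAbsent(row, value) == null;
        }

        synchronized void delete(String row) {
            rows.remove(row);
        }

        synchronized Map<String, String> snapshot() {
            return new HashMap<>(rows);
        }

        synchronized void restore(Map<String, String> snapshot) {
            rows = new HashMap<>(snapshot);
        }
    }

    // Returns true if a new client is blocked after the catalog is restored.
    static boolean upgradeBlockedAfterRestore(boolean mutexInCatalog) {
        Table catalog = new Table();
        // Option 2 from the issue: keep the mutex row in a different table.
        Table mutexTable = mutexInCatalog ? catalog : new Table();

        mutexTable.checkAndPut(MUTEX_ROW, "held");        // acquire the mutex
        Map<String, String> snap = catalog.snapshot();    // snapshot taken while it is held
        mutexTable.delete(MUTEX_ROW);                     // upgrade finished, release
        catalog.restore(snap);                            // later: restore from the snapshot

        // A new client tries to acquire the mutex again.
        return !mutexTable.checkAndPut(MUTEX_ROW, "held");
    }

    public static void main(String[] args) {
        System.out.println("mutex in catalog, blocked after restore: "
                + upgradeBlockedAfterRestore(true));
        System.out.println("mutex in separate table, blocked after restore: "
                + upgradeBlockedAfterRestore(false));
    }
}
```

In this model the first case is blocked because the restored snapshot brings the mutex row back, while the second is not because the snapshot never contained it.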