[ 
https://issues.apache.org/jira/browse/PHOENIX-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236488#comment-17236488
 ] 

Chinmay Kulkarni edited comment on PHOENIX-6086 at 11/20/20, 10:42 PM:
-----------------------------------------------------------------------

I created this Jira to extend safety during upgrades to all SYSTEM tables, so 
that we now take a snapshot of each one before upgrading it. I just realized 
that we also extended the restore-snapshot logic to all SYSTEM tables in case 
of an exception during EXECUTE UPGRADE (see 
[this|https://github.com/apache/phoenix/blob/0d0e86e7ba63d20e92d4e8c03259344a958dbcd1/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3940-L3944]).
Thinking about this a bit more, automatically restoring from the SYSTEM table 
snapshots has several downsides:

# Any DDLs issued since the upgrade began would be lost when we restore the 
snapshot of SYSTEM.CATALOG.
# If we encounter an issue when upgrading any one SYSTEM table, we would end up 
restoring the snapshots of all of them. (I don't think there is necessarily a 
better way to handle this, since restoring only the snapshot of the table 
whose upgrade failed would leave us in a mixed metadata state.)
# The point above also means that SYSTEM.SEQUENCE would be restored, and that 
would break(?) sequences issued during this time.
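
To make the first and third points concrete, here is a toy, in-memory sketch of the scenario. It is not Phoenix or HBase code: the class {{SnapshotRestoreLoss}}, the {{Metadata}} holder and the idea that sequence state is a single counter are all assumptions made for illustration, and real sequence handling may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SnapshotRestoreLoss {
    /** Hypothetical toy state: table names in the catalog plus the next sequence value. */
    public static final class Metadata {
        public final List<String> catalogTables;
        public final long nextSequenceValue;

        public Metadata(List<String> catalogTables, long nextSequenceValue) {
            this.catalogTables = new ArrayList<>(catalogTables); // defensive copy
            this.nextSequenceValue = nextSequenceValue;
        }
    }

    /** Runs the scenario and returns the state left behind by the restore. */
    public static Metadata simulate() {
        List<String> catalog = new ArrayList<>(Arrays.asList("T1"));
        long nextSeq = 100;

        // 1. Snapshot taken when the upgrade begins.
        Metadata snapshot = new Metadata(catalog, nextSeq);

        // 2. Activity while the upgrade is still running.
        catalog.add("T2"); // a client issues a DDL that creates T2
        nextSeq++;         // a client is handed sequence value 100

        // 3. The upgrade fails and everything is rolled back to the snapshot.
        return new Metadata(snapshot.catalogTables, snapshot.nextSequenceValue);
    }

    public static void main(String[] args) {
        Metadata restored = simulate();
        System.out.println("catalog after restore: " + restored.catalogTables);
        System.out.println("next sequence value after restore: " + restored.nextSequenceValue);
    }
}
```

In this toy model the restored catalog no longer knows about T2, and the counter is back at a value that was already handed out, which is one way the sequence concern could show up.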

I'd like your thoughts on how to handle this [~gjacoby] [~kadir] 
[~jisaac] [~vjasani] [~yanxinyi] [~sukumaddineni]. Since we currently don't log 
DDLs issued during the upgrade path, and because of the problem with sequences, 
I think it is safer for now to just keep the snapshots around and let the 
operator decide how to handle the upgrade failure, rather than blindly forcing 
a restore from snapshots. What do you guys think?
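
The two options being weighed can be sketched as a policy switch around the upgrade step. This is a minimal, self-contained sketch under stated assumptions: {{UpgradeFailurePolicy}}, {{OnFailure}} and {{UpgradeStep}} are hypothetical names, not existing Phoenix classes or configuration options, and snapshot and restore are modelled as log entries rather than real HBase calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UpgradeFailurePolicy {
    /** Hypothetical policy switch; not an existing Phoenix option. */
    public enum OnFailure { RESTORE_ALL, KEEP_SNAPSHOTS }

    /** Hypothetical upgrade step that may throw. */
    public interface UpgradeStep {
        void run() throws Exception;
    }

    /** Returns a log of the actions taken, so the flow is easy to inspect. */
    public static List<String> upgrade(List<String> systemTables, UpgradeStep step,
                                       OnFailure policy) {
        List<String> log = new ArrayList<>();
        // Snapshot every SYSTEM table before touching any of them.
        for (String table : systemTables) {
            log.add("snapshot " + table);
        }
        try {
            step.run();
            log.add("upgrade succeeded");
        } catch (Exception e) {
            if (policy == OnFailure.RESTORE_ALL) {
                // Restore every table, not only the failed one, to avoid mixed metadata.
                for (String table : systemTables) {
                    log.add("restore " + table);
                }
            } else {
                // Leave the snapshots in place and surface the failure to the operator.
                log.add("upgrade failed, snapshots kept: " + e.getMessage());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("SYSTEM.CATALOG", "SYSTEM.SEQUENCE");
        UpgradeStep failing = () -> { throw new Exception("simulated failure"); };
        upgrade(tables, failing, OnFailure.KEEP_SNAPSHOTS).forEach(System.out::println);
    }
}
```

With {{KEEP_SNAPSHOTS}} nothing is rolled back automatically, so DDLs and sequence values issued during the upgrade survive, at the cost of leaving the operator to decide whether and when to restore.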


was (Author: ckulkarni):
[~vjasani] I created this Jira to extend safety during upgrades for all SYSTEM 
tables, so that we now would take a snapshot of each one. I just realized that 
we also then extended the restore-snapshot logic to all SYSTEM tables in case 
of an exception during EXECUTE UPGRADE (see 
[this|https://github.com/apache/phoenix/blob/0d0e86e7ba63d20e92d4e8c03259344a958dbcd1/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3940-L3944]).
 Thinking about this a little bit, there can be various downsides to 
automatically restoring from the SYSTEM table snapshot, such as:

# Any DDLs issued since the upgrade began would be lost when we restore the 
snapshot of SYSTEM.CATALOG
# If we encounter an issue when upgrading any SYSTEM table, we would end up 
restoring the snapshots of all of them (I don't think there is necessarily a 
better way to handle this since we can't just restore the snapshot for the 
table whose upgrade failed or we'd be in a weird mixed metadata state)
# The point above also means that SYSTEM.SEQUENCE would be restored and that 
would break(?) sequences issued during this time.

I wanted to get your thoughts on how to handle this [~gjacoby] [~kadir] 
[~jisaac]. Since we currently don't log DDLs issued during the upgrade path and 
because of the problem with sequences I think for now, maybe it is safer to 
just keep the snapshots around and allow the operator to decide how to handle 
the upgrade failure rather than blindly forcing a restore from snapshots. What 
do you guys think?

> Take a snapshot of all SYSTEM tables before attempting to upgrade them
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-6086
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6086
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.15.0
>            Reporter: Chinmay Kulkarni
>            Assignee: Viraj Jasani
>            Priority: Critical
>             Fix For: 5.1.0, 4.16.0
>
>         Attachments: PHOENIX-6086.4.x.000.patch, 
> PHOENIX-6086.master.000.patch, PHOENIX-6086.master.002.patch, 
> PHOENIX-6086.master.003.patch
>
>
> Currently we only take a snapshot of SYSTEM.CATALOG before attempting to 
> upgrade it (see 
> [this|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3718]).
>  From 4.15 onwards we also store critical metadata information in other 
> SYSTEM tables like SYSTEM.CHILD_LINK, so it is beneficial to also snapshot 
> those tables before upgrading them henceforth.
> We also currently don't take a snapshot of SYSTEM.CATALOG on receiving an 
> [UpgradeRequiredException|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3685-L3707]
>  which we should do.
> In case of any errors during the upgrade, we restore SYSTEM.CATALOG from this 
> snapshot and we should extend this to all tables. In cases where the table 
> didn't exist before the upgrade, we need to ensure it is dropped so that a 
> subsequent upgrade attempt can start afresh.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
