[
https://issues.apache.org/jira/browse/PHOENIX-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199093#comment-17199093
]
Lars Hofhansl commented on PHOENIX-6141:
----------------------------------------
Agreed that we do not want to rely on a transaction engine to implement basic
Phoenix operations.
Note that the 2PC for indexes is based on read-repair, exploiting the fact that
you can always use the main table as authoritative source of truth. The key is
having enough information so that *any* client can complete the transaction.
Also note that here the LINK information is not contained in the SYSCAT, so
it's a bit more tricky than that.
I suppose we can write a tentative row to the SYSCAT first, then write the
CHILD_LINK, then update the SYSCAT row as not tentative. Now when there's any
problem caused by a CHILD_LINK row, any client seeing the CHILD_LINK row can
complete the transaction by marking the SYSCAT row as not tentative.
All clients have to ignore tentative SYSCAT entries.
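A minimal sketch of that three-step write, using an in-memory stand-in for the two tables rather than real Phoenix or HBase calls. The names `Metadata`, `create_view`, `visible_views`, `repair` and the `fail_after` switch are all hypothetical and exist only to illustrate the ordering:

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical in-memory model of the two system tables."""
    # view name -> {"parent": str, "tentative": bool}
    syscat: dict = field(default_factory=dict)
    # set of (parent, view) linking rows
    child_link: set = field(default_factory=set)


def create_view(meta, parent, view, fail_after=None):
    """Write the view in three steps; fail_after simulates a crash after step N."""
    # Step 1: tentative SYSCAT row, invisible to readers.
    meta.syscat[view] = {"parent": parent, "tentative": True}
    if fail_after == 1:
        return
    # Step 2: parent->child linking row.
    meta.child_link.add((parent, view))
    if fail_after == 2:
        return
    # Step 3: flip the SYSCAT row to not tentative.
    meta.syscat[view]["tentative"] = False


def visible_views(meta):
    """All clients ignore tentative SYSCAT entries."""
    return sorted(v for v, row in meta.syscat.items() if not row["tentative"])


def repair(meta, parent, view):
    """Any client that sees the CHILD_LINK row can finish the write."""
    row = meta.syscat.get(view)
    if (parent, view) in meta.child_link and row is not None and row["tentative"]:
        row["tentative"] = False
        return True
    return False
```

In this model a crash after step 2 leaves a link plus a tentative row, and the first client to call `repair` completes the view. A crash after step 1 leaves only a tentative row, which readers skip and `repair` leaves untouched.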
Another option... Since we'd have to implement read-repair anyway, and since
these operations are rare (are they?), we can do the read-repair directly: If
there's a problem and the CHILD_LINK row is old enough and no matching SYSCAT
entry exists, just delete the CHILD_LINK and proceed as if it did not exist.
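A rough sketch of that direct read-repair, again over plain dictionaries rather than real Phoenix or HBase calls. The function name `live_children`, the `MIN_ORPHAN_AGE_SECONDS` threshold and the per-link write timestamp are assumptions made for illustration. How to treat a link that is still too young to judge is also an assumption: here it is conservatively counted as a child, since a CREATE VIEW may be in flight.

```python
import time

# Hypothetical threshold: a link with no SYSCAT entry is an orphan once this old.
MIN_ORPHAN_AGE_SECONDS = 60.0


def live_children(syscat, child_link, parent, now=None,
                  min_age=MIN_ORPHAN_AGE_SECONDS):
    """Return the child views of `parent`, deleting stale orphan links on the way.

    syscat:     dict of view name -> metadata (stand-in for SYSTEM.CATALOG)
    child_link: dict of (parent, view) -> write time in seconds
                (stand-in for SYSTEM.CHILD_LINK)
    """
    now = time.time() if now is None else now
    children = []
    # Iterate over a copy because orphan links are deleted during the scan.
    for (p, view), written_at in list(child_link.items()):
        if p != parent:
            continue
        if view in syscat:
            # Matching SYSCAT entry exists: a real child view.
            children.append(view)
        elif now - written_at >= min_age:
            # Old link with no SYSCAT entry: delete it and carry on.
            del child_link[(p, view)]
        else:
            # Too young to judge (assumption): treat it as a child for now.
            children.append(view)
    return sorted(children)
```

With this shape, an ALTER TABLE or DROP TABLE on the base table would consult `live_children` instead of the raw linking rows, so an orphan only blocks the operation until it passes the age threshold.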
> Ensure consistency between SYSTEM.CATALOG and SYSTEM.CHILD_LINK
> ---------------------------------------------------------------
>
> Key: PHOENIX-6141
> URL: https://issues.apache.org/jira/browse/PHOENIX-6141
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.0.0, 4.15.0
> Reporter: Chinmay Kulkarni
> Priority: Blocker
> Fix For: 4.17.0
>
>
> Before 4.15, "CREATE/DROP VIEW" was an atomic operation since we were issuing
> batch mutations on just the 1 SYSTEM.CATALOG region. In 4.15 we introduced
> SYSTEM.CHILD_LINK to store the parent->child links and so a CREATE VIEW is no
> longer atomic since it consists of 2 separate RPCs (1 to SYSTEM.CHILD_LINK
> to add the linking row and another to SYSTEM.CATALOG to write metadata for
> the new view).
> If the second RPC i.e. the RPC to write metadata to SYSTEM.CATALOG fails
> after the 1st RPC has already gone through, there will be an inconsistency
> between both metadata tables. We will see orphan parent->child linking rows
> in SYSTEM.CHILD_LINK in this case. This can cause the following issues:
> # ALTER TABLE calls on the base table will fail
> # DROP TABLE without CASCADE will fail
> # The upgrade path has calls like UpgradeUtil.upgradeTable() which will fail
> # Any metadata consistency checks can be thrown off
> # Unnecessary extra storage of orphan links
> The first 3 issues happen because we wrongly deduce that a base table has
> child views due to the orphan linking rows.
> This Jira aims at trying to come up with a way to make mutations among
> SYSTEM.CATALOG and SYSTEM.CHILD_LINK an atomic transaction. We can use a
> 2-phase commit approach like in global indexing or also potentially explore
> using a transaction manager.