[ 
https://issues.apache.org/jira/browse/PHOENIX-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199093#comment-17199093
 ] 

Lars Hofhansl commented on PHOENIX-6141:
----------------------------------------

Agreed that we do not want to rely on a transaction engine to implement basic 
Phoenix operations.

Note that the 2PC for indexes is based on read-repair, exploiting the fact that 
you can always use the main table as the authoritative source of truth. The key 
is having enough information so that *any* client can complete the transaction.

Also note that here the LINK information is not contained in the SYSCAT, so 
it's a bit more tricky than that.

I suppose we can write a tentative row to the SYSCAT first, then write the 
CHILD_LINK, then update the SYSCAT row as not tentative. Now when there's any 
problem caused by a CHILD_LINK row, any client seeing the CHILD_LINK row can 
complete the transaction by marking the SYSCAT row as not tentative.
All clients have to ignore tentative SYSCAT entries.
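
To make the ordering concrete, here is a minimal in-memory sketch of the 
tentative-row idea above. It is not Phoenix code: the class, the method names 
and the two maps standing in for SYSCAT and CHILD_LINK are all hypothetical, 
and a real implementation would need atomic check-and-put semantics on the 
SYSCAT row.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the three-step write; not Phoenix code.
class TentativeLinkSketch {
    enum State { TENTATIVE, COMMITTED }

    // view name -> row state, standing in for the view's SYSCAT row
    final Map<String, State> syscat = new HashMap<>();
    // view name -> parent name, standing in for the CHILD_LINK row
    final Map<String, String> childLink = new HashMap<>();

    // Step 1: write the SYSCAT row, flagged as tentative.
    void writeTentative(String view) {
        syscat.put(view, State.TENTATIVE);
    }

    // Step 2: write the parent->child linking row.
    void writeChildLink(String parent, String view) {
        childLink.put(view, parent);
    }

    // Step 3: clear the tentative flag (no-op if the SYSCAT row is missing).
    void commit(String view) {
        syscat.replace(view, State.COMMITTED);
    }

    // Readers must ignore tentative SYSCAT entries.
    boolean isVisible(String view) {
        return syscat.get(view) == State.COMMITTED;
    }

    // Read-repair: a client that sees a CHILD_LINK row whose SYSCAT row is
    // still tentative finishes step 3 on behalf of the original writer.
    boolean repair(String view) {
        if (childLink.containsKey(view) && syscat.get(view) == State.TENTATIVE) {
            commit(view);
            return true;
        }
        return false;
    }
}
```

In this sketch a writer that dies between step 2 and step 3 leaves a linked 
but tentative view, and the next client that calls repair() makes it visible. 
A writer that dies after step 1 leaves only a tentative SYSCAT row, which 
readers skip.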

Another option... Since we'd have to implement read-repair anyway, and since 
these operations are rare (are they?), we can do the read-repair directly: If 
there's a problem and the CHILD_LINK row is old enough and no matching SYSCAT 
entry exists, just delete the CHILD_LINK and proceed as if it never existed.
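
A rough sketch of that direct read-repair check, again as a hypothetical 
in-memory model rather than Phoenix code. The age threshold (minAgeMs), the 
class and the method names are assumptions made for illustration; the comment 
above does not say how old "old enough" is.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of orphan-link cleanup; not Phoenix code.
class OrphanLinkRepairSketch {
    // Views that have a SYSCAT entry.
    final Set<String> syscatViews = new HashSet<>();
    // view name -> write timestamp (ms) of its CHILD_LINK row
    final Map<String, Long> childLinkTs = new HashMap<>();
    // Assumed threshold: links younger than this may belong to an
    // in-flight CREATE VIEW and are left alone.
    final long minAgeMs;

    OrphanLinkRepairSketch(long minAgeMs) {
        this.minAgeMs = minAgeMs;
    }

    // Returns true if the link should be treated as a real child view.
    // Deletes the link when it is old enough and has no SYSCAT entry.
    boolean hasLiveChild(String view, long nowMs) {
        Long ts = childLinkTs.get(view);
        if (ts == null) {
            return false;               // no linking row at all
        }
        if (syscatViews.contains(view)) {
            return true;                // consistent: link and metadata exist
        }
        if (nowMs - ts >= minAgeMs) {
            childLinkTs.remove(view);   // orphan: repair by deleting the link
            return false;
        }
        return true;                    // too young to judge, keep the link
    }
}
```

The threshold is what keeps the repair from racing with a CREATE VIEW whose 
second RPC has not landed yet. How to pick it safely is left open here.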


> Ensure consistency between SYSTEM.CATALOG and SYSTEM.CHILD_LINK
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-6141
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6141
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.15.0
>            Reporter: Chinmay Kulkarni
>            Priority: Blocker
>             Fix For: 4.17.0
>
>
> Before 4.15, "CREATE/DROP VIEW" was an atomic operation since we were issuing 
> batch mutations on just the 1 SYSTEM.CATALOG region. In 4.15 we introduced 
> SYSTEM.CHILD_LINK to store the parent->child links and so a CREATE VIEW is no 
> longer atomic since it consists of 2 separate RPCs (1 to SYSTEM.CHILD_LINK 
> to add the linking row and another to SYSTEM.CATALOG to write metadata for 
> the new view). 
> If the second RPC i.e. the RPC to write metadata to SYSTEM.CATALOG fails 
> after the 1st RPC has already gone through, there will be an inconsistency 
> between both metadata tables. We will see orphan parent->child linking rows 
> in SYSTEM.CHILD_LINK in this case. This can cause the following issues:
> # ALTER TABLE calls on the base table will fail
> # DROP TABLE without CASCADE will fail
> # The upgrade path has calls like UpgradeUtil.upgradeTable() which will fail
> # Any metadata consistency checks can be thrown off
> # Unnecessary extra storage of orphan links
> The first 3 issues happen because we wrongly deduce that a base table has 
> child views due to the orphan linking rows.
> This Jira aims at trying to come up with a way to make mutations among 
> SYSTEM.CATALOG and SYSTEM.CHILD_LINK an atomic transaction. We can use a 
> 2-phase commit approach like in global indexing or also potentially explore 
> using a transaction manager. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
