[jira] [Resolved] (PHOENIX-5813) Index read repair should not interfere with concurrent updates

Kadir OZDEMIR (Jira) Tue, 05 Jan 2021 14:34:51 -0800


     [ 
https://issues.apache.org/jira/browse/PHOENIX-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kadir OZDEMIR resolved PHOENIX-5813.
------------------------------------
    Resolution: Not A Problem

> Index read repair should not interfere with concurrent updates 
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5813
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5813
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Priority: Major
>
> Let \{1, a, x, y} be a row in the data table. Let the first column be the 
> only pk column and the second column be the only indexed column of the table, 
> and finally let the fourth column be the only column covered by the index for 
> this table. The corresponding row in the index table would be \{a, 1, y}. 
> Now, let the same data table row be mutated and the new state of the row be 
> \{1, b, x, y}. The index row \{a, 1, y} is not valid any more in the index 
> table and needs to be deleted. Thus, the prepared index mutations will 
> include the delete row mutation for the row key \{a, 1} and a put mutation, 
> that is, put \{b, 1, y} for the new row.  
> Let \{1, c, x, y} be another mutation on the same row that arrives before the 
> previous mutation updates the data table. This means that the prepared index 
> mutations will include the delete row mutation for the row key \{a, 1} and a 
> put mutation, that is, put \{c, 1, y}. However, the last update should have 
> deleted index row \{b, 1} instead of \{a, 1}. To prevent this, 
> IndexRegionObserver maintains a collection of data table row keys for each 
> pending data table row update in order to detect concurrent updates, and 
> skips the third write phase for them. In the first update phase, index rows 
> are made unverified and in the third update phase, they are verified or 
> deleted. The read-repair operation on these unverified rows will lead to 
> proper resolution of these concurrent updates. 
> Therefore, two or more pending updates from different batches on the same 
> data row are concurrent if and only if for all of these updates the data 
> table row state is read from HBase under a Phoenix level row lock and for 
> none of them the row lock has been acquired the second time for updating the 
> data table. In other words, all of them are in the first update phase 
> concurrently. For concurrent updates, the first two update phases are done 
> but the last update phase is skipped. This means the data table row will be 
> updated by these updates but the corresponding index table rows will be left 
> with the unverified status. Then, the read repair process will repair these 
> unverified index rows during scans.
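
The stale delete and the concurrency check described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Phoenix code: the names index_row, prepare_index_mutations and PendingRows are hypothetical, and the real IndexRegionObserver works with HBase mutations and row locks that are not modeled here.

```python
from collections import Counter


def index_row(data_row):
    """Map a data row (pk, indexed, other, covered) to its index row."""
    pk, indexed, _other, covered = data_row
    return (indexed, pk, covered)


def prepare_index_mutations(current_row, new_row):
    """Build index mutations from the data row state read before the update."""
    pk, old_indexed = current_row[0], current_row[1]
    return [("delete", (old_indexed, pk)), ("put", index_row(new_row))]


class PendingRows:
    """Hypothetical bookkeeping of data row keys with in-flight updates."""

    def __init__(self):
        self.counts = Counter()
        self.concurrent = set()

    def begin(self, pk):
        # A second update that starts while another is still pending
        # marks the row as having concurrent updates.
        if self.counts[pk] > 0:
            self.concurrent.add(pk)
        self.counts[pk] += 1

    def skip_third_phase(self, pk):
        return pk in self.concurrent


stored = (1, "a", "x", "y")  # data table row before either update lands
pending = PendingRows()

pending.begin(1)
first = prepare_index_mutations(stored, (1, "b", "x", "y"))

# The second update arrives before the first one has updated the data table,
# so it reads the same stored row and prepares a delete for {a, 1}.
pending.begin(1)
second = prepare_index_mutations(stored, (1, "c", "x", "y"))

print(first)
print(second)
print(pending.skip_third_phase(1))
```

Both updates prepare a delete for \{a, 1}, so neither removes \{b, 1}. In this sketch the row is flagged as concurrent, which stands in for skipping the third phase and leaving the index rows unverified.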
> For the example given above, \{1, b, x, y} and \{1, c, x, y} are concurrent 
> updates (on the same data table row). As explained above, the index rows 
> generated for these updates should be left unverified. Now assume that a scan 
> on the index table detects that index row \{b, 1, y} is unverified while 
> the concurrent updates are in progress, and the index row is repaired from 
> the data table. It is possible that the read repair gets the row \{1, b, x, 
> y} from the data table. Then it will rebuild the corresponding index row 
> which is the row \{b, 1, y} and will make the row verified. This rebuild may 
> happen just after the row \{b, 1, y} is made unverified by the concurrent 
> updates. This means that the repair will overwrite the result of the 
> concurrent updates. 
> This scan will return \{b, 1, y} to the client. The same scan may then 
> detect that \{c, 1, y} is also unverified. By the time this row is repaired, 
> the data table row could be \{1, c, x, y}. This means the corresponding index 
> row \{c, 1, y} will be made verified by the read repair and also returned to 
> the client for the same scan. However, only one of these index rows should 
> have been returned to the client.
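
The interleaving described above can be reproduced with a small simulation. It is a simplified sketch, assuming that read repair verifies an index row when the row rebuilt from the data table matches it and deletes it otherwise. The function names and the dictionary-based tables are hypothetical and ignore timestamps, locks and HBase details.

```python
UNVERIFIED, VERIFIED = "unverified", "verified"


def to_index_row(data_row):
    """Map a data row (pk, indexed, other, covered) to its index row."""
    pk, indexed, _other, covered = data_row
    return (indexed, pk, covered)


def scan_with_read_repair(index_table, read_data_row, scan_order):
    """Scan index rows in order, repairing unverified ones from the data table."""
    results = []
    for key in scan_order:
        if index_table.get(key) == UNVERIFIED:
            rebuilt = to_index_row(read_data_row())
            if rebuilt == key:
                index_table[key] = VERIFIED  # repair confirms the row
            else:
                del index_table[key]  # stale index row, drop it
                continue
        results.append(key)
    return results


# The data table row changes between the two repairs because the
# concurrent updates are still in progress during the scan.
data_states = iter([(1, "b", "x", "y"), (1, "c", "x", "y")])

index_table = {("b", 1, "y"): UNVERIFIED, ("c", 1, "y"): UNVERIFIED}
returned = scan_with_read_repair(
    index_table,
    lambda: next(data_states),
    [("b", 1, "y"), ("c", 1, "y")],
)
print(returned)
```

In this model a single scan returns both \{b, 1, y} and \{c, 1, y}, although they belong to the same data table row, which is the anomaly the description points out.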



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
