[jira] [Commented] (PHOENIX-5055) Split mutations batches probably affects correctness of index data

Geoffrey Jacoby (JIRA) Mon, 03 Dec 2018 10:31:37 -0800


    [ 
https://issues.apache.org/jira/browse/PHOENIX-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707630#comment-16707630
 ]


Geoffrey Jacoby commented on PHOENIX-5055:
------------------------------------------

Good find, [~Jaanai]. Some thoughts on this patch:

1. There's still a chance for two mutations to the same row to wind up in 
different batches. Consider the case where there are, say, 3 mutations to Row 
A, and mutations 1 and 2 combine to make up a batch size. I believe your patch 
will cut a batch after mutation 2, and mutation 3 will have a different 
timestamp. 

2. Related to this, this patch makes the function O(N^2), since you're having 
to iterate through allMutations with each iteration of the outer loop (albeit a 
decreasing section of it.) It would be possible to calculate all the 
sameRowMutationSizeBytes values in one pass in a separate loop, and then refer 
back to it during the existing allMutations loop. Doing this might also make 
fixing #1 easier. 

3. Nitpick: a more descriptive variable name than "s" would be good. 

4. We now have several outstanding issues involving index updates and 
timestamps. Might be good for us to give some thought to solving this 
holistically. [~vincentpoon] and [~vishk], fyi. (See also PHOENIX-5018). 

> Split mutations batches probably affects correctness of index data
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-5055
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5055
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.1
>            Reporter: Jaanai
>            Assignee: Jaanai
>            Priority: Critical
>             Fix For: 5.1.0
>
>         Attachments: ConcurrentTest.java, PHOENIX-5055-v4.x-HBase-1.4.patch
>
>
> In order to get more performance, we split the list of mutations into 
> multiple batches in MutationSate.  For one upsert SQL with some null values 
> that will produce two type KeyValues(Put and DeleteColumn),  These KeyValues 
> should have the same timestamp so that keep on an atomic operation for 
> corresponding the row key.
>  Found incorrect indexed data for the index tables by sqlline.
> !https://gw.alicdn.com/tfscom/TB1nSDqpxTpK1RjSZFGXXcHqFXa.png|width=665,height=400!
>  
> Running the following:
> {code:java}
> conn.createStatement().executeUpdate( "CREATE TABLE " + tableName + " (" + "A 
> VARCHAR NOT NULL PRIMARY KEY," + "B VARCHAR," + "C VARCHAR," + "D VARCHAR) 
> COLUMN_ENCODED_BYTES = 0"); 
> conn.createStatement().executeUpdate("CREATE INDEX " + indexName + " on " + 
> tableName + " (C) INCLUDE(D)"); 
> conn.createStatement().executeUpdate("UPSERT INTO " + tableName + "(A,B,C,D) 
> VALUES ('A2','B2','C2','D2')"); 
> conn.createStatement().executeUpdate("UPSERT INTO " + tableName + "(A,B,C,D) 
> VALUES ('A3','B3', 'C3', null)");
> {code}
> dump IndexMemStore:
> {code:java}
> hbase.index.covered.data.IndexMemStore(117): 
> Inserting:\x01A3/0:D/1542190446218/DeleteColumn/vlen=0/seqid=0/value= 
> phoenix.hbase.index.covered.data.IndexMemStore(133): Current kv state: 
> phoenix.hbase.index.covered.data.IndexMemStore(135): KV: 
> \x01A3/0:B/1542190446167/Put/vlen=2/seqid=5/value=B3 
> phoenix.hbase.index.covered.data.IndexMemStore(135): KV: 
> \x01A3/0:C/1542190446167/Put/vlen=2/seqid=5/value=C3 
> phoenix.hbase.index.covered.data.IndexMemStore(135): KV: 
> \x01A3/0:D/1542190446218/DeleteColumn/vlen=0/seqid=0/value= 
> phoenix.hbase.index.covered.data.IndexMemStore(135): KV: 
> \x01A3/0:_0/1542190446167/Put/vlen=1/seqid=5/value=x 
> phoenix.hbase.index.covered.data.IndexMemStore(137): ========== END MemStore 
> Dump ==================
> {code}
>  
> The DeleteColumn's timestamp larger than other mutations.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (PHOENIX-5055) Split mutations batches probably affects correctness of index data

Reply via email to