[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed

Xing Shi (JIRA) Mon, 25 Jun 2012 07:34:44 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400505#comment-13400505
 ]


Xing Shi commented on HBASE-6195:
---------------------------------

@Ted and @ram:
This problem occurs whenever two KeyValues have the same row, family, 
qualifier, and timestamp but different memstoreTS values.

There are lots of optimisations around memstoreTS for storage:

1. The flush sets memstoreTS to 0, not just for Increment but also for Put. 
The code is in Store.internalFlushCache():
{code}
  if (kv.getMemstoreTS() <= smallestReadPoint) {
    // let us not change the original KV. It could be in the memstore
    // changing its memstoreTS could affect other threads/scanners.
    kv = kv.shallowCopy();
    kv.setMemstoreTS(0);
  }
{code}
If several versions of the same row with the same timestamp are flushed to 
StoreFiles, the Get chooses the latest version with:
{code}
  // Negate this comparison so later edits show up first
  return -Longs.compare(left.getMemstoreTS(), right.getMemstoreTS());
{code}

Because the timestamps (within the same millisecond) and the memstoreTSs 
(all 0) are identical in the StoreFiles, we cannot tell which one is the newest.
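
To make the tie concrete, here is a minimal sketch, not the HBase code itself. 
The Cell class, the ORDER comparator and the flushed() helper are hypothetical 
stand-ins for KeyValue, the KeyValue comparator and the flush path, and 
Long.compare from the JDK replaces Guava's Longs.compare:

```java
import java.util.Comparator;

final class MemstoreTsTieDemo {
    /** Hypothetical stand-in for KeyValue, reduced to the fields that matter here. */
    static final class Cell {
        final String row;
        final long timestamp;
        final long memstoreTS;
        final long value;

        Cell(String row, long timestamp, long memstoreTS, long value) {
            this.row = row;
            this.timestamp = timestamp;
            this.memstoreTS = memstoreTS;
            this.value = value;
        }
    }

    /** Row ascending, then newer timestamp first, then newer memstoreTS first. */
    static final Comparator<Cell> ORDER = (left, right) -> {
        int c = left.row.compareTo(right.row);
        if (c != 0) return c;
        c = -Long.compare(left.timestamp, right.timestamp);
        if (c != 0) return c;
        // Same tie-breaker as the quoted comparator: later edits show up first.
        return -Long.compare(left.memstoreTS, right.memstoreTS);
    };

    /** Mimics the flush: copy the cell and reset memstoreTS to 0. */
    static Cell flushed(Cell kv) {
        return new Cell(kv.row, kv.timestamp, 0L, kv.value);
    }

    public static void main(String[] args) {
        // Two edits of the same row in the same millisecond.
        Cell older = new Cell("row1", 1340634884000L, 5L, 1L);
        Cell newer = new Cell("row1", 1340634884000L, 6L, 2L);

        // In the memstore the newer edit sorts first (negative result).
        System.out.println(ORDER.compare(newer, older) < 0);
        // After the flush both memstoreTS values are 0 and the comparator ties.
        System.out.println(ORDER.compare(flushed(newer), flushed(older)) == 0);
    }
}
```

Once the comparator returns 0, which record is returned depends on something 
other than the edit order.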

2. Besides this, StoreFileScanner has an optimisation from 
HBASE-4346 (code through HBASE-2856):
{code}
    if (cur.getMemstoreTS() <= readPoint) {
      cur.setMemstoreTS(0);
    }
{code}

So, even if we make memstoreTS increase progressively for Increment (where 
memstoreTS is currently always 0) and Put, if we flush two records (identical 
except for memstoreTS, with sf1.row.memstoreTS < sf2.row.memstoreTS) into two 
StoreFiles, the memstoreTSs will still be set to 0, and we may get the old 
record sf1.row.
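
A small sketch of that read-time rule, under the assumption that it is the 
only thing touching memstoreTS on the read path. The exposedMemstoreTS() 
helper and the numbers are made up for illustration and are not part of 
StoreFileScanner:

```java
final class ReadPointResetDemo {
    /** memstoreTS a scanner would expose under the quoted rule (hypothetical helper). */
    static long exposedMemstoreTS(long memstoreTS, long readPoint) {
        // Anything at or below the read point is treated as fully committed.
        return memstoreTS <= readPoint ? 0L : memstoreTS;
    }

    public static void main(String[] args) {
        long sf1 = 5L;        // older record, stored in the first StoreFile
        long sf2 = 6L;        // newer record, stored in the second StoreFile
        long readPoint = 10L; // both edits are already visible to this reader

        // Both values collapse to 0, so the order sf1 < sf2 is lost.
        System.out.println(exposedMemstoreTS(sf1, readPoint));
        System.out.println(exposedMemstoreTS(sf2, readPoint));
    }
}
```

So persisting a non-zero memstoreTS in the StoreFiles would not help by 
itself, as long as the scanner resets it for every visible record.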


3. Why can't I get all the records with different memstoreTS?
In the Scanner, the ExplicitColumnTracker is used for tracking, and there 
is this code in ExplicitColumnTracker.checkColumn():
{code}
  //If column matches, check if it is a duplicate timestamp
  if (sameAsPreviousTS(timestamp)) {
    //If duplicate, skip this Key
    return ScanQueryMatcher.MatchCode.SKIP;
  }
{code}

So the Get returns just one result, even though the records differ in memstoreTS.
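
The effect of that check can be sketched as follows. This is a simplified 
model, not the real ExplicitColumnTracker: the Tracker class, its fields and 
the MatchCode enum here are assumptions that only track the previous 
timestamp of a single column:

```java
final class DuplicateTimestampSkipDemo {
    enum MatchCode { INCLUDE, SKIP }

    /** Simplified tracker for one column; remembers only the previous timestamp. */
    static final class Tracker {
        private boolean seenAny = false;
        private long previousTimestamp;

        MatchCode checkColumn(long timestamp) {
            // A second cell with the same timestamp is treated as a duplicate,
            // no matter what its memstoreTS is.
            if (seenAny && timestamp == previousTimestamp) {
                return MatchCode.SKIP;
            }
            seenAny = true;
            previousTimestamp = timestamp;
            return MatchCode.INCLUDE;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        long ts = 1340634884000L;
        // Two cells for the same column and timestamp, differing only in memstoreTS.
        System.out.println(tracker.checkColumn(ts)); // first cell is included
        System.out.println(tracker.checkColumn(ts)); // second cell is skipped
    }
}
```

Because memstoreTS never enters the check, whichever record the scanner sees 
first wins and the other is dropped.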

4. How to resolve this?
Several optimisations built on memstoreTS make the solution complex. 
I have not found a solution for this problem yet and am still thinking about 
one; maybe some of the optimisations need to be removed.
                
> Increment data will be lost when the memstore is flushed
> --------------------------------------------------------
>
>                 Key: HBASE-6195
>                 URL: https://issues.apache.org/jira/browse/HBASE-6195
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Xing Shi
>            Assignee: ShiXing
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 6195-trunk-V7.patch, 6195.addendum, 
> HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, 
> HBASE-6195-trunk-V4.patch, HBASE-6195-trunk-V5.patch, 
> HBASE-6195-trunk-V6.patch, HBASE-6195-trunk.patch
>
>
> There are two problems in increment() now:
> First:
> The timestamp (the variable now) in HRegion's Increment() is 
> generated before the rowLock is acquired, so when multiple threads increment 
> the same row, a thread that generated its timestamp earlier may get the lock 
> later. Because increment stores just one version, the result is still 
> right up to this point.
> When the region is flushing, the increment reads the kv from both the snapshot 
> and the memstore, takes the one whose timestamp is larger, and writes it back 
> to the memstore. If the snapshot's timestamp is larger than the memstore's, 
> the increment gets the old data and then does the increment, which is wrong.
> Secondly:
> There is also a risk in increment: because it writes the memstore first and 
> then the HLog, if the HLog write fails, the client will still read the 
> incremented value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
