[ 
https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839085#comment-13839085
 ] 

Jonathan Hsieh commented on HBASE-10079:
----------------------------------------

In 0.96.0:
* with flush: Not able to reproduce data loss.
* with kill: Not able to reproduce data loss. Had an overcount of 1.
* with kill -9: Not able to reproduce data loss. Had an overcount of 1.

The overcount of 1 is likely a different bug, which I'll let slide for now.
Most likely the client thought an increment had failed and retried it, but the
original had actually made it to the log. Since increments are not idempotent,
that increment was applied twice.
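
The retry scenario above can be sketched as a toy simulation. This is not HBase code: `ToyServer`, `increment_with_retry`, and the `drop_ack_on` parameter are hypothetical names invented for illustration. The server applies every increment it receives, the acknowledgement for one call is lost, and the client retries, so the counter ends up one higher than the number of logical increments.

```python
class AckLost(Exception):
    """Raised when the server applied the write but the reply never arrived."""


class ToyServer:
    """Hypothetical stand-in for a region server holding one counter."""

    def __init__(self, drop_ack_on):
        self.value = 0
        self.calls = 0
        self.drop_ack_on = drop_ack_on  # 1-based call number whose ack is lost

    def increment(self, amount=1):
        self.calls += 1
        self.value += amount  # the increment is applied (think: logged) first
        if self.calls == self.drop_ack_on:
            raise AckLost()  # ...but the client never hears about it
        return self.value


def increment_with_retry(server, retries=3):
    """Client side: retry on a lost ack, without knowing the write landed."""
    for _ in range(retries):
        try:
            return server.increment()
        except AckLost:
            continue
    raise RuntimeError("increment failed after retries")


def run(n_increments, drop_ack_on):
    """Issue n_increments logical increments and return the final counter."""
    server = ToyServer(drop_ack_on)
    for _ in range(n_increments):
        increment_with_retry(server)
    return server.value


if __name__ == "__main__":
    # 1000 logical increments, one lost ack: the counter overshoots by 1.
    print(run(1000, drop_ack_on=500))
```

Passing `drop_ack_on=0` never drops an ack, so the counter matches the number of increments exactly. A single lost ack is enough to explain an overcount of 1, which is consistent with (but does not prove) the guess above.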

So the bug was introduced somewhere between 0.96.0 and 0.96.1rc1.

> Increments lost after flush 
> ----------------------------
>
>                 Key: HBASE-10079
>                 URL: https://issues.apache.org/jira/browse/HBASE-10079
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.96.1
>            Reporter: Jonathan Hsieh
>            Priority: Blocker
>             Fix For: 0.96.1
>
>
> Testing 0.96.1rc1.
> With one process incrementing a row in a table, we increment a single col.  
> We flush or do kills/kill -9 and data is lost.  flush and kill are likely 
> the same problem (kill would flush); kill -9 may or may not have the same 
> root cause.
> 5 nodes
> hadoop 2.1.0 (a pre cdh5b1 hdfs).
> hbase 0.96.1 rc1 
> Test: 250000 increments on a single row and a single col with varying 
> numbers of client threads (IncrementBlaster).  Verify we have a count of 
> 250000 after the run (IncrementVerifier).
> Run 1: No fault injection.  5 runs.  count = 250000 on multiple runs.  
> Correctness verified.  1638 inc/s throughput.
> Run 2: flushes of table with incrementing row.  count = 246875 != 250000.  
> Correctness failed.  1517 inc/s throughput.
> Run 3: kill of rs hosting incremented row.  count = 243750 != 250000.  
> Correctness failed.  1451 inc/s throughput.
> Run 4: one kill -9 of rs hosting incremented row.  count = 246878 != 250000.  
> Correctness failed.  1395 inc/s (including recovery).
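
A rough sketch of the test shape described above, for readers without the original tools. It does not reproduce IncrementBlaster or IncrementVerifier, whose source is not shown here: `Counter`, `blast`, `verify`, and `shortfalls` are hypothetical stand-ins, the counter lives in memory, and no HBase client API is used. The last function computes how far each faulty run fell short of 250000, using the counts quoted in the report.

```python
import threading


class Counter:
    """In-memory stand-in for the single row/column being incremented."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1

    def get(self):
        with self._lock:
            return self._value


def blast(counter, total, n_threads):
    """Split `total` increments across `n_threads` client threads."""
    per_thread, remainder = divmod(total, n_threads)

    def worker(count):
        for _ in range(count):
            counter.increment()

    threads = [
        threading.Thread(target=worker, args=(per_thread + (1 if i < remainder else 0),))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def verify(counter, expected):
    """Return True only if the final count equals the number of increments sent."""
    return counter.get() == expected


def shortfalls(expected, observed):
    """Map each run name to the number of increments missing from its count."""
    return {name: expected - count for name, count in observed.items()}


# Counts taken from Runs 2-4 in the report.
REPORTED = {"flush": 246875, "kill": 243750, "kill -9": 246878}

if __name__ == "__main__":
    counter = Counter()
    blast(counter, 250000, n_threads=8)
    print("correct:", verify(counter, 250000))
    print(shortfalls(250000, REPORTED))
```

With the reported counts, the shortfalls are 3125 for the flush run, 6250 for the kill run, and 3122 for the kill -9 run.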



--
This message was sent by Atlassian JIRA
(v6.1#6144)
