[ 
https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579623#comment-14579623
 ] 

Enis Soztutar commented on HBASE-13811:
---------------------------------------

bq. Resolving. There is still dataloss going on but have to run at larger 
scales: ITBLL 2.5B in my test runs. Will open new issue to do the subsequent 
hole-plugging.
[~Apache9], [[email protected]] I was able to reproduce the failures on my 
rig with 1.25B rows. The number of missing rows is much lower, around 80K. 

The root cause this time seems to be different, and not related to the WAL 
edits filtering. The procedure-based flush interrupts the flush request when 
the procedure is cancelled because of an exception elsewhere. This leaves the 
memstore snapshot intact without aborting the server. The next flush then 
flushes the previous memstore snapshot with the current seqId (as opposed to 
the seqId from the memstore snapshot). This creates an hfile whose seqId is 
larger than that of the edits it contains. The previous behavior in 0.98 and 
1.0 (I believe) is that an interruption / exception after flush prepare causes 
an RS abort. 
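
To make the sequence above concrete, here is a toy model of it. This is not 
HBase code: ToyRegion, FlushedFile, prepare() and commit() are made-up names 
for illustration, and the real flush path is considerably more involved. It 
only shows how a leftover snapshot combined with the current seqId yields a 
file that claims more than it holds.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; every name here is hypothetical, not an HBase API. */
public class FlushSeqIdSketch {

    /** An edit tagged with the sequence id it was written under. */
    static final class Edit {
        final long seqId;
        final String row;
        Edit(long seqId, String row) { this.seqId = seqId; this.row = row; }
    }

    /** A flushed file: the seqId it claims to cover, plus the edits it holds. */
    static final class FlushedFile {
        final long claimedSeqId;
        final List<Edit> edits;
        FlushedFile(long claimedSeqId, List<Edit> edits) {
            this.claimedSeqId = claimedSeqId;
            this.edits = edits;
        }
        /** Highest seqId actually present in the file, or -1 if it is empty. */
        long maxContainedSeqId() {
            long max = -1L;
            for (Edit e : edits) { max = Math.max(max, e.seqId); }
            return max;
        }
    }

    static final class ToyRegion {
        long nextSeqId = 1;
        List<Edit> active = new ArrayList<>();
        List<Edit> snapshot = null;   // non-null between prepare and commit
        long snapshotSeqId = -1;

        void put(String row) { active.add(new Edit(nextSeqId++, row)); }

        /** Flush prepare: set the active edits aside and remember their seqId. */
        void prepare() {
            if (snapshot == null) {
                snapshot = active;
                snapshotSeqId = nextSeqId - 1;
                active = new ArrayList<>();
            }
            // Otherwise a snapshot left by an interrupted flush is reused as-is.
        }

        /** Commit: stamp the file with the snapshot's seqId or the current one. */
        FlushedFile commit(boolean useSnapshotSeqId) {
            long stamp = useSnapshotSeqId ? snapshotSeqId : nextSeqId - 1;
            FlushedFile file = new FlushedFile(stamp, snapshot);
            snapshot = null;
            return file;
        }
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put("a");
        region.put("b");                          // seqIds 1 and 2
        region.prepare();                         // snapshot holds seqIds 1..2
        // Flush interrupted here: no commit and no abort, so the snapshot stays.
        region.put("c");
        region.put("d");                          // seqIds 3 and 4, still unflushed
        region.prepare();                         // next flush reuses the old snapshot
        FlushedFile file = region.commit(false);  // stamped with the current seqId
        System.out.println("claimed=" + file.claimedSeqId
                + " contained=" + file.maxContainedSeqId());
    }
}
```

In this sketch the file claims seqId 4 but only contains edits up to seqId 2. 
Anything that trusts the claimed seqId to decide which edits are already 
persisted would wrongly treat seqIds 3 and 4 as flushed. Passing true to 
commit() stamps the file with the snapshot's own seqId (2) instead. 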

I'll create yet another issue for this, since it is different from both this 
issue and HBASE-13853. 

> Splitting WALs, we are filtering out too many edits -> DATALOSS
> ---------------------------------------------------------------
>
>                 Key: HBASE-13811
>                 URL: https://issues.apache.org/jira/browse/HBASE-13811
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.0.0, 1.1.0, 1.2.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.0, 1.2.0, 1.1.1
>
>         Attachments: 13811.addendum.txt, 13811.branch-1.txt, 
> 13811.branch-1.txt, 13811.txt, 13811.v2.branch-1.txt, 13811.v3.branch-1.txt, 
> 13811.v3.branch-1.txt, 13811.v4.branch-1.txt, 13811.v5.branch-1.txt, 
> 13811.v6.branch-1.txt, 13811.v6.branch-1.txt, 13811.v7.branch-1.txt, 
> 13811.v8.branch-1.txt, 13811.v9.branch-1.txt, HBASE-13811-branch-1.1.patch, 
> HBASE-13811-v1.testcase.patch, HBASE-13811.testcase.patch, 
> startCacheFlush.diff
>
>
> I've been running ITBLLs against branch-1 around HBASE-13616 (move of 
> ServerShutdownHandler to pv2). I have come across an instance of dataloss. My 
> patch for HBASE-13616 was in place so can only think it the cause (but cannot 
> see how). When we split the logs, we are skipping legit edits. Digging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
