[jira] [Commented] (HBASE-13811) Splitting WALs, we are filtering out too many edits -> DATALOSS

Duo Zhang (JIRA) Thu, 04 Jun 2015 18:13:05 -0700

    [ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573900#comment-14573900 ]


Duo Zhang commented on HBASE-13811:
-----------------------------------

{quote}
Rather than add a new method that does what the old getEarliestMemstoreSeqNum 
did, I changed getEarliestMemstoreSeqNum to be how the old version worked.
{quote}
Fine, I think it will work. But I am still a little nervous about having two 
methods with the same name but different behaviors...

And I remember that when implementing HBASE-10201 and HBASE-12405, I originally 
wanted startCacheFlush to return the flushedSeqId. But there were two problems. 
First, the getNextSequenceId method is in HRegion, not in FSHLog, so a simple 
solution is to return NO_SEQ_NUM when flushing all stores and let HRegion call 
getNextSequenceId. But here comes the second problem: startCacheFlush may fail, 
which means we cannot start a flush, so there are three types of return values: 
'sequenceId', 'choose a sequenceId by yourself', and 'give up flushing!'. At 
that time I thought it was ugly to use a '-2' or a null java.lang.Long to 
indicate 'give up flushing', so I gave up...
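
The three outcomes described above could be modeled with a small result type 
instead of a magic '-2' or a null Long. This is only a minimal sketch: the 
names FlushStart, Kind, and resolve are hypothetical and are not part of 
HBase, and the LongSupplier merely stands in for HRegion's getNextSequenceId.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/** Hypothetical result of starting a cache flush; NOT the actual HBase API. */
final class FlushStart {
    /** The three outcomes named in the comment above. */
    enum Kind { SEQUENCE_ID, CALLER_CHOOSES, GIVE_UP }

    private final Kind kind;
    private final long sequenceId; // meaningful only when kind == SEQUENCE_ID

    private FlushStart(Kind kind, long sequenceId) {
        this.kind = kind;
        this.sequenceId = sequenceId;
    }

    /** The WAL picked a concrete flushed sequence id. */
    static FlushStart withSequenceId(long id) {
        return new FlushStart(Kind.SEQUENCE_ID, id);
    }

    /** Flushing all stores: the caller should pick the next sequence id. */
    static FlushStart callerChooses() {
        return new FlushStart(Kind.CALLER_CHOOSES, -1L);
    }

    /** The flush could not be started. */
    static FlushStart giveUp() {
        return new FlushStart(Kind.GIVE_UP, -1L);
    }

    Kind kind() {
        return kind;
    }

    /**
     * Returns the sequence id to flush with, or empty if the flush must be
     * abandoned. The supplier is invoked only for CALLER_CHOOSES.
     */
    OptionalLong resolve(LongSupplier nextSequenceId) {
        switch (kind) {
            case SEQUENCE_ID:
                return OptionalLong.of(sequenceId);
            case CALLER_CHOOSES:
                return OptionalLong.of(nextSequenceId.getAsLong());
            default:
                return OptionalLong.empty();
        }
    }
}
```

With a type like this, the caller checks whether resolve(...) is present 
before flushing, so 'give up flushing' cannot be mistaken for a sequence id.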

Maybe we could consider this solution again? getEarliestMemstoreSeqNum can be 
used everywhere, but I think startCacheFlush is restricted to the flushing scope.

Thanks.

> Splitting WALs, we are filtering out too many edits -> DATALOSS
> ---------------------------------------------------------------
>
>                 Key: HBASE-13811
>                 URL: https://issues.apache.org/jira/browse/HBASE-13811
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.0, 1.2.0
>
>         Attachments: 13811.branch-1.txt, 13811.branch-1.txt, 13811.txt, 
> 13811.v2.branch-1.txt, 13811.v3.branch-1.txt, 13811.v3.branch-1.txt, 
> 13811.v4.branch-1.txt, 13811.v5.branch-1.txt, 13811.v6.branch-1.txt, 
> 13811.v6.branch-1.txt, HBASE-13811-v1.testcase.patch, 
> HBASE-13811.testcase.patch
>
>
> I've been running ITBLLs against branch-1 around HBASE-13616 (move of 
> ServerShutdownHandler to pv2). I have come across an instance of dataloss. My 
> patch for HBASE-13616 was in place, so I can only think it is the cause (but 
> cannot see how). When we split the logs, we are skipping legit edits. Digging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
