[jira] [Commented] (HBASE-20727) Persist FlushedSequenceId to speed up WAL split after cluster restart

Allan Yang (JIRA) Thu, 14 Jun 2018 21:03:11 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-20727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513310#comment-16513310
 ]


Allan Yang commented on HBASE-20727:
------------------------------------

{quote}
should this be stored in the WAL dir instead of in the root dir?
{quote}

It is not a WAL, and writing this file doesn't require 'append' ability, so I 
think keeping it in the root dir is enough.

{quote}
what happens if the master goes down while writing the file? looks like we'll 
get an IOException in loadLastFlushedSequenceIds and act as though the file 
doesn't exist?
what happens if the master dies slowly while writing the file and still has it 
open when the new master takes over as active? It looks like we will get an 
IOException in loadLastFlushedSequenceIds, again when the chore tries in 
persistRegionLastFlushedSequenceIds, and eventually just write a new one once 
the chore comes around and the lease has expired?
{quote}

Yes, an IOException may be thrown in those cases. But failing to load or write 
the file won't cause any problem; it only regresses the log split speed to what 
it was before this patch. So there is no need to deal with those corner cases; 
we can just let them go.
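
To illustrate the "treat a failed load as a missing file" behavior, here is a 
minimal sketch. It is not the code in the patch: the class, the method names, 
and the plain-text "region=seqId" file format are all hypothetical and chosen 
only to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from the HBASE-20727 patch.
class BestEffortSeqIdLoader {

  // Parses "regionName=seqId" lines (an assumed format).
  // Throws IllegalArgumentException on a malformed line or number.
  static Map<String, Long> parse(List<String> lines) {
    Map<String, Long> result = new HashMap<>();
    for (String line : lines) {
      int idx = line.indexOf('=');
      if (idx <= 0) {
        throw new IllegalArgumentException("Malformed line: " + line);
      }
      result.put(line.substring(0, idx), Long.parseLong(line.substring(idx + 1)));
    }
    return result;
  }

  // Best-effort load: any failure (missing file, unreadable or malformed
  // content) yields an empty map, so the caller behaves exactly as if no
  // flushed sequence ids had ever been persisted.
  static Map<String, Long> loadOrEmpty(Path file) {
    try {
      return parse(Files.readAllLines(file));
    } catch (IOException | IllegalArgumentException e) {
      return new HashMap<>();
    }
  }
}
```

With an empty map, no edits can be filtered out, so log split does the same 
work it would have done without any persisted file.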

{quote}
How about docs about this addition in the WAL recovery section of the ref guide?
What are the conditions where we'd turn this off? Can we document that as well?
{quote}
Added a comment in the new patch. Normally, we don't have to turn off this 
feature; the switch is just provided as an option.

> Persist FlushedSequenceId to speed up WAL split after cluster restart
> ---------------------------------------------------------------------
>
>                 Key: HBASE-20727
>                 URL: https://issues.apache.org/jira/browse/HBASE-20727
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: HBASE-20727.002.patch, HBASE-20727.003.patch, 
> HBASE-20727.patch
>
>
> We use flushedSequenceIdByRegion and storeFlushedSequenceIdsByRegion in 
> ServerManager to record the latest flushed seqids of regions and stores. So 
> during log split, we can use the seqids stored in those maps to filter out the 
> edits which do not need to be replayed. But those maps are not persisted. 
> After a cluster restart or master restart, the flushed seqid info is all lost. 
> Here I offer a way to persist that info to HDFS, so that even if the master 
> restarts, we can still use it to filter WAL edits and speed up replay.
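
The filtering idea in the description can be sketched as follows. This is an 
illustration under stated assumptions, not HBase code: the class and method 
names are hypothetical, edits are reduced to their sequence ids, and an edit is 
assumed to be skippable when its sequence id is less than or equal to the 
region's last flushed sequence id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names do not come from the HBase code base.
class FlushedSeqIdFilter {

  // Sentinel used when nothing is known about a region (assumed value).
  static final long NO_SEQ_ID = -1L;

  // Returns the sequence ids of the edits that still need to be replayed
  // for the given region. Edits at or below the flushed sequence id are
  // assumed to be already persisted and are skipped.
  static List<Long> editsToReplay(String region, List<Long> editSeqIds,
      Map<String, Long> flushedSeqIds) {
    long flushed = flushedSeqIds.getOrDefault(region, NO_SEQ_ID);
    List<Long> toReplay = new ArrayList<>();
    for (long seqId : editSeqIds) {
      if (seqId > flushed) {
        toReplay.add(seqId);
      }
    }
    return toReplay;
  }
}
```

When a region has no entry in the map, which is the situation after a restart 
if the map was never persisted, every edit is kept for replay.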



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
