[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614257#comment-16614257
 ] 

Josh Elser commented on HBASE-20952:
------------------------------------

Good questions! Thanks for taking the time to write them, Duo.
{quote}How do we do fencing when RS crashes? Now we need to rename the wal 
directory for a RS, and then call recoverLease for all the files to confirm 
that they are all closed. And at RS side, when creating a wal write, we use 
createNonRecursive intentionally, so that if the wal directory has been 
renamed, we can not create wal writers any more. How do we want to abstract 
these operations in the new WAL API? How does other log systems, such as ratis, 
deal with this?
{quote}
This is good; I hadn't thought about abstracting out fencing. We should have 
an API that pushes the fencing implementation down into the Provider. For the 
Ratis LogService, we designed the API so that you can {{close()}} a Log, which 
makes it read-only. In the context of HBase, we would close the Log before we 
start recovery/re-assignment, with the net effect of preventing any half-dead 
RS from continuing to add edits to the Log. This would effectively work like 
recoverLease() does now for the HDFS case.
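
To make the close-as-fence idea concrete, here is a rough sketch of what I have 
in mind. Every name in it ({{FenceableLog}}, {{InMemoryLog}}, 
{{fenceAndCount}}) is made up for illustration; this is not the Ratis 
LogService API and not a proposal for the final HBase interface. It uses an 
in-memory list where a real Provider would use HDFS or Ratis.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; none of these names exist in HBase or Ratis. */
final class FencingSketch {

  /** Minimal log abstraction whose close() doubles as the fencing primitive. */
  interface FenceableLog {
    /** Appends one edit and returns its index; fails once the log is closed. */
    long append(byte[] edit);

    /** Makes the log read-only. Calling it more than once is harmless. */
    void close();

    boolean isClosed();
  }

  /** In-memory stand-in for a Provider-specific implementation. */
  static final class InMemoryLog implements FenceableLog {
    private final List<byte[]> edits = new ArrayList<>();
    private boolean closed;

    @Override
    public synchronized long append(byte[] edit) {
      if (closed) {
        // A half-dead writer ends up here after recovery has fenced the log.
        throw new IllegalStateException("log is closed; append rejected");
      }
      edits.add(edit.clone());
      return edits.size() - 1;
    }

    @Override
    public synchronized void close() {
      closed = true;
    }

    @Override
    public synchronized boolean isClosed() {
      return closed;
    }

    synchronized int size() {
      return edits.size();
    }
  }

  /** Recovery side: fence first, then the set of edits can no longer grow. */
  static int fenceAndCount(InMemoryLog log) {
    log.close();
    return log.size();
  }
}
```

The point of the sketch is only the ordering: recovery closes the Log first, so 
any later {{append}} from the old writer fails instead of silently adding 
edits. How a given Provider enforces that (lease recovery, a Ratis state 
change, something else) would be hidden behind {{close()}}.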
{quote}For sync replication, we have a config called remote wal directory, 
which exposes the file system to user. As it is implemented by us at Xiaomi, we 
can help to find a work around on this.
{quote}
Ok. I'm definitely dense here :). Do you have a pointer to some code to look 
at? Or, based on my previous answer, is a solution obvious to you?
{quote}looking at the code on the RB, we have already started to change the 
stuffs in replication? And for RecoveredReplicationSource, we make it abstract 
and introduce a new FSRecoveredReplicationSource? Then where is the 
FSReplicationSource?
{quote}
There is a second RB open which has a much-reduced version of that original 
patch. Looks like this might not have gotten attached to this Jira issue (oops, 
will make sure that's linked).

[https://reviews.apache.org/r/68672]

This should give a much smaller, API-only view. Trying to make some of the 
other "systems" that use WALs work with a new API was a good exercise to make 
sure we didn't miss something obvious. Totally in agreement that we want a good 
API before we start throwing out implementations.

> Re-visit the WAL API
> --------------------
>
>                 Key: HBASE-20952
>                 URL: https://issues.apache.org/jira/browse/HBASE-20952
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: Josh Elser
>            Priority: Major
>         Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup&restore. Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B&R doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
