[jira] [Commented] (HBASE-20952) Re-visit the WAL API

Duo Zhang (JIRA) Thu, 13 Sep 2018 20:06:33 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614292#comment-16614292
 ]


Duo Zhang commented on HBASE-20952:
-----------------------------------

The API is not the first thing to decide. As I said above, the first thing is
to agree on the overall solution. See our design docs for serial replication
and sync replication:

https://docs.google.com/document/d/1LHC3IRUc5i2V4_roNw8BDAOKGM4bEapR_hefpZxDT00/edit

https://docs.google.com/document/d/193D3aOxD-muPIZuQfI4Zo3_qg6-Nepeu_kraYJVQkiE/edit#heading=h.e8l9k556m3wi

There is no API design in them, but we tried our best to describe how we plan
to implement these features in HBase.

{quote}
This is good; I hadn't thought about abstracting out fencing. We should have an
API that pushes this fencing implementation down into the Provider. For the Ratis
LogService, we designed the API to be able to close() a Log, making it read-only.
In the context of HBase, we would close the Log before we start
recovery/re-assignment, which has the net effect of preventing any half-dead RS
from continuing to add edits to the Log. This would effectively work like
recoverLease() does now for the HDFS case.
{quote}
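To make the close()-based fencing idea concrete, here is a minimal in-memory
sketch. FenceableLog and InMemoryLog are hypothetical names, not part of HBase
or the Ratis LogService; the single shared object stands in for the log's
server-side state, which both the recovering master and the old RS would see.

```java
import java.util.ArrayList;
import java.util.List;

// Driver: the master fences the log, then a half-dead RS tries to append.
class FencingDemo {
    public static void main(String[] args) {
        InMemoryLog log = new InMemoryLog();
        log.append("edit-1");      // normal write from the RS
        log.close();               // master closes (fences) the log before recovery
        try {
            log.append("edit-2");  // half-dead RS keeps writing
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(log.entries());  // [edit-1]
    }
}

/** Hypothetical log abstraction; not an existing HBase or Ratis API. */
interface FenceableLog {
    void append(String edit);

    /** Makes the log read-only; every later append, from any writer, fails. */
    void close();

    List<String> entries();
}

/** In-memory stand-in for the log's server-side state shared by all writers. */
class InMemoryLog implements FenceableLog {
    private final List<String> edits = new ArrayList<>();
    private boolean closed = false;

    @Override
    public synchronized void append(String edit) {
        if (closed) {
            throw new IllegalStateException("log is closed (fenced)");
        }
        edits.add(edit);
    }

    @Override
    public synchronized void close() {
        closed = true;
    }

    @Override
    public synchronized List<String> entries() {
        return List.copyOf(edits);
    }
}
```

This only fences one log. If the RS is allowed to roll and open a new log, the
fence also has to cover log creation.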

Yes, this is what I really want to discuss, rather than something like whether
we should use WALInfo or WALIdentity.

The information you described is still not enough to solve all the problems.
In the current implementation the WAL writer is rolled, and the roll is done
by the RS, so closing the WAL file is not enough: the RS will simply open a
new file and write to it. That's why we need to rename the WAL directory.

From your description above, it seems to me that a RS will only ever have one
stream open. Then how do we drop the old edits after a flush? And how do we
set up the WAL stream? Only once, at RS startup? And if there are errors
later, do we just abort, without trying to recover or open a new stream? Or
will that be handled by Ratis? And for the FileSystem, we use multi-WAL to
increase performance, and that logic is tangled up with WALProvider. Does
Ratis still need multi-WAL to increase performance? If not, what's the plan?
Do we need to refactor the multi-WAL related code so that it works directly
against the FileSystem-related classes rather than against WALProvider?
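A minimal sketch of why renaming the directory fences a RS that keeps rolling,
using the local filesystem through java.nio.file as a stand-in for HDFS. The
class and method names are hypothetical, and the "-splitting" suffix is only
modeled on HBase's naming for directories under recovery. Renaming blocks new
files only; on HDFS a file the RS already has open still needs lease recovery,
which this sketch does not model.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class RenameFencingDemo {
    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("wals");
        Path rsDir = Files.createDirectory(root.resolve("rs1"));
        Files.createFile(rsDir.resolve("wal.1"));   // the RS's current WAL file

        Path splitting = fence(rsDir);                // recovery renames the directory

        // The half-dead RS rolls its writer and tries to create the next file.
        System.out.println("roll succeeded: " + tryRoll(rsDir, "wal.2"));
        System.out.println("wal.1 ready to split: " + Files.exists(splitting.resolve("wal.1")));
    }

    /** Renames the RS's WAL directory; the "-splitting" suffix is illustrative. */
    static Path fence(Path rsDir) {
        Path target = rsDir.resolveSibling(rsDir.getFileName() + "-splitting");
        try {
            return Files.move(rsDir, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Simulates a WAL roll: returns false if the RS's directory no longer exists. */
    static boolean tryRoll(Path rsDir, String fileName) {
        try {
            Files.createFile(rsDir.resolve(fileName));
            return true;
        } catch (NoSuchFileException e) {
            return false;  // parent directory was renamed away, so the roll fails
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point is that the fence lives on the namespace the RS must use to create
new WALs, so it holds no matter how many times the RS tries to roll.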

For sync replication, the writer is just a DualAsyncWriter, which writes to
two HDFS clusters at once. I think it is possible to write to other log
systems, such as Ratis, as long as the implementation still shares the
AsyncWriter interface. The problem here is how to describe the place where we
write the remote WALs. For FileSystem-based WALs, it is just a directory on a
remote cluster, for example "hdfs://cluster-name/path". We need to find a way
to describe other log systems.
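A rough sketch of both points, assuming a simplified, hypothetical
SimpleAsyncWriter interface (the real AsyncWriter differs): the dual writer's
sync completes only after both underlying syncs complete, and the WAL location
is described by a URI whose scheme selects the writer implementation. The
"ratis" scheme and its URI layout are invented here purely for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class DualWriterDemo {
    // Scheme -> writer factory. "ratis" is a hypothetical scheme; both map to an in-memory writer here.
    static final Map<String, Function<URI, SimpleAsyncWriter>> FACTORIES =
        Map.of("hdfs", InMemoryWriter::new, "ratis", InMemoryWriter::new);

    static SimpleAsyncWriter writerFor(URI location) {
        String scheme = location.getScheme();
        Function<URI, SimpleAsyncWriter> factory = scheme == null ? null : FACTORIES.get(scheme);
        if (factory == null) {
            throw new IllegalArgumentException("no WAL writer for " + location);
        }
        return factory.apply(location);
    }

    public static void main(String[] args) {
        SimpleAsyncWriter local = writerFor(URI.create("hdfs://local-cluster/hbase/WALs/rs1"));
        SimpleAsyncWriter remote = writerFor(URI.create("hdfs://cluster-name/path"));
        SimpleAsyncWriter dual = new DualAsyncWriterSketch(local, remote);
        dual.append("edit-1");
        System.out.println("synced length: " + dual.sync().join());
    }
}

/** Simplified, hypothetical stand-in for HBase's AsyncWriter; the real interface differs. */
interface SimpleAsyncWriter {
    void append(String edit);

    /** Completes with the total length durably written so far. */
    CompletableFuture<Long> sync();
}

/** In-memory writer that only records where it would write. */
class InMemoryWriter implements SimpleAsyncWriter {
    final URI location;
    private final List<String> pending = new ArrayList<>();
    private long syncedLength = 0;

    InMemoryWriter(URI location) {
        this.location = location;
    }

    @Override
    public synchronized void append(String edit) {
        pending.add(edit);
    }

    @Override
    public synchronized CompletableFuture<Long> sync() {
        for (String edit : pending) {
            syncedLength += edit.length();
        }
        pending.clear();
        return CompletableFuture.completedFuture(syncedLength);
    }
}

/** Writes every edit to both clusters; a sync is acknowledged only after both sides sync. */
class DualAsyncWriterSketch implements SimpleAsyncWriter {
    private final SimpleAsyncWriter local;
    private final SimpleAsyncWriter remote;

    DualAsyncWriterSketch(SimpleAsyncWriter local, SimpleAsyncWriter remote) {
        this.local = local;
        this.remote = remote;
    }

    @Override
    public void append(String edit) {
        local.append(edit);
        remote.append(edit);
    }

    @Override
    public CompletableFuture<Long> sync() {
        // thenCombine completes only when both futures complete; report the local length.
        return local.sync().thenCombine(remote.sync(), (localLen, remoteLen) -> localLen);
    }
}
```

With a URI-based description, a remote Ratis target would just be another
scheme registered in the factory map, and DualAsyncWriterSketch would not
need to change.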

> Re-visit the WAL API
> --------------------
>
>                 Key: HBASE-20952
>                 URL: https://issues.apache.org/jira/browse/HBASE-20952
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: Josh Elser
>            Priority: Major
>         Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We
> should also keep in mind what is happening in the Ratis LogService (but
> the LogService should not dictate what HBase's WAL API looks like; see RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup&restore. Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B&R doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in part). We also need to consider other methods
> which were "bolted" on, such as {{AbstractFSWAL}} and
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the
> {{WALSplitter}}) should also be looked at, so that they use WAL APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.
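The issue asks which primitive calls are needed for durable, fast writes, plus
tailing for replication. A minimal sketch, assuming the primitives are append
(returning a sequence id), sync up to a sequence id, and a tailer that only
sees synced edits. MinimalWal, WalTailer and InMemoryWal are hypothetical
names, not HBase's WAL interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class WalPrimitivesDemo {
    public static void main(String[] args) {
        InMemoryWal wal = new InMemoryWal();
        long first = wal.append("put r1");
        wal.append("put r2");
        wal.sync(first);                        // only "put r1" is durable so far
        WalTailer tailer = wal.tail(0);         // replication starts from the beginning
        System.out.println(tailer.next());      // Optional[put r1]
        System.out.println(tailer.next());      // Optional.empty ("put r2" not synced)
        wal.sync(wal.append("put r3"));         // syncing seq 3 also covers seq 2
        System.out.println(tailer.next());      // Optional[put r2]
    }
}

/** Hypothetical minimal WAL primitives; not HBase's actual WAL interface. */
interface MinimalWal {
    /** Buffers an edit and returns its sequence id (1, 2, 3, ...). */
    long append(String edit);

    /** Makes every edit with sequence id <= seq durable. */
    void sync(long seq);

    /** Returns a reader over durable edits with sequence id > afterSeq. */
    WalTailer tail(long afterSeq);
}

interface WalTailer {
    /** Next durable edit, or empty if the reader has caught up. */
    Optional<String> next();
}

class InMemoryWal implements MinimalWal {
    private final List<String> edits = new ArrayList<>();
    private long syncedSeq = 0;

    @Override
    public synchronized long append(String edit) {
        edits.add(edit);
        return edits.size();
    }

    @Override
    public synchronized void sync(long seq) {
        if (seq > edits.size()) {
            throw new IllegalArgumentException("unknown sequence id " + seq);
        }
        syncedSeq = Math.max(syncedSeq, seq);
    }

    @Override
    public WalTailer tail(long afterSeq) {
        return new WalTailer() {
            private long lastRead = afterSeq;

            @Override
            public Optional<String> next() {
                synchronized (InMemoryWal.this) {
                    if (lastRead >= syncedSeq) {
                        return Optional.empty();  // never expose unsynced edits
                    }
                    lastRead++;
                    return Optional.of(edits.get((int) (lastRead - 1)));
                }
            }
        };
    }
}
```

Keeping the tailer bounded by the synced sequence id is what lets replication
ship only edits that are already durable.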



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
