Hi Duo,
Both replication and the backup-and-restore (B&R) work suffer from this problem.
The approach we think will work best is this: when the current Log Stream
(the RAFT quorum) reaches a certain size limit (e.g. 100MB), we flip the
RegionServer over to a new Log Stream, write the contents of the old Log
Stream to a distributed FileSystem all at once, and finally clean up the
old Log Stream.
This approach:
* Avoids forcing us to change Replication, B&R, and other things that
implicitly depend on a file-based WAL. We can change this later, but are
not forced to do anything immediately.
* Allows replication to buffer on a filesystem as opposed to the RAFT
quorums (keeping data on the FS is much, much "cheaper").
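
To make the roll-and-archive flow above a bit more concrete, here is a minimal Java sketch. Everything in it is hypothetical and in-memory: LogStream, ArchiveFs, RollingWal and the /wal-archive path are illustration-only names, not HBase or Ratis APIs. It also archives synchronously for simplicity, whereas a real implementation would presumably do that in the background and handle failures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a RAFT-backed Log Stream. */
class LogStream {
    final String id;
    final List<byte[]> entries = new ArrayList<>();
    long sizeBytes = 0;

    LogStream(String id) { this.id = id; }

    void append(byte[] edit) {
        entries.add(edit);
        sizeBytes += edit.length;
    }
}

/** Hypothetical stand-in for a distributed filesystem: path -> archived edits. */
class ArchiveFs {
    final Map<String, List<byte[]>> files = new HashMap<>();

    /** Writes a whole stream's edits as one file, in a single call. */
    void writeAll(String path, List<byte[]> edits) {
        files.put(path, new ArrayList<>(edits));
    }
}

/** Rolls to a new Log Stream once the current one reaches the size limit. */
class RollingWal {
    private final long sizeLimitBytes;
    private final ArchiveFs fs;
    private int nextId = 0;
    private LogStream current;

    RollingWal(long sizeLimitBytes, ArchiveFs fs) {
        this.sizeLimitBytes = sizeLimitBytes;
        this.fs = fs;
        this.current = newStream();
    }

    private LogStream newStream() {
        return new LogStream("stream-" + nextId++);
    }

    void append(byte[] edit) {
        current.append(edit);
        if (current.sizeBytes >= sizeLimitBytes) {
            roll();
        }
    }

    private void roll() {
        LogStream old = current;
        // 1. Flip over to a new Log Stream so new edits land there.
        current = newStream();
        // 2. Write the old stream to the filesystem all at once.
        fs.writeAll("/wal-archive/" + old.id, old.entries);
        // 3. Clean up the old Log Stream.
        old.entries.clear();
        old.sizeBytes = 0;
    }

    String currentStreamId() { return current.id; }
}
```

The point of the sketch is that anything file-based (Replication, B&R) only ever looks at what ArchiveFs holds, so it never needs to know that the live edits sit in a Log Stream.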
I have some more on this in the detailed doc I mentioned to Stack in
another branch of the conversation. Working on making sure I can share
all of that :)
On 5/7/18 7:26 PM, 张铎(Duo Zhang) wrote:
How do we deal with replication? It is file based...
2018-05-08 10:12 GMT+08:00 Josh Elser <els...@apache.org>:
On 5/7/18 2:53 PM, Stack wrote:
On Thu, May 3, 2018 at 9:04 AM, Josh Elser <els...@apache.org> wrote:
Hi,
... I'm happy to delve some more into how I think we can implement this.
I'd be interested in this part.
St.Ack
You got it, boss. Let me find the time to get that document exported as
well. Will get back to you.
- Josh
[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#
[2] https://home.apache.org/~elserj/Effective%20HBase%20in%20the%20Cloud.pdf