[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614232#comment-16614232
 ] 

Duo Zhang commented on HBASE-20952:
-----------------------------------

The design doc does not help; it reads like pseudo-code. What I want to know 
is how we deal with several key problems if we want to remove the direct 
dependency on FileSystem. Here is a short list that comes immediately to 
mind:

1. How do we do fencing when an RS crashes? Currently we rename the WAL 
directory for the RS, and then call recoverLease on all the files to confirm 
that they are all closed. On the RS side, when creating a WAL writer, we use 
createNonRecursive intentionally, so that if the WAL directory has been 
renamed, we cannot create WAL writers any more. How do we want to abstract 
these operations in the new WAL API? How do other log systems, such as Ratis, 
deal with this?

2. For sync replication, we have a config called remote WAL directory, which 
exposes the file system to the user. As it was implemented by us at Xiaomi, 
we can help find a workaround for this. Sync replication also relies on the 
rename operation to do fencing.

3. The replication-related code. I have been asking about this for a long 
time, but no one has given an overall solution. And looking at the code on 
the RB, have we already started changing the replication code? And for 
RecoveredReplicationSource, we make it abstract and introduce a new 
FSRecoveredReplicationSource? Then where is the FSReplicationSource?
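
To make the fencing in item 1 concrete, here is a minimal sketch of the 
rename-then-refuse-create idea. It is only an analogy on a local file system 
using java.nio.file, not HBase or HDFS code: Files.createFile stands in for 
createNonRecursive (neither creates missing parent directories), the class 
name, method names and the "-splitting" suffix are illustrative assumptions, 
and the recoverLease step has no local equivalent, so it is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class WalFencingSketch {
    // Illustrative suffix marking a WAL directory as fenced (an assumption).
    static final String FENCED_SUFFIX = "-splitting";

    /** Master side: rename the crashed server's WAL directory. */
    static Path fence(Path walDir) throws IOException {
        Path fenced = walDir.resolveSibling(walDir.getFileName() + FENCED_SUFFIX);
        return Files.move(walDir, fenced, StandardCopyOption.ATOMIC_MOVE);
    }

    /**
     * Region server side: create a new WAL file without creating missing
     * parent directories. Returns false if the directory is gone (fenced).
     */
    static boolean tryCreateWriter(Path walDir, String fileName) throws IOException {
        try {
            Files.createFile(walDir.resolve(fileName));
            return true;
        } catch (NoSuchFileException e) {
            // The parent was renamed away, so this server must stop writing.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("wal-fencing");
        Path walDir = Files.createDirectory(root.resolve("rs1"));

        System.out.println("before fencing: " + tryCreateWriter(walDir, "wal.1"));
        Path fenced = fence(walDir);
        System.out.println("after fencing:  " + tryCreateWriter(walDir, "wal.2"));
        System.out.println("old WAL kept:   " + Files.exists(fenced.resolve("wal.1")));
    }
}
```

The point of the sketch is that both halves depend on file system semantics 
(directory rename, and non-recursive create), which is exactly what a WAL API 
without a direct FileSystem dependency would have to express in some other 
way.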

I always say, we should have an overall solution first, i.e., we should know 
what the system looks like when we finish. Then we start to work things out.

Thanks.

> Re-visit the WAL API
> --------------------
>
>                 Key: HBASE-20952
>                 URL: https://issues.apache.org/jira/browse/HBASE-20952
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: Josh Elser
>            Priority: Major
>         Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup&restore. Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B&R doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in part). We also need to consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
