[
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615265#comment-16615265
]
Sergey Soldatov commented on HBASE-20952:
-----------------------------------------
{quote}In the old days we would roll the WAL writer, and it is done by the RS, so
closing the WAL file is not enough, as the RS will try to open a new one and
write to it. That's why we need to rename the WAL directory.
{quote}
For a WAL provider that doesn't depend on the HDFS directory structure, there
should be a manager that keeps information about existing logs. The internals
are implementation specific (e.g. for a Kafka WAL provider it may be a separate
topic or some internal DB; for consensus-based logs like the Ratis LogService it
might be a separate state machine), but any new log should be registered there.
Adding a new method to WALProvider like 'disable'/'decommission' that would
tell the manager to reject new logs for a particular RS (or even region, if we
consider a WAL-per-region scheme) is not a problem. For the existing WAL
providers, that method may rename the WAL directory.
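A rough sketch of what such a method could look like (ManagedWALProvider and
decommission() are made-up names for illustration, not existing HBase API):
{code:java}
// Hypothetical sketch only: ManagedWALProvider and decommission() are made-up
// names used for illustration, not existing HBase API.
import java.io.IOException;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.wal.WALProvider;

public interface ManagedWALProvider extends WALProvider {

  /**
   * Tell the log manager to reject any new logs for the given region server.
   * The existing FS-based providers could implement this by renaming the WAL
   * directory; a Kafka- or Ratis-backed provider would mark the server in its
   * topic/DB/state machine instead.
   */
  void decommission(ServerName serverName) throws IOException;
}
{code}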
{quote}In your words above, it seems to me that we will only have one stream
opened forever for a RS, then how do we drop the old edits after flush? And how
do we set up the wal stream? Only once at RS startup? And if there are errors
later, do we just abort?
{quote}
Not necessarily. There is no problem with having a WAL per region; actually, in
some cases it would be preferable, for example a Kafka topic per region. Any
kind of recovery would be a simple subscribe/replay of the particular topic: no
log splits, less offline time. For the regular case we are not talking about
streams; it's just a WAL implementation that supports the append operation. For
replication/recovery we should be able to get a stream and read from a
particular ID/offset. Error handling should be hidden by the implementation. A
simple example for a quorum-based implementation: we have a 3-node quorum for
log 'RS1.1' (RS1, RS2, RS3). RS2 and RS3 go down for some reason, so we lose
the majority and this quorum becomes read-only. A new log 'RS1.2' is created
with the quorum (RS1, RS4, RS5) and all writes go there. But if we speak about
the reading stream, the provider would hand out a single instance that iterates
through RS1.1 and RS1.2 continuously. The same approach may be applied to the
existing WAL files as well.
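A rough sketch of that reader side, assuming a hypothetical chained-segment
abstraction (ChainedLogReader and WALSegment do not exist today; only WAL.Entry
is from the current codebase). The caller gets one logical reader that
internally walks RS1.1 and then RS1.2, so it never sees the rollover:
{code:java}
// Rough sketch only: one logical reader that walks an ordered chain of logs
// (RS1.1, then RS1.2, ...) so the consumer never sees the rollover.
// ChainedLogReader and WALSegment are hypothetical names, not existing HBase API.
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hbase.wal.WAL.Entry;

public class ChainedLogReader implements AutoCloseable {

  /** Hypothetical handle to a single log: a file, a Kafka topic, a quorum log, ... */
  public interface WALSegment extends AutoCloseable {
    /** Returns the next entry of this log, or null when the log is exhausted. */
    Entry read() throws IOException;
    @Override
    void close() throws IOException;
  }

  private final Iterator<WALSegment> segments; // RS1.1, RS1.2, ... in order
  private WALSegment current;

  public ChainedLogReader(List<WALSegment> orderedSegments) {
    this.segments = orderedSegments.iterator();
  }

  /** Returns the next entry, transparently switching to the next log in the chain. */
  public Entry next() throws IOException {
    while (true) {
      if (current == null) {
        if (!segments.hasNext()) {
          return null; // end of the whole chain
        }
        current = segments.next();
      }
      Entry entry = current.read();
      if (entry != null) {
        return entry;
      }
      current.close(); // this log is exhausted,
      current = null;  // fall through to the next one
    }
  }

  @Override
  public void close() throws IOException {
    if (current != null) {
      current.close();
    }
  }
}
{code}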
{quote}And for the FileSystem, we will use multi wal to increase the
performance, and the logic is messed up with WALProvider. Does Ratis still need
multi wal to increase the performance? And if not, what's the plan? We need to
refactor the multi wal related code, to not work against the WALProvider but
something with the FileSystem related stuff directly?
{quote}
That might be done in a further refactoring of multiwal. At the moment the
approach is that we may specify a 3rd-party WAL provider class in WALFactory.
So if it's there, multiwal would not be used at all, since multiwal is itself
just a provider class. On the other hand, it could be refactored into something
like a 'wal strategy' that works with any kind of provider.
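For illustration, a minimal sketch of how a 3rd-party provider would be plugged
in through configuration; "hbase.wal.provider" is the key WALFactory already
reads, while com.example.wal.KafkaWALProvider is a made-up class name:
{code:java}
// Minimal sketch of the configuration side. "hbase.wal.provider" is the key
// WALFactory already reads; com.example.wal.KafkaWALProvider is a hypothetical
// 3rd-party provider class used only for illustration (it must be on the classpath).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.wal.WALFactory;

public class CustomWalProviderExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Either a built-in provider name (filesystem, multiwal, asyncfs, ...) or a
    // fully qualified class name can be configured here.
    conf.set("hbase.wal.provider", "com.example.wal.KafkaWALProvider");
    // WALFactory then instantiates the configured provider instead of multiwal
    // (two-arg constructor as in the HBase 2.x line).
    WALFactory walFactory = new WALFactory(conf, "example-factory");
    System.out.println("Using provider: " + walFactory.getWALProvider().getClass().getName());
  }
}
{code}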
{quote}had mentioned offline yesterday that he thinks some gaps still exist
around WAL splitting – do you understand that well enough to suggest what needs
to be addressed in the doc which is not already there?
{quote}
WALSplitter is a separate topic for discussion. The current implementation has
a bunch of dependencies on file operations such as temporary files, the list of
corrupted files, etc. From the HBase perspective, it would be much easier to
keep it as is and make the log splitter an interface that takes a log and
creates a list of recovery logs. But from the perspective of a 3rd-party WAL
developer it would be a nightmare to handle all possible cases and fit into the
split log chore logic. On the other hand, for the 1st iteration this may be
hidden by a scheme where a 3rd-party WAL does not use the splitter at all and
recovery is just reading a stream of records provided by the WALProvider for a
particular region.
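Just to illustrate the interface idea, a minimal sketch (all names are
hypothetical, nothing here exists in HBase today):
{code:java}
// Hypothetical sketch only: a splitter contract that takes a single log and
// produces per-region recovery logs, hiding temp-file and corrupted-file
// bookkeeping inside the implementation. None of these names exist in HBase.
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.ServerName;

public interface WALSplitterService {

  /**
   * Split the log identified by walId that belonged to the crashed server and
   * return identifiers of the produced recovery logs, e.g. file paths for an
   * FS-based WAL or topic/offset ranges for a Kafka-backed one.
   */
  List<String> split(ServerName crashedServer, String walId) throws IOException;
}
{code}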
> Re-visit the WAL API
> --------------------
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
> Issue Type: Sub-task
> Components: wal
> Reporter: Josh Elser
> Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an
> HBase WAL API should look like. What are the primitive calls that we require
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We
> should also have a mind for what is happening in the Ratis LogService (but
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and
> backup&restore. Replication has the use-case for "tail"'ing the WAL which we
> should provide via our new API. B&R doesn't do anything fancy (IIRC). We
> should make sure all consumers are generally going to be OK with the API we
> create.
> The API may be "OK" (or OK in part). We also need to consider other methods
> which were "bolted" on, such as {{AbstractFSWAL}} and
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the
> {{WALSplitter}}) should also be looked at to use WAL APIs only.
> We also need to make sure that adequate interface audience and stability
> annotations are chosen.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)