[
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang updated HBASE-14790:
------------------------------
Release Note:
Implement a FanOutOneBlockAsyncDFSOutput for writing the WAL only; the WAL
provider which uses this class is AsyncFSWALProvider.
It is based on Netty and writes to 3 DNs concurrently (fan-out), so it
generally gives lower latency. It is also fail-fast: the stream becomes
unwritable immediately after any read/write error, with no pipeline recovery,
and you need to call recoverLease to force-close the output in that case. It
only supports writing a file with a single block. For the WAL this is good
behavior, as we can always open a new file when the old one is broken. The
performance analysis in HBASE-16890 shows that it performs better.
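The fan-out idea can be sketched in a few lines. This is a minimal illustration only, not the actual FanOutOneBlockAsyncDFSOutput code: FanOutSketch, DatanodeChannel, and fanOutWrite are hypothetical names, and the real class drives Netty channels and manages per-DN packet and ack state.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the fan-out write (NOT the real
// FanOutOneBlockAsyncDFSOutput): the client sends the same packet to all 3
// replicas at once, instead of relaying it through a DN pipeline
// (DN1 -> DN2 -> DN3). Every DN therefore behaves like the last one in a
// pipeline, which is why data is visible as soon as it arrives at a DN.
public class FanOutSketch {

    // Stand-in for an async connection to one datanode; in the real class
    // this is a Netty channel.
    interface DatanodeChannel {
        CompletableFuture<Void> write(byte[] packet);
    }

    // Fan-out: fire the packet at every replica concurrently. The returned
    // future completes once every replica has acked, and completes
    // exceptionally if any single write fails (the real stream then becomes
    // permanently unwritable -- fail-fast, no pipeline recovery).
    public static CompletableFuture<Void> fanOutWrite(
            List<DatanodeChannel> replicas, byte[] packet) {
        CompletableFuture<?>[] acks = replicas.stream()
            .map(ch -> ch.write(packet))
            .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(acks);
    }
}
```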
Behavior changes:
1. Since we now write to 3 DNs concurrently, by the visibility guarantee of
HDFS the data becomes visible as soon as it arrives at a DN, because every DN
is considered the last one in the pipeline. This means replication may read
uncommitted data and replicate it to the remote cluster, causing data
inconsistency. HBASE-14004 solves this problem.
2. There will be no sync failures. When the output is broken, we open a new
file and write all the unacked WAL entries to the new file, so we may have
duplicated entries in WAL files. HBASE-14949 solves this problem.
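Behavior change 2 above can be sketched as follows. This is a minimal illustration with hypothetical names (ReplayOnRollSketch and its methods are not HBase's actual AsyncFSWAL code): entries stay in an "unacked" queue until the DNs ack them, and when the output breaks we roll to a new file and rewrite every unacked entry there. An entry that was persisted in the old file but never acked is therefore written twice, which is the duplication HBASE-14949 deals with.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of "no sync failure, replay unacked entries on roll".
public class ReplayOnRollSketch {
    // Entries written but not yet acked by all DNs, in append order.
    private final Deque<String> unacked = new ArrayDeque<>();
    private List<String> currentFile = new ArrayList<>();
    // All WAL files ever opened (for inspection in this sketch).
    public final List<List<String>> allFiles = new ArrayList<>();

    public ReplayOnRollSketch() {
        allFiles.add(currentFile);
    }

    // Append writes to the current file and records the entry as unacked.
    public void append(String entry) {
        currentFile.add(entry);
        unacked.add(entry);
    }

    // Ack from the DNs: the n oldest unacked entries are now durable.
    public void ack(int n) {
        for (int i = 0; i < n; i++) {
            unacked.poll();
        }
    }

    // The output broke: instead of failing the sync, open a new file and
    // replay everything unacked into it. Unacked entries that actually made
    // it into the old file now exist in both files -- duplicated entries.
    public void onOutputBroken() {
        currentFile = new ArrayList<>(unacked);
        allFiles.add(currentFile);
    }
}
```

Note that the old file is never truncated; readers must tolerate seeing the replayed entries twice.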
> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0-beta-1
>
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all
> purposes. But in fact, we do not need most of its features if we only want
> to log the WAL. For example, we do not need pipeline recovery, since we can
> just close the old logger and open a new one. We also do not need to write
> multiple blocks, since we can open a new logger when the old file grows too
> large.
> Most importantly, it is hard to handle all the corner cases needed to avoid
> data loss or data inconsistency (such as HBASE-14004) when using the
> original DFSOutputStream, due to its complicated logic. That complicated
> logic also forces us to use magical tricks to increase performance. For
> example, we need multiple threads calling {{hflush}} when logging, and we
> currently use 5 threads. But why 5, not 10 or 100?
> So here, I propose we implement our own {{DFSOutputStream}} for logging the
> WAL, for correctness and also for performance.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)