[jira] [Created] (FLINK-20955) Refactor HBase Source in accordance with FLIP-27

Moritz Manner (Jira) Wed, 13 Jan 2021 01:31:05 -0800

Moritz Manner created FLINK-20955:
-------------------------------------

             Summary: Refactor HBase Source in accordance with FLIP-27
                 Key: FLINK-20955
                 URL: https://issues.apache.org/jira/browse/FLINK-20955
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / HBase
            Reporter: Moritz Manner
             Fix For: 1.12.0

The HBase connector source implementation should be updated in accordance with
[FLIP-27: Refactor Source
Interface|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface].

One source should map to one table in HBase. Users can specify which columns or
column families to watch; each change in one of them triggers a record with the
change type, table, column family, column, value, and timestamp.
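A minimal sketch of the record each change could produce, assuming one flat event per changed cell. The names `HBaseChangeEvent` and `ChangeType`, the two change types, and the use of `byte[]` for the value are hypothetical illustrations, not part of any existing Flink or HBase API.

```java
import java.util.Arrays;

/** Hypothetical change type: an HBase mutation either writes (put) or deletes a cell. */
enum ChangeType { PUT, DELETE }

/** Hypothetical record emitted by the source, one instance per changed cell. */
final class HBaseChangeEvent {
    private final ChangeType changeType;
    private final String table;
    private final String columnFamily;
    private final String column;
    private final byte[] value;   // raw cell bytes, copied defensively
    private final long timestamp; // HBase cell timestamp

    HBaseChangeEvent(ChangeType changeType, String table, String columnFamily,
                     String column, byte[] value, long timestamp) {
        this.changeType = changeType;
        this.table = table;
        this.columnFamily = columnFamily;
        this.column = column;
        this.value = value == null ? null : value.clone();
        this.timestamp = timestamp;
    }

    ChangeType getChangeType() { return changeType; }
    String getTable() { return table; }
    String getColumnFamily() { return columnFamily; }
    String getColumn() { return column; }
    byte[] getValue() { return value == null ? null : value.clone(); }
    long getTimestamp() { return timestamp; }

    @Override
    public String toString() {
        return changeType + " " + table + ":" + columnFamily + ":" + column
                + " = " + Arrays.toString(value) + " @ " + timestamp;
    }
}
```

Keeping the value as raw bytes leaves deserialization to the user, which matches how HBase itself stores cell values.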
h3. Idea

The new Flink HBase Source makes use of the internal [replication mechanism of
HBase|https://hbase.apache.org/book.html#_cluster_replication]. The Source
registers with the HBase cluster as a replication peer and receives all WAL
edits written in HBase. From those WAL edits, the Source creates the DataStream.
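A minimal sketch of how the Source could flatten replicated WAL entries into the per-cell records listed at the top of the issue. It is loosely modelled on HBase's WAL, where an entry pairs a key identifying the table with an edit holding a list of cells, but `WalEntry`, `WalCell` and `ChangeEvent` are hypothetical simplified stand-ins rather than HBase classes, and the replication endpoint that would deliver the entries is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical conversion of replicated WAL entries into flat change events. */
final class WalToEvents {

    /** Simplified stand-in for one HBase cell inside a WAL edit. */
    static final class WalCell {
        final String family;
        final String qualifier;
        final byte[] value;
        final long timestamp;
        final boolean delete; // true for delete markers, false for puts

        WalCell(String family, String qualifier, byte[] value, long timestamp, boolean delete) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
            this.timestamp = timestamp;
            this.delete = delete;
        }
    }

    /** Simplified stand-in for a WAL entry: the table it was written to plus its cells. */
    static final class WalEntry {
        final String table;
        final List<WalCell> cells;

        WalEntry(String table, List<WalCell> cells) {
            this.table = table;
            this.cells = cells;
        }
    }

    /** One record per changed cell, carrying the fields listed in the issue. */
    static final class ChangeEvent {
        final String changeType;
        final String table;
        final String columnFamily;
        final String column;
        final byte[] value;
        final long timestamp;

        ChangeEvent(String changeType, String table, String columnFamily,
                    String column, byte[] value, long timestamp) {
            this.changeType = changeType;
            this.table = table;
            this.columnFamily = columnFamily;
            this.column = column;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Flattens each WAL entry into one change event per cell, preserving WAL order. */
    static List<ChangeEvent> toEvents(List<WalEntry> entries) {
        List<ChangeEvent> events = new ArrayList<>();
        for (WalEntry entry : entries) {
            for (WalCell cell : entry.cells) {
                events.add(new ChangeEvent(cell.delete ? "DELETE" : "PUT", entry.table,
                        cell.family, cell.qualifier, cell.value, cell.timestamp));
            }
        }
        return events;
    }
}
```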
h3. Split

We're not yet sure which information a Split should contain. There are two
possibilities:
# There is only one Split per Source, and it contains all the information
necessary to connect to HBase. The SourceReader that processes the Split
receives the WAL edits for all tables and filters out the irrelevant ones.
# There are multiple Splits per Source, each representing one HBase Region to
read from. This assumes that it is possible to receive only the WAL edits of a
specific HBase Region rather than all WAL edits. This would be preferable
because it allows multiple regions to be processed in parallel, but we still
need to figure out whether and how this is possible.
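Under the first option, the single SourceReader has to discard every edit it does not need. A minimal sketch of that check, assuming the watched columns are configured either as whole column families or as individual `family:qualifier` strings; the class name and that string format are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical filter for option 1: the reader receives WAL edits for every
 * table and keeps only cells of the configured table and watched columns.
 */
final class RelevantEditFilter {
    private final String table;
    private final Set<String> watchedFamilies; // whole column families
    private final Set<String> watchedColumns;  // single columns as "family:qualifier"

    RelevantEditFilter(String table, Set<String> watchedFamilies, Set<String> watchedColumns) {
        this.table = table;
        this.watchedFamilies = new HashSet<>(watchedFamilies);
        this.watchedColumns = new HashSet<>(watchedColumns);
    }

    /** Returns true if a cell with this table, family and qualifier should be emitted. */
    boolean isRelevant(String cellTable, String family, String qualifier) {
        if (!table.equals(cellTable)) {
            return false; // edit belongs to another table
        }
        return watchedFamilies.contains(family)
                || watchedColumns.contains(family + ":" + qualifier);
    }
}
```

With this design every reader still pays the cost of receiving all WAL edits, which is why the second option would be preferable if per-region delivery turns out to be possible.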

In both cases the Split will contain information about the HBase instance and
table.
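A minimal sketch of a Split that covers both options, assuming the HBase instance is identified by its ZooKeeper quorum and that a missing region name means the Split covers the whole table (option 1). In Flink, splits implement `org.apache.flink.api.connector.source.SourceSplit`, whose single method is `String splitId()`; to keep the sketch self-contained, a local interface with the same method stands in for it. All class and field names here are hypothetical.

```java
import java.util.Objects;

/** Local stand-in mirroring Flink's SourceSplit interface. */
interface SourceSplitLike {
    String splitId();
}

/** Hypothetical HBase split: one per Source (no region) or one per region. */
final class HBaseSourceSplit implements SourceSplitLike {
    private final String zookeeperQuorum; // identifies the HBase instance
    private final String table;
    private final String regionName;      // null under option 1

    HBaseSourceSplit(String zookeeperQuorum, String table, String regionName) {
        this.zookeeperQuorum = Objects.requireNonNull(zookeeperQuorum);
        this.table = Objects.requireNonNull(table);
        this.regionName = regionName;
    }

    boolean coversWholeTable() {
        return regionName == null;
    }

    String getZookeeperQuorum() { return zookeeperQuorum; }
    String getTable() { return table; }
    String getRegionName() { return regionName; }

    @Override
    public String splitId() {
        // Unique within one Source: the table under option 1, table plus region under option 2.
        return coversWholeTable() ? table : table + "/" + regionName;
    }
}
```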
h3. Split Enumerator

Depending on which Split design we choose, the Split Enumerator will either
connect to HBase, fetch all relevant regions and create one Split per region, or
simply create a single Split.
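The enumerator's choice between the two designs can be sketched as a pure function, assuming the list of region names has already been fetched from HBase (that lookup is omitted). `SplitPlanner`, `PlannedSplit` and the `splitPerRegion` flag are hypothetical; in an actual FLIP-27 enumerator the resulting splits would then be assigned to the readers through the enumerator context.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical planning step of the split enumerator. */
final class SplitPlanner {

    /** Minimal split: table plus optional region (null means the whole table). */
    static final class PlannedSplit {
        final String table;
        final String region;

        PlannedSplit(String table, String region) {
            this.table = table;
            this.region = region;
        }
    }

    /**
     * Option 1 (splitPerRegion == false): a single split for the whole table.
     * Option 2 (splitPerRegion == true): one split per region, read in parallel.
     */
    static List<PlannedSplit> planSplits(String table, List<String> regionNames, boolean splitPerRegion) {
        if (!splitPerRegion) {
            return Collections.singletonList(new PlannedSplit(table, null));
        }
        List<PlannedSplit> splits = new ArrayList<>(regionNames.size());
        for (String region : regionNames) {
            splits.add(new PlannedSplit(table, region));
        }
        return splits;
    }
}
```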

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
