[ https://issues.apache.org/jira/browse/FLINK-20955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322866#comment-17322866 ]
Flink Jira Bot commented on FLINK-20955:
----------------------------------------
This issue is assigned but has not received an update in 7 days so it has been
labeled "stale-assigned". If you are still working on the issue, please give an
update and remove the label. If you are no longer working on the issue, please
unassign so someone else may work on it. In 7 days the issue will be
automatically unassigned.
> Refactor HBase Source in accordance with FLIP-27
> ------------------------------------------------
>
> Key: FLINK-20955
> URL: https://issues.apache.org/jira/browse/FLINK-20955
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / HBase, Table SQL / Ecosystem
> Reporter: Moritz Manner
> Assignee: Moritz Manner
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Fix For: 1.14.0
>
>
> The HBase connector source implementation should be updated in accordance
> with [FLIP-27: Refactor Source
> Interface|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface].
> One source should map to one table in HBase. Users can specify which columns
> or column families to watch; each change to a watched column triggers a
> record containing the change type, table, column family, column, value, and
> timestamp.
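As a rough illustration of the record described above, here is a minimal Java sketch. The names `HBaseChangeRecord` and `ChangeType` and the `PUT`/`DELETE` values are assumptions made for this example; the issue only lists the six fields and does not define the actual type.

```java
/** Hypothetical change types; the issue does not enumerate the real values. */
enum ChangeType { PUT, DELETE }

/** Illustrative container for the six fields named in the description. */
final class HBaseChangeRecord {
    final ChangeType changeType;
    final String table;
    final String columnFamily;
    final String column;
    final byte[] value;   // HBase cell values are raw bytes
    final long timestamp; // cell timestamp

    HBaseChangeRecord(ChangeType changeType, String table, String columnFamily,
                      String column, byte[] value, long timestamp) {
        this.changeType = changeType;
        this.table = table;
        this.columnFamily = columnFamily;
        this.column = column;
        // Defensive copy so later changes to the caller's array do not leak in.
        this.value = value == null ? null : value.clone();
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        int length = value == null ? 0 : value.length;
        return changeType + " " + table + ":" + columnFamily + ":" + column
                + " (" + length + " bytes) @" + timestamp;
    }
}
```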
> h3. Idea
> The new Flink HBase Source makes use of the internal [replication mechanism
> of HBase|https://hbase.apache.org/book.html#_cluster_replication]. The Source
> registers with the HBase cluster and receives all WAL (write-ahead log) edits
> written to HBase. From those WAL edits, the Source can create the DataStream.
> h3. Split
> We're still not 100% sure which information a Split should contain. We have
> the following possibilities:
> # There is only one Split per Source, and the Split contains all the
> information necessary to connect to HBase. The SourceReader that processes
> the Split receives all WAL edits for all tables and keeps only the relevant
> ones.
> # There are multiple Splits per Source, each Split representing one HBase
> Region to read from. This assumes that it is possible to receive WAL edits
> from a specific HBase Region only, rather than all WAL edits. This option
> would be preferable because it allows multiple regions to be processed in
> parallel, but we still need to figure out whether and how it can be done.
> In both cases the Split will contain information about the HBase instance and
> table.
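The two Split options above could share one class, sketched here under stated assumptions. `HBaseSourceSplit`, its fields, and the id format are hypothetical. The local `SourceSplit` interface is only a stand-in mirroring the `splitId()` method of Flink's `SourceSplit`, so that the example compiles without Flink on the classpath.

```java
/** Local stand-in for Flink's SourceSplit (assumption: only splitId() matters here). */
interface SourceSplit {
    String splitId();
}

/** Hypothetical split that always carries the HBase instance and table. */
final class HBaseSourceSplit implements SourceSplit {
    private final String hbaseInstance; // e.g. a connection string; format is assumed
    private final String table;
    private final String regionName;    // null means "whole table" (option 1)

    private HBaseSourceSplit(String hbaseInstance, String table, String regionName) {
        this.hbaseInstance = hbaseInstance;
        this.table = table;
        this.regionName = regionName;
    }

    /** Option 1: a single split covering the whole table. */
    static HBaseSourceSplit forWholeTable(String hbaseInstance, String table) {
        return new HBaseSourceSplit(hbaseInstance, table, null);
    }

    /** Option 2: one split per HBase Region. */
    static HBaseSourceSplit forRegion(String hbaseInstance, String table, String regionName) {
        if (regionName == null) {
            throw new IllegalArgumentException("regionName must not be null");
        }
        return new HBaseSourceSplit(hbaseInstance, table, regionName);
    }

    boolean coversWholeTable() {
        return regionName == null;
    }

    @Override
    public String splitId() {
        String base = hbaseInstance + "/" + table;
        return regionName == null ? base : base + "/" + regionName;
    }
}
```

Keeping the region optional means either design can be chosen later without changing what the SourceReader receives.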
> h3. Split Enumerator
> Depending on which Split design we choose, the split enumerator will either
> connect to HBase and fetch all relevant regions, or just create a single Split.
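The enumerator's planning step for both designs might look roughly like the sketch below. `SplitPlanner` and its methods are hypothetical, and the region names are passed in as plain strings. A real implementation would sit behind Flink's `SplitEnumerator` interface and look the regions up in HBase, which this example does not attempt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical planning helper; returns split ids as plain strings. */
final class SplitPlanner {
    private SplitPlanner() {}

    /** Option 1: exactly one split that covers the whole table. */
    static List<String> planSingleSplit(String table) {
        return Collections.singletonList(table);
    }

    /**
     * Option 2: one split per region. The regionNames argument stands in
     * for whatever the enumerator would fetch from HBase.
     */
    static List<String> planRegionSplits(String table, List<String> regionNames) {
        List<String> splits = new ArrayList<>();
        for (String region : regionNames) {
            splits.add(table + "/" + region);
        }
        return splits;
    }
}
```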