[
https://issues.apache.org/jira/browse/FLINK-20955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417878#comment-17417878
]
ZhuoYu Chen commented on FLINK-20955:
-------------------------------------
Hello [~moritz.manner], I am very interested in this issue because I have
implemented Mongo data reading based on FLIP-27 at work and gained some
experience with it. I hope you can assign this task to me.
> Refactor HBase Source in accordance with FLIP-27
> ------------------------------------------------
>
> Key: FLINK-20955
> URL: https://issues.apache.org/jira/browse/FLINK-20955
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / HBase, Table SQL / Ecosystem
> Reporter: Moritz Manner
> Priority: Minor
> Labels: auto-deprioritized-major, auto-unassigned,
> pull-request-available
> Fix For: 1.14.0
>
>
> The HBase connector source implementation should be updated in accordance
> with [FLIP-27: Refactor Source
> Interface|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface].
> One source should map to one table in HBase. Users can specify which
> column[families] to watch; each change in one of the columns triggers a
> record with change type, table, column family, column, value, and timestamp.
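As an illustration of the record described above, here is a minimal sketch of what one emitted element could look like. The names `ChangeType`, `HBaseChangeRecord`, and `valueAsString` are hypothetical and are not taken from the issue or from the Flink code base; the two change types shown are an assumption as well.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical change types; the issue does not list the actual values. */
enum ChangeType { PUT, DELETE }

/** Hypothetical record emitted for one changed HBase cell. */
final class HBaseChangeRecord {
    final ChangeType changeType;
    final String table;
    final String columnFamily;
    final String column;
    final byte[] value;   // raw cell bytes; HBase stores values as byte arrays
    final long timestamp; // cell timestamp as reported by HBase

    HBaseChangeRecord(ChangeType changeType, String table, String columnFamily,
                      String column, byte[] value, long timestamp) {
        this.changeType = changeType;
        this.table = table;
        this.columnFamily = columnFamily;
        this.column = column;
        this.value = value;
        this.timestamp = timestamp;
    }

    /** Decodes the value as UTF-8 text, assuming the writer stored a string. */
    String valueAsString() {
        return new String(value, StandardCharsets.UTF_8);
    }

    @Override
    public String toString() {
        return changeType + " " + table + ":" + columnFamily + ":" + column
                + "@" + timestamp;
    }
}
```

Keeping the value as raw bytes leaves decoding to the user, which matches how HBase itself treats cell contents.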
> h3. Idea
> The new Flink HBase Source makes use of the internal [replication mechanism
> of HBase|https://hbase.apache.org/book.html#_cluster_replication]. The Source
> registers with the HBase cluster and receives all WAL edits written to
> HBase. From those WAL edits the Source can create the DataStream.
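The step from WAL edits to stream elements could be sketched roughly as follows. `WalCell` is a hypothetical, simplified stand-in for one cell of a replicated WAL edit and is not an HBase class; `WalEditFilter` is likewise an illustrative name. The sketch only shows how edits might be narrowed down to one table and the watched column families.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified stand-in for one cell of a replicated WAL edit. */
final class WalCell {
    final String table;
    final String columnFamily;
    final String column;
    final byte[] value;
    final long timestamp;

    WalCell(String table, String columnFamily, String column, byte[] value, long timestamp) {
        this.table = table;
        this.columnFamily = columnFamily;
        this.column = column;
        this.value = value;
        this.timestamp = timestamp;
    }
}

/** Keeps only the cells of one table and a set of watched column families. */
final class WalEditFilter {
    private final String table;
    private final Set<String> watchedFamilies;

    WalEditFilter(String table, String... watchedFamilies) {
        this.table = table;
        this.watchedFamilies = new HashSet<>(Arrays.asList(watchedFamilies));
    }

    /** Returns the cells of the given edit that belong to the watched table and families. */
    List<WalCell> relevantCells(List<WalCell> walEdit) {
        List<WalCell> relevant = new ArrayList<>();
        for (WalCell cell : walEdit) {
            if (table.equals(cell.table) && watchedFamilies.contains(cell.columnFamily)) {
                relevant.add(cell);
            }
        }
        return relevant;
    }
}
```

For example, `new WalEditFilter("orders", "info")` would drop cells from other tables and from column families other than `info`.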
> h3. Split
> We're still not 100% sure which information a Split should contain. We have
> the following possibilities:
> # There is only one Split per Source and the Split contains all the
> necessary information to connect with HBase. The SourceReader which processes
> the Split receives all WAL edits for all tables and keeps only the relevant
> ones.
> # There are multiple Splits per Source, each Split representing one HBase
> Region to read from. This assumes that it is possible to only receive WAL
> edits from a specific HBase Region and not receive all WAL edits. This would
> be preferable as it allows parallel processing of multiple regions, but we
> still need to figure out how this is possible.
> In both cases the Split will contain information about the HBase instance and
> table.
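Both options could share one split type, sketched below under stated assumptions. `SplitLike` is a local stand-in for Flink's `SourceSplit` interface (which declares `splitId()`), so that the example compiles without Flink on the classpath. `HBaseSplitSketch`, the ZooKeeper quorum field, and the ID format are hypothetical choices, not part of the proposal.

```java
import java.util.Optional;

/** Local stand-in for Flink's SourceSplit interface, which declares splitId(). */
interface SplitLike {
    String splitId();
}

/** Hypothetical split covering either a whole table or a single region. */
final class HBaseSplitSketch implements SplitLike {
    private final String zookeeperQuorum; // assumed connection info for the HBase instance
    private final String table;
    private final String regionName;      // null for the whole-table variant

    private HBaseSplitSketch(String zookeeperQuorum, String table, String regionName) {
        this.zookeeperQuorum = zookeeperQuorum;
        this.table = table;
        this.regionName = regionName;
    }

    /** First option: one split per source, covering the whole table. */
    static HBaseSplitSketch wholeTable(String zookeeperQuorum, String table) {
        return new HBaseSplitSketch(zookeeperQuorum, table, null);
    }

    /** Second option: one split per HBase region. */
    static HBaseSplitSketch forRegion(String zookeeperQuorum, String table, String regionName) {
        return new HBaseSplitSketch(zookeeperQuorum, table, regionName);
    }

    String zookeeperQuorum() {
        return zookeeperQuorum;
    }

    String table() {
        return table;
    }

    Optional<String> regionName() {
        return Optional.ofNullable(regionName);
    }

    @Override
    public String splitId() {
        return regionName == null ? table : table + "/" + regionName;
    }
}
```

In both variants the split carries the HBase instance and table; only the second adds a region.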
> h3. Split Enumerator
> Depending on which Split design we choose, the split enumerator will either
> connect to HBase and enumerate all relevant regions or create just one Split.
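The two enumerator behaviours could be sketched as follows. `SplitPlanSketch` and the split ID format are hypothetical; splits are represented by their IDs only, and the region names are passed in as a plain list, whereas a real enumerator would presumably obtain them from the HBase client.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that plans split IDs for either Split design. */
final class SplitPlanSketch {
    private SplitPlanSketch() {}

    /** Single-split design: one split that covers the whole table. */
    static List<String> singleSplit(String table) {
        return Collections.singletonList(table);
    }

    /** Per-region design: one split per region of the table. */
    static List<String> splitPerRegion(String table, List<String> regionNames) {
        List<String> splitIds = new ArrayList<>();
        for (String region : regionNames) {
            splitIds.add(table + "/" + region);
        }
        return splitIds;
    }
}
```

With the per-region design the number of splits, and therefore the achievable parallelism, follows the number of regions.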