[jira] [Commented] (FLINK-20955) Refactor HBase Source in accordance with FLIP-27

ZhuoYu Chen (Jira) Mon, 20 Sep 2021 18:13:05 -0700


    [ https://issues.apache.org/jira/browse/FLINK-20955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417878#comment-17417878 ]


ZhuoYu Chen commented on FLINK-20955:
-------------------------------------

Hello [~moritz.manner], I am very interested in this issue. I have implemented
a Mongo data source based on FLIP-27 at work and gained some experience with
the new interface. I hope you can assign this task to me.

> Refactor HBase Source in accordance with FLIP-27
> ------------------------------------------------
>
>                 Key: FLINK-20955
>                 URL: https://issues.apache.org/jira/browse/FLINK-20955
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / HBase, Table SQL / Ecosystem
>            Reporter: Moritz Manner
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-unassigned, 
> pull-request-available
>             Fix For: 1.14.0
>
>
> The HBase connector source implementation should be updated in accordance 
> with [FLIP-27: Refactor Source 
> Interface|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface].
> One source should map to one table in HBase. Users can specify which columns
> or column families to watch; each change in one of the watched columns
> triggers a record with change type, table, column family, column, value, and
> timestamp.
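To make the record shape concrete, here is a minimal sketch of a type carrying the six fields listed above. The names `ChangeType` and `HBaseChangeRecord` and the two enum values are assumptions for illustration; the issue does not define them, and the actual connector may model the record differently.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical change types; the issue does not enumerate them. */
enum ChangeType { PUT, DELETE }

/** Hypothetical record for one changed cell, holding the six fields from the description. */
final class HBaseChangeRecord {
    final ChangeType changeType;
    final String table;
    final String columnFamily;
    final String column;
    final byte[] value;    // raw cell bytes; HBase stores values as byte arrays
    final long timestamp;  // cell timestamp in milliseconds

    HBaseChangeRecord(ChangeType changeType, String table, String columnFamily,
                      String column, byte[] value, long timestamp) {
        this.changeType = changeType;
        this.table = table;
        this.columnFamily = columnFamily;
        this.column = column;
        // Defensive copy so later changes to the caller's array do not leak in.
        this.value = value == null ? new byte[0] : value.clone();
        this.timestamp = timestamp;
    }

    /** Renders the value as UTF-8 text, which is only an assumption for display. */
    @Override
    public String toString() {
        return changeType + " " + table + " " + columnFamily + ":" + column
                + "=" + new String(value, StandardCharsets.UTF_8) + " @" + timestamp;
    }
}
```

The value is kept as bytes because interpreting it (string, number, serialized object) depends on the table and is left to the user in this sketch.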
> h3. Idea
> The new Flink HBase Source makes use of the internal [replication mechanism
> of HBase|https://hbase.apache.org/book.html#_cluster_replication]. The Source
> registers with the HBase cluster and receives all WAL edits written to
> HBase. From those WAL edits the Source can create the DataStream.
> h3. Split
> We're still not 100% sure which information a Split should contain. We have 
> the following possibilities: 
>  # There is only one Split per Source and the Split contains all the
> necessary information to connect to HBase. The SourceReader which processes
> the Split receives all WAL edits for all tables and keeps only the relevant
> edits.
>  # There are multiple Splits per Source, each Split representing one HBase 
> Region to read from. This assumes that it is possible to only receive WAL 
> edits from a specific HBase Region and not receive all WAL edits. This would 
> be preferable as it allows parallel processing of multiple regions, but we 
> still need to figure out how this is possible.
> In both cases the Split will contain information about the HBase instance and 
> table. 
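The two Split options above can be sketched with a single split type whose region is optional, plus the filtering step that option 1 requires. Everything here is hypothetical: `WalEdit` is a simplified stand-in rather than the HBase class, `HBaseSplit` and `Option1Filter` are illustrative names, and the code deliberately avoids Flink and HBase dependencies. In a real connector the split would implement Flink's `SourceSplit` interface, which requires a `splitId()` method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified stand-in for one WAL edit. */
final class WalEdit {
    final String table;
    final String columnFamily;

    WalEdit(String table, String columnFamily) {
        this.table = table;
        this.columnFamily = columnFamily;
    }
}

/** Hypothetical split covering both options: regionName is null for option 1. */
final class HBaseSplit {
    final String hbaseInstance; // placeholder for whatever identifies the cluster
    final String table;
    final String regionName;    // null means "whole table" (option 1)

    HBaseSplit(String hbaseInstance, String table, String regionName) {
        this.hbaseInstance = hbaseInstance;
        this.table = table;
        this.regionName = regionName;
    }

    /** Unique id per split: the table alone, or table plus region. */
    String splitId() {
        return regionName == null ? table : table + "/" + regionName;
    }
}

final class Option1Filter {
    /** Keeps only edits for the split's table and the watched column families. */
    static List<WalEdit> relevant(List<WalEdit> edits, HBaseSplit split, Set<String> families) {
        List<WalEdit> kept = new ArrayList<>();
        for (WalEdit edit : edits) {
            if (edit.table.equals(split.table) && families.contains(edit.columnFamily)) {
                kept.add(edit);
            }
        }
        return kept;
    }
}
```

With option 2 the filter would become unnecessary for the region dimension only if HBase can deliver edits per region, which is exactly the open question stated above.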
> h3. Split Enumerator
> Depending on which Split design we choose, the split enumerator will either
> connect to HBase and get all relevant regions or just create one Split.
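The enumerator's decision can be sketched as a small planning function. This is only an illustration under stated assumptions: `PlannedSplit` and `SplitPlanner` are hypothetical names, and the region names are passed in as plain strings, whereas a real enumerator would look them up from the HBase cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical split descriptor: region is null when the split covers the whole table. */
final class PlannedSplit {
    final String table;
    final String region;

    PlannedSplit(String table, String region) {
        this.table = table;
        this.region = region;
    }
}

final class SplitPlanner {
    /**
     * Returns one split for the whole table, or one split per region,
     * depending on which Split design is in use.
     */
    static List<PlannedSplit> plan(String table, List<String> regions, boolean onePerRegion) {
        List<PlannedSplit> splits = new ArrayList<>();
        if (!onePerRegion) {
            // Single-split design: the region list is not needed at all.
            splits.add(new PlannedSplit(table, null));
            return splits;
        }
        for (String region : regions) {
            splits.add(new PlannedSplit(table, region));
        }
        return splits;
    }
}
```

With one split per region, the number of splits bounds the useful parallelism of the source; with a single split, only one reader does the work.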



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
