[
https://issues.apache.org/jira/browse/FLINK-20955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martijn Visser updated FLINK-20955:
-----------------------------------
Fix Version/s: (was: 1.15.0)
> Refactor HBase Source in accordance with FLIP-27
> ------------------------------------------------
>
> Key: FLINK-20955
> URL: https://issues.apache.org/jira/browse/FLINK-20955
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / HBase, Table SQL / Ecosystem
> Reporter: Moritz Manner
> Priority: Minor
> Labels: auto-deprioritized-major, auto-unassigned,
> pull-request-available
>
> The HBase connector source implementation should be updated in accordance
> with [FLIP-27: Refactor Source
> Interface|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface].
> One source should map to one table in HBase. Users can specify which
> columns or column families to watch; each change in a watched column
> produces a record with change type, table, column family, column, value,
> and timestamp.
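
As an illustration of the record described above, here is a minimal Java sketch. The names `ChangeType` and `HBaseChangeEvent` and the two change-type values are assumptions made for this example; the issue lists only the fields, not a concrete class.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical change types; the issue mentions a "change type" but does not list its values.
enum ChangeType { PUT, DELETE }

// Hypothetical record shape: one instance per changed column.
final class HBaseChangeEvent {
    final ChangeType changeType;
    final String table;
    final String columnFamily;
    final String column;
    final byte[] value;   // raw cell value; may be null, e.g. for a delete
    final long timestamp; // timestamp of the change

    HBaseChangeEvent(ChangeType changeType, String table, String columnFamily,
                     String column, byte[] value, long timestamp) {
        this.changeType = changeType;
        this.table = table;
        this.columnFamily = columnFamily;
        this.column = column;
        this.value = value;
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        // Decode the value as UTF-8 for display only; real values may be arbitrary bytes.
        String shown = (value == null) ? "<none>" : new String(value, StandardCharsets.UTF_8);
        return changeType + " " + table + ":" + columnFamily + ":" + column
                + "=" + shown + " @" + timestamp;
    }
}
```

Keeping the value as `byte[]` leaves deserialization to the user, which may or may not match what the final connector does.
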
> h3. Idea
> The new Flink HBase Source makes use of the internal [replication mechanism
> of HBase|https://hbase.apache.org/book.html#_cluster_replication]. The Source
> registers with the HBase cluster and then receives all WAL edits written to
> HBase. From those WAL edits, the Source can create the DataStream.
> h3. Split
> We have not yet decided which information a Split should contain. There are
> two options:
> # There is only one Split per Source, and the Split contains all the
> information necessary to connect to HBase. The SourceReader that processes
> the Split receives all WAL edits for all tables and keeps only the relevant
> edits.
> # There are multiple Splits per Source, each Split representing one HBase
> Region to read from. This assumes that it is possible to receive WAL edits
> from a specific HBase Region only, rather than all WAL edits. This would
> be preferable because it allows multiple regions to be processed in
> parallel, but we still need to figure out how to achieve it.
> In both cases the Split will contain information about the HBase instance and
> table.
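
The filtering step in the first option could look roughly like the sketch below. `WalEditStub` and `WalEditFilter` are hypothetical stand-ins invented for this example; they are not HBase or Flink classes, and a real WAL edit carries far more information.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for one WAL edit (not an HBase class).
final class WalEditStub {
    final String table;
    final String columnFamily;

    WalEditStub(String table, String columnFamily) {
        this.table = table;
        this.columnFamily = columnFamily;
    }
}

final class WalEditFilter {
    // Keeps only the edits that belong to the watched table and column families.
    static List<WalEditStub> relevant(List<WalEditStub> edits, String table, Set<String> families) {
        List<WalEditStub> kept = new ArrayList<>();
        for (WalEditStub edit : edits) {
            if (edit.table.equals(table) && families.contains(edit.columnFamily)) {
                kept.add(edit);
            }
        }
        return kept;
    }
}
```

With this approach every reader still receives the edits for all tables, which is the cost the second option is trying to avoid.
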
> h3. Split Enumerator
> Depending on which Split design we choose, the split enumerator will either
> connect to HBase and fetch all relevant regions or simply create a single
> Split.
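
The two enumerator behaviours can be sketched in plain Java as follows. `HBaseSplitSketch`, `SplitPlanner`, and the `zookeeperQuorum` field are assumptions made for this example; they do not use the FLIP-27 interfaces, and the region names are passed in rather than fetched from HBase.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical split: HBase instance and table in both designs, plus an optional region.
final class HBaseSplitSketch {
    final String zookeeperQuorum; // assumed way to identify the HBase instance
    final String table;
    final String region;          // null when the split covers the whole table

    HBaseSplitSketch(String zookeeperQuorum, String table, String region) {
        this.zookeeperQuorum = zookeeperQuorum;
        this.table = table;
        this.region = region;
    }
}

final class SplitPlanner {
    // Single-split design: one split that covers the whole table.
    static List<HBaseSplitSketch> singleSplit(String quorum, String table) {
        return Collections.singletonList(new HBaseSplitSketch(quorum, table, null));
    }

    // Per-region design: one split per region. A real enumerator would look the
    // regions up in HBase; here they are supplied by the caller.
    static List<HBaseSplitSketch> perRegion(String quorum, String table, List<String> regionNames) {
        List<HBaseSplitSketch> splits = new ArrayList<>();
        for (String region : regionNames) {
            splits.add(new HBaseSplitSketch(quorum, table, region));
        }
        return splits;
    }
}
```

Whether the per-region variant is feasible still depends on the open question above, namely whether WAL edits can be received for a single region.
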