[ https://issues.apache.org/jira/browse/FLINK-25330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464637#comment-17464637 ]
Jing Ge commented on FLINK-25330:
---------------------------------
Hi [~Ibson]
Thanks for providing the information. Setting 'KEEP_DELETED_CELLS => true'
will flush all changes to the HFile and let deleted records survive major
compaction. However, Get and Scan will still not return the deleted records.
Please refer to [https://hbase.apache.org/book.html#cf.keep.deleted] for
further information.
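To make the difference between "which versions a delete covers" and "whether deleted cells are kept" concrete, here is a toy in-memory model in plain Java. It is not the HBase client and not the connector's code, and the class and method names are hypothetical. It assumes HBase's documented marker semantics: Delete.addColumn(family, qualifier) masks only the latest version, Delete.addColumns(family, qualifier) masks all versions up to a timestamp, and KEEP_DELETED_CELLS only decides whether masked cells are physically dropped at major compaction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy model of ONE HBase column (one rowkey, one family:qualifier).
// Hypothetical names; it only mimics how delete markers mask cell versions.
public class VersionedCellModel {
    record Cell(long ts, String value) {}

    private final List<Cell> cells = new ArrayList<>();       // physically stored cells
    private final Set<Long> versionMarkers = new HashSet<>(); // each masks exactly one timestamp
    private long columnMarker = Long.MIN_VALUE;               // masks every ts <= this value

    void put(long ts, String value) {
        cells.add(new Cell(ts, value));
    }

    private boolean masked(Cell c) {
        return c.ts() <= columnMarker || versionMarkers.contains(c.ts());
    }

    // A normal Get: the newest cell that no delete marker masks.
    Optional<String> get() {
        return cells.stream()
                .filter(c -> !masked(c))
                .max(Comparator.comparingLong(Cell::ts))
                .map(Cell::value);
    }

    // Mimics Delete.addColumn(family, qualifier): masks the latest visible version only.
    void deleteLatestVersion() {
        cells.stream()
                .filter(c -> !masked(c))
                .mapToLong(Cell::ts)
                .max()
                .ifPresent(ts -> versionMarkers.add(ts));
    }

    // Mimics Delete.addColumns(family, qualifier, ts): masks all versions <= ts.
    void deleteAllVersions(long ts) {
        columnMarker = Math.max(columnMarker, ts);
    }

    // Major compaction: masked cells are dropped unless keepDeletedCells is set.
    void majorCompact(boolean keepDeletedCells) {
        if (!keepDeletedCells) {
            cells.removeIf(this::masked);
        }
    }

    int storedCells() {
        return cells.size();
    }

    public static void main(String[] args) {
        VersionedCellModel col = new VersionedCellModel();
        col.put(1, "v1");
        col.put(2, "v2");
        col.put(3, "v3");

        col.deleteLatestVersion();                        // single-version delete
        System.out.println(col.get().orElse("<none>"));   // prints v2: the older version resurfaces

        col.deleteAllVersions(3);                         // masks every version up to ts = 3
        System.out.println(col.get().orElse("<none>"));   // prints <none>

        col.majorCompact(true);                           // like KEEP_DELETED_CELLS => true
        System.out.println(col.storedCells());            // prints 3: kept on disk, still hidden from get()
    }
}
```

In this model a single-version delete lets "v2" resurface, which matches the symptom in the issue description, while retaining deleted cells changes only what is stored, not what a normal read returns.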
Since HBase is used as a dim table, have you used
[temporal joins|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#temporal-joins]
to look up the row from the HBase dim table? Could that solve your problem?
How about the connector options sink.buffer-flush.xxx? Have you given them a shot?
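As a rough illustration of how count- and time-based flushing batches buffered mutations, here is a toy model in plain Java; it is not the connector's implementation. The fields maxRows and intervalMillis are hypothetical stand-ins for the connector's sink.buffer-flush.max-rows and sink.buffer-flush.interval options. As a simplification, time is passed in explicitly and the interval is measured from the last flush, rather than driven by a background timer.

```java
import java.util.ArrayList;
import java.util.List;

// Toy buffered sink: flushes when the row count reaches maxRows, or when
// tick() observes that intervalMillis has passed since the last flush.
public class BufferedSinkModel {
    private final int maxRows;           // hypothetical stand-in for max-rows
    private final long intervalMillis;   // hypothetical stand-in for interval
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushedBatches = new ArrayList<>();
    private long lastFlushMillis;

    BufferedSinkModel(int maxRows, long intervalMillis, long startMillis) {
        this.maxRows = maxRows;
        this.intervalMillis = intervalMillis;
        this.lastFlushMillis = startMillis;
    }

    // Buffer one mutation, then flush if the row-count threshold is reached.
    void write(String mutation, long nowMillis) {
        buffer.add(mutation);
        if (buffer.size() >= maxRows) {
            flush(nowMillis);
        }
    }

    // Periodic check: flush a non-empty buffer once the interval has elapsed.
    void tick(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - lastFlushMillis >= intervalMillis) {
            flush(nowMillis);
        }
    }

    // Move the buffered mutations into one batch (where a real sink would write them).
    private void flush(long nowMillis) {
        flushedBatches.add(new ArrayList<>(buffer));
        buffer.clear();
        lastFlushMillis = nowMillis;
    }

    List<List<String>> flushedBatches() {
        return flushedBatches;
    }

    int pending() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BufferedSinkModel sink = new BufferedSinkModel(2, 1000, 0);
        sink.write("put k1", 10);
        sink.write("delete k1", 20); // 2 rows buffered: count-triggered flush
        sink.write("put k2", 30);
        sink.tick(500);              // 500 - 20 < 1000: still buffered
        sink.tick(1100);             // 1100 - 20 >= 1000: time-triggered flush
        System.out.println(sink.flushedBatches()); // [[put k1, delete k1], [put k2]]
    }
}
```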
> Flink SQL doesn't retract all versions of Hbase data
> ----------------------------------------------------
>
> Key: FLINK-25330
> URL: https://issues.apache.org/jira/browse/FLINK-25330
> Project: Flink
> Issue Type: Bug
> Components: Connectors / HBase
> Reporter: Bruce Wong
> Assignee: Jing Ge
> Priority: Critical
> Labels: pull-request-available
> Attachments: Flink-SQL-Test.zip, bundle_data.zip,
> image-2021-12-15-20-05-18-236.png, test_res.png, test_res_1.png
>
>
> h2. Background
> When we use CDC to synchronize MySQL data to HBase, we find that HBase
> deletes only the latest version of the specified rowkey when the MySQL
> data is deleted. The older versions still exist, so you end up reading the
> wrong data. I think it's a bug in the HBase connector.
> The following figure shows the HBase data before and after the MySQL data is
> deleted.
> !image-2021-12-15-20-05-18-236.png|width=910,height=669!
>
> h2.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)