[jira] [Created] (FLINK-29971) Hbase sink will lose data at extreme case

wenchao.wu (Jira) Thu, 10 Nov 2022 00:25:06 -0800

wenchao.wu created FLINK-29971:
----------------------------------

             Summary: Hbase sink will lose data at extreme case
                 Key: FLINK-29971
                 URL: https://issues.apache.org/jira/browse/FLINK-29971
             Project: Flink
          Issue Type: Bug
          Components: Connectors / HBase
    Affects Versions: 1.15.2, 1.13.6
            Reporter: wenchao.wu
         Attachments: image-2022-11-10-16-02-51-402.png, 
image-2022-11-10-16-08-23-325.png, image-2022-11-10-16-10-50-711.png, 
image-2022-11-10-16-24-01-396.png


h2. Situation:

I use Kafka as the source and HBase as the sink, but I do not have permission 
on the HBase table. I send messages to Kafka one at a time, with a long gap 
between messages.

In this situation the expected result is that the job fails when a checkpoint 
is triggered. In practice the job keeps running and checkpoints complete 
successfully.
h2. Analysis

The HBase sink throws its exception in the *checkErrorAndRethrow()* function, 
which is called from two functions, *invoke()* and {*}flush(){*}. 
Apart from the *invoke()* path, *flush()* is called in two places: one is 
{*}snapshot(){*}, the other is the scheduledThread, as in the following snippet:

!image-2022-11-10-16-02-51-402.png!

We can see that in the scheduledThread the exception thrown by *flush()* is 
caught and stored in failureThrowable instead of being rethrown.

So if no further message arrives, the only place the exception can be thrown is 
{*}snapshot(){*}. But *snapshot()* calls *flush()* only conditionally, as in the 
following snippet:

!image-2022-11-10-16-08-23-325.png!

But the scheduledThread calls *flush()* periodically and sets 
numPendingRequests to 0.

!image-2022-11-10-16-10-50-711.png!

So if no other message arrives, the snapshot runs successfully, which means the 
checkpoint succeeds even though that message was never written to HBase. The 
message is lost.
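
Since the screenshots are not visible in this mail, the sequence above can be 
reproduced with a small model. This is only a sketch, not the Flink source: the 
class name LossySinkSketch, the method scheduledFlush(), the body of flush() and 
the simulated "permission denied" failure are assumptions made for illustration. 
Only the names invoke(), flush(), snapshot(), checkErrorAndRethrow(), 
failureThrowable and numPendingRequests come from the description. The timer is 
simulated by a direct call so the run is deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified model of the sink described in this issue. */
class LossySinkSketch {
    private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
    private final AtomicLong numPendingRequests = new AtomicLong(0);

    private void checkErrorAndRethrow() {
        Throwable cause = failureThrowable.get();
        if (cause != null) {
            throw new RuntimeException("An error occurred in the sink.", cause);
        }
    }

    /** Per-record path: surfaces earlier errors, then buffers the write. */
    void invoke(String record) {
        checkErrorAndRethrow();
        numPendingRequests.incrementAndGet();
    }

    /** Assumed flush body: the write is rejected, the counter is reset, then the error is rethrown. */
    private void flush() {
        failureThrowable.compareAndSet(null, new IllegalStateException("permission denied"));
        numPendingRequests.set(0);
        checkErrorAndRethrow();
    }

    /** One tick of the periodic flush: the exception is caught and only stored. */
    void scheduledFlush() {
        try {
            flush();
        } catch (Exception e) {
            failureThrowable.compareAndSet(null, e);
        }
    }

    /** Checkpoint hook: flushes only while requests are pending. */
    void snapshot() {
        while (numPendingRequests.get() != 0) {
            flush();
        }
    }

    /** Returns true if the checkpoint completes although the write failed. */
    static boolean checkpointSucceedsDespiteFailure() {
        LossySinkSketch sink = new LossySinkSketch();
        sink.invoke("only-message");
        sink.scheduledFlush(); // timer fires before the checkpoint
        try {
            sink.snapshot();   // nothing pending, so flush() is never reached
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("checkpoint succeeded despite failed write: "
                + checkpointSucceedsDespiteFailure());
    }
}
```

In this model the stored error would only surface on the next invoke(), which 
never happens if no further message arrives.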

 
h2. Solution

I think the fix is that when a checkpoint is triggered and the snapshot function 
is called, it needs to call *checkErrorAndRethrow()* first, as follows: 

!image-2022-11-10-16-24-01-396.png!
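
As the screenshot is not visible in this mail, here is a minimal sketch of the 
proposed change on a simplified model of the sink. It is not the actual patch: 
the class name FixedSinkSketch, the method scheduledFlush(), the body of flush() 
and the simulated "permission denied" failure are assumptions. The only change 
under discussion is the extra checkErrorAndRethrow() call at the start of 
snapshot().

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of the proposed fix; not the actual Flink patch. */
class FixedSinkSketch {
    private final AtomicReference<Throwable> failureThrowable = new AtomicReference<>();
    private final AtomicLong numPendingRequests = new AtomicLong(0);

    private void checkErrorAndRethrow() {
        Throwable cause = failureThrowable.get();
        if (cause != null) {
            throw new RuntimeException("An error occurred in the sink.", cause);
        }
    }

    void invoke(String record) {
        checkErrorAndRethrow();
        numPendingRequests.incrementAndGet();
    }

    /** Assumed flush body: the write is rejected, the counter is reset, then the error is rethrown. */
    private void flush() {
        failureThrowable.compareAndSet(null, new IllegalStateException("permission denied"));
        numPendingRequests.set(0);
        checkErrorAndRethrow();
    }

    /** One tick of the periodic flush: the exception is caught and only stored. */
    void scheduledFlush() {
        try {
            flush();
        } catch (Exception e) {
            failureThrowable.compareAndSet(null, e);
        }
    }

    /** Proposed change: surface any stored error before the conditional flush. */
    void snapshot() {
        checkErrorAndRethrow();
        while (numPendingRequests.get() != 0) {
            flush();
        }
    }

    /** Returns true if the checkpoint now fails after a swallowed flush error. */
    static boolean checkpointFailsAfterSwallowedError() {
        FixedSinkSketch sink = new FixedSinkSketch();
        sink.invoke("only-message");
        sink.scheduledFlush(); // timer fires before the checkpoint
        try {
            sink.snapshot();
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("checkpoint fails after swallowed error: "
                + checkpointFailsAfterSwallowedError());
    }
}
```

With the check in place, the error stored by the timer thread fails the 
checkpoint even when numPendingRequests is already 0.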



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
