[jira] [Comment Edited] (FLINK-23730) Source from hive sink hbase lost data

luoyuxia (Jira) Mon, 16 Aug 2021 02:00:14 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-23730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399613#comment-17399613
 ]


luoyuxia edited comment on FLINK-23730 at 8/16/21, 8:59 AM:
------------------------------------------------------------

[~yanchenyun] Thanks for reporting it. It's strange that the result isn't 
correct when infer-source-parallelism is enabled. It shouldn't lose data with 
high concurrency.

Would you mind sharing the complete Flink SQL?


was (Author: luoyuxia):
[~yanchenyun] Thanks for reporting it. It's strange that the result isn't 
correct when infer-source-parallelism is enabled. It shouldn't lose data because 
of high concurrency.

Would you mind sharing the complete Flink SQL?

> Source from hive sink hbase lost data
> -------------------------------------
>
>                 Key: FLINK-23730
>                 URL: https://issues.apache.org/jira/browse/FLINK-23730
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / HBase, Connectors / Hive
>    Affects Versions: 1.12.1
>            Reporter: Carl
>            Priority: Major
>
> Our use case is as follows:
>  # Hive source: create a Hive table whose metadata is in HMS
>  # Create an HBase table using the HBase shell
>  # Flink SQL DDL: create an HBase Flink table
>  # Use the Hive catalog: use Flink SQL to insert into the HBase Flink table
> If I set the table config table.exec.hive.infer-source-parallelism = false,
> the program runs with a parallelism of one, and the number of records in the 
> result is correct.
> But if I set the table config table.exec.hive.infer-source-parallelism = true,
> the program runs with a parallelism of twenty (the source parallelism is 
> inferred from the number of splits), and the number of records in the result 
> is not correct.
>  
> The test was repeated many times and no exception occurred.
>  
> So I guess it has something to do with high concurrency. Does it lose data 
> because of high concurrency?
>  
>  
>  
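
For reference, the four steps in the report might be reproduced with Flink SQL 
roughly as follows. This is only a sketch under stated assumptions: the catalog 
name (myhive), the table names (hive_source, hbase_sink, my_hbase_table), the 
columns, the column family (cf), and the ZooKeeper address are all hypothetical, 
and the 'hbase-2.2' connector version is a guess. The reporter's actual 
statements are not shown in this thread. The SET syntax is the unquoted form 
used by the 1.12 SQL client.

```sql
-- Hypothetical catalog name; assumes a HiveCatalog is already registered.
USE CATALOG myhive;

-- Flink table mapped onto an HBase table that was created beforehand
-- in the HBase shell (hypothetical name, column family, and quorum).
CREATE TABLE hbase_sink (
  rowkey STRING,
  cf ROW<val STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'my_hbase_table',
  'zookeeper.quorum' = 'localhost:2181'
);

-- Run 1: disable inference of the Hive source parallelism.
-- Run 2: set this to true so the parallelism is inferred from the splits.
SET table.exec.hive.infer-source-parallelism=false;

-- hive_source is a hypothetical Hive table with columns (id, val).
INSERT INTO hbase_sink
SELECT id, ROW(val) FROM hive_source;
```

Running the same INSERT under both settings and comparing the HBase row count 
with the Hive row count would show whether the discrepancy follows the setting.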



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
