Carl created FLINK-23730:
----------------------------
Summary: Source from hive sink hbase lost data
Key: FLINK-23730
URL: https://issues.apache.org/jira/browse/FLINK-23730
Project: Flink
Issue Type: Bug
Components: Connectors / HBase, Connectors / Hive
Affects Versions: 1.12.1
Reporter: Carl
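Our use case is as follows:
# Hive source: create a Hive table whose metadata is stored in HMS (Hive Metastore).
# Create an HBase table using the HBase shell.
# Flink SQL DDL: create a Flink table backed by the HBase table.
# Using the Hive catalog, run a Flink SQL INSERT INTO the HBase-backed Flink table, selecting from the Hive table.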
If I set the table config table.exec.hive.infer-source-parallelism = false,
the program runs with a parallelism of 1, and the number of result records is
correct.
But if I set the table config table.exec.hive.infer-source-parallelism = true,
the program runs with a parallelism of 20 (the source parallelism is
inferred from the number of splits), and the number of result records is not
correct.
I repeated the test many times, and no exception occurred in any run.
So I guess it has something to do with high concurrency. Does the job lose data
because of the high parallelism?
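
To narrow this down, the row keys on both sides could be compared. The following is only a minimal Python sketch and is not part of the job itself. The function name compare_row_keys and the sample keys are hypothetical, and it assumes the row keys have already been exported from the Hive table and from an HBase scan into two plain lists. It separates keys that are really missing from the sink from keys that merely repeat in the source, since HBase stores one row per row key.

```python
from collections import Counter


def compare_row_keys(source_keys, sink_keys):
    """Compare row keys exported from the Hive source with those found in HBase.

    Hypothetical helper: both arguments are plain iterables of row keys.
    """
    source_counts = Counter(source_keys)
    sink_set = set(sink_keys)

    # Keys present in the source but absent from the sink: records that
    # never arrived in HBase.
    missing = sorted(k for k in source_counts if k not in sink_set)

    # Keys that occur more than once in the source: repeated writes to the
    # same key land in the same HBase row, so they reduce the sink row
    # count even when nothing is dropped.
    duplicated = sorted(k for k, n in source_counts.items() if n > 1)

    return {
        "source_rows": sum(source_counts.values()),
        "distinct_source_keys": len(source_counts),
        "sink_rows": len(sink_set),
        "missing": missing,
        "duplicated": duplicated,
    }


if __name__ == "__main__":
    # Made-up sample keys, for illustration only.
    report = compare_row_keys(["a", "b", "b", "c", "d"], ["a", "b", "d"])
    for name, value in report.items():
        print(name, value)
```

If "missing" is non-empty only when the source parallelism is inferred, that would support the guess that records are lost at higher parallelism. If it is empty and only "duplicated" is populated, the difference in counts would come from repeated row keys instead.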
--
This message was sent by Atlassian Jira
(v8.3.4#803005)