[
https://issues.apache.org/jira/browse/FLINK-35911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gang Yang updated FLINK-35911:
------------------------------
Description:
Problem description: A business team reported data loss. Investigation showed
that data from some partitions of the upstream Kafka source was not being
consumed. The job was also restarted from state several times, but the problem
persisted.
Solution: To troubleshoot the issue, the parameter pipeline.operator-chaining =
'false' was later configured and the job was restarted from state. Strangely,
the job then resumed normal consumption.
Business scenario: simple data synchronization, Kafka write to HDFS
The source is defined as follows:
{code:sql}
CREATE TABLE `play_log_source` (
`appName` VARCHAR,
`appInfo.channel` VARCHAR,
`channel_name` AS `appInfo.channel`,
`appInfo.packageName` VARCHAR,
`package_name` AS `appInfo.packageName`,
`deviceInfo.deviceId` VARCHAR,
`device_id` AS `deviceInfo.deviceId`,
`deviceInfo.deviceName` VARCHAR
) WITH (
'nested-json.key.fields.deserialize-min.enabled' = 'true',
'connector' = 'kafka',
'format' = 'nested-json'
);{code}
Flink version: 1.18.1
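For reference, a minimal sketch of how the workaround described above can be applied; the SQL client SET syntax and the pipeline.operator-chaining key are standard in Flink 1.18, but the report does not show how this particular job set it:

{code:sql}
-- Disable operator chaining before submitting the job (Flink SQL client):
SET 'pipeline.operator-chaining' = 'false';
{code}

The equivalent in the DataStream API is StreamExecutionEnvironment#disableOperatorChaining(), and in the Table API, tableEnv.getConfig().set("pipeline.operator-chaining", "false").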
was:
Problem description: A business team reported data loss. Investigation showed
that data from some partitions of the upstream Kafka source was not being
consumed. The job was also restarted from state several times, but the problem
persisted.
Solution: To troubleshoot the issue, the parameter pipeline.operator-chaining =
'false' was later configured and the job was restarted from state. Strangely,
the job then resumed normal consumption.
Business scenario: simple data synchronization, Kafka write to HDFS
The source is defined as follows:
{code:sql}
CREATE TABLE `play_log_source` (
`appName` VARCHAR,
`appInfo.channel` VARCHAR,
`channel_name` AS `appInfo.channel`,
`appInfo.packageName` VARCHAR,
`package_name` AS `appInfo.packageName`,
`deviceInfo.deviceId` VARCHAR,
`device_id` AS `deviceInfo.deviceId`,
`deviceInfo.deviceName` VARCHAR
) WITH (
'nested-json.key.fields.deserialize-min.enabled' = 'true',
'connector' = 'kafka',
'format' = 'nested-json'
);{code}
Flink version: 1.18.1
> Flink kafka-source connector
> ----------------------------
>
> Key: FLINK-35911
> URL: https://issues.apache.org/jira/browse/FLINK-35911
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Kafka
> Affects Versions: kafka-3.0.1
> Reporter: Gang Yang
> Priority: Major
>
> Problem description: A business team reported data loss. Investigation showed
> that data from some partitions of the upstream Kafka source was not being
> consumed. The job was also restarted from state several times, but the
> problem persisted.
> Solution: To troubleshoot the issue, the parameter pipeline.operator-chaining
> = 'false' was later configured and the job was restarted from state.
> Strangely, the job then resumed normal consumption.
> Business scenario: simple data synchronization, Kafka write to HDFS
> The source is defined as follows:
> {code:sql}
> CREATE TABLE `play_log_source` (
> `appName` VARCHAR,
> `appInfo.channel` VARCHAR,
> `channel_name` AS `appInfo.channel`,
> `appInfo.packageName` VARCHAR,
> `package_name` AS `appInfo.packageName`,
> `deviceInfo.deviceId` VARCHAR,
> `device_id` AS `deviceInfo.deviceId`,
> `deviceInfo.deviceName` VARCHAR
> ) WITH (
> 'nested-json.key.fields.deserialize-min.enabled' = 'true',
> 'connector' = 'kafka',
> 'format' = 'nested-json'
> );{code}
>
> Flink version: 1.18.1
--
This message was sent by Atlassian Jira
(v8.20.10#820010)