[
https://issues.apache.org/jira/browse/FLINK-27944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhu Zhu updated FLINK-27944:
----------------------------
Description:
When a task has union inputs, some IO metrics (numBytesIn* and numBuffersIn*) of
the different inputs may collide and fail to be registered.
The problem can be reproduced with a simple job like:
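{code:java}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> source1 = env.fromElements("abc");
DataStream<String> source2 = env.fromElements("123");
// The print sink's task takes the union of both sources as its input.
source1.union(source2).print();
env.execute();
{code}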
Logs of collisions:
{code:java}
2022-06-08 00:59:01,629 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBytesInLocal'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0,
Shuffle, Netty, Input]
2022-06-08 00:59:01,629 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBytesInLocalPerSecond'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0,
Shuffle, Netty, Input]
2022-06-08 00:59:01,629 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBytesInLocal'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0]
2022-06-08 00:59:01,629 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBytesInLocalPerSecond'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBytesInRemote'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0,
Shuffle, Netty, Input]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBytesInRemotePerSecond'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0,
Shuffle, Netty, Input]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBytesInRemote'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBytesInRemotePerSecond'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBuffersInLocal'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0,
Shuffle, Netty, Input]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBuffersInLocalPerSecond'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0,
Shuffle, Netty, Input]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBuffersInLocal'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBuffersInLocalPerSecond'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBuffersInRemote'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0,
Shuffle, Netty, Input]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBuffersInRemotePerSecond'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0,
Shuffle, Netty, Input]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBuffersInRemote'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0]
2022-06-08 00:59:01,630 WARN org.apache.flink.metrics.MetricGroup
[] - Name collision: Group already contains a Metric with the name
'numBuffersInRemotePerSecond'. Metric will not be reported.[, taskmanager,
fa9f270e-e904-4f69-8227-8d6e26e1be62, WordCount, Sink: Print to Std. Out, 0]
{code}
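The warnings above are consistent with each input of the union registering the same metric name in one shared group. The following is a minimal, self-contained sketch of that behaviour, not Flink code: SketchMetricGroup, UnionMetricCollisionSketch and their methods are hypothetical names made up for illustration, and the shared-counter variant is only one conceivable way to avoid the collision, not necessarily the fix adopted in Flink.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical demo class; compares per-input and shared counter registration. */
class UnionMetricCollisionSketch {

    /** Each input registers its own counter under the same name: the second one is dropped. */
    static SketchMetricGroup perInputRegistration() {
        SketchMetricGroup group = new SketchMetricGroup();
        AtomicLong input1Bytes = new AtomicLong();
        AtomicLong input2Bytes = new AtomicLong();
        group.register("numBytesInLocal", input1Bytes);
        group.register("numBytesInLocal", input2Bytes); // collides, not reported
        input1Bytes.addAndGet(3); // bytes read from the first input
        input2Bytes.addAndGet(3); // bytes read from the second input, invisible to the group
        return group;
    }

    /** Both inputs update one counter that is registered once: no collision. */
    static SketchMetricGroup sharedRegistration() {
        SketchMetricGroup group = new SketchMetricGroup();
        AtomicLong sharedBytes = new AtomicLong();
        group.register("numBytesInLocal", sharedBytes);
        sharedBytes.addAndGet(3); // first input
        sharedBytes.addAndGet(3); // second input
        return group;
    }

    public static void main(String[] args) {
        SketchMetricGroup perInput = perInputRegistration();
        System.out.println("per-input: collisions=" + perInput.collisions()
                + ", reported=" + perInput.value("numBytesInLocal"));
        SketchMetricGroup shared = sharedRegistration();
        System.out.println("shared: collisions=" + shared.collisions()
                + ", reported=" + shared.value("numBytesInLocal"));
    }
}

/** Hypothetical stand-in for a metric group; this is NOT Flink's MetricGroup. */
class SketchMetricGroup {
    private final Map<String, AtomicLong> counters = new LinkedHashMap<>();
    private int collisions = 0;

    /** Registers a counter; on a duplicate name the first registration is kept. */
    boolean register(String name, AtomicLong counter) {
        if (counters.containsKey(name)) {
            collisions++;
            System.out.println("Name collision: Group already contains a Metric with the name '"
                    + name + "'. Metric will not be reported.");
            return false;
        }
        counters.put(name, counter);
        return true;
    }

    /** Returns the value of the counter that was actually registered under the name. */
    long value(String name) {
        return counters.get(name).get();
    }

    /** Returns how many registrations were rejected because of a duplicate name. */
    int collisions() {
        return collisions;
    }
}
```

In the per-input variant the group only ever sees the first input's counter, so the reported value undercounts the task's real input. With a shared counter the value covers both inputs.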
> IO metric collision happens when a task has union inputs
> --------------------------------------------------------
>
> Key: FLINK-27944
> URL: https://issues.apache.org/jira/browse/FLINK-27944
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics
> Affects Versions: 1.15.0
> Reporter: Zhu Zhu
> Priority: Critical
> Fix For: 1.16.0