[
https://issues.apache.org/jira/browse/FLINK-33856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803478#comment-17803478
]
Piotr Nowojski edited comment on FLINK-33856 at 1/5/24 9:56 AM:
----------------------------------------------------------------
{quote}
Maybe a new FLIP that supports a task-level trace reporter could be built? I'm
willing to participate in the development.
{quote}
Please check the FLIP-384 discussions again. I highlighted a couple of
difficulties there:
{quote}
However, if we would like to create true distributed traces, with spans
reported from many different components, potentially both on the JM and the
TMs, the problem is a bit deeper. The issue in that case is how to actually
fill out `parent_id` and `trace_id`. Passing some context entity around as a
Java object would be unfeasible; that would require too many changes in too
many places. I think the only realistic way to do it would be to have a
deterministic generator of `parent_id` and `trace_id` values.
For example, we could create the parent trace/span of the checkpoint on the
JM and set those ids to something like `jobId#attemptId#checkpointId`. Each
subtask could then re-generate those ids, and a subtask's checkpoint span
would have an id of `jobId#attemptId#checkpointId#subTaskId`.
Note that this is just an example, as distributed spans for checkpointing
most likely do not make sense, since we can generate them much more easily on
the JM anyway.
{quote}
https://lists.apache.org/thread/7lql5f5q1np68fw1wc9trq3d9l2ox8f4
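To make the quoted idea concrete, the deterministic id scheme could look like the sketch below. This is purely illustrative: the class and method names are hypothetical and are not part of Flink's API; only the `jobId#attemptId#checkpointId` format comes from the quote above.

```java
// Hypothetical sketch of a deterministic trace/span id generator. Because the
// ids are pure functions of (jobId, attemptId, checkpointId, subTaskId), the
// JM and every subtask can re-derive them independently, with no need to ship
// a context object between components.
public final class DeterministicSpanIds {

    private DeterministicSpanIds() {}

    // Trace id of the whole checkpoint, created on the JM; it also serves as
    // the parent id for all subtask spans.
    public static String checkpointTraceId(String jobId, int attemptId, long checkpointId) {
        return jobId + "#" + attemptId + "#" + checkpointId;
    }

    // Span id of a single subtask's checkpoint, re-generated on the TM side.
    public static String subtaskSpanId(String jobId, int attemptId, long checkpointId, int subTaskId) {
        return checkpointTraceId(jobId, attemptId, checkpointId) + "#" + subTaskId;
    }
}
```

The key property is that both sides compute the same strings without any coordination, which is what makes filling out `parent_id`/`trace_id` feasible without passing a context entity around.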
At the same time:
{quote}
I am worried that aggregating a large amount of data on the JM may cause
performance problems.
{quote}
I wouldn't worry about that too much. This data is already aggregated on the JM
from all of the TMs via {{CheckpointMetricsBuilder}} and {{CheckpointMetrics}}.
Besides, it's just a single RPC from subtask -> JM per checkpoint. If that
became a problem, we would have problems in many other areas as well (for
example, {{notifyCheckpointCompleted}} is a very similar call in the other
direction).
Also, AFAIR there are/were ideas on how to solve this potential bottleneck in a
more generic way (having multiple job coordinators in the cluster to spread the
load).
> Add metrics to monitor the interaction performance between task and external
> storage system in the process of checkpoint making
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-33856
> URL: https://issues.apache.org/jira/browse/FLINK-33856
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.18.0
> Reporter: Jufang He
> Assignee: Jufang He
> Priority: Major
> Labels: pull-request-available
>
> When Flink takes a checkpoint, the interaction performance with the external
> file system has a great impact on the overall time spent. Therefore, the
> bottleneck is easy to spot if we add performance metrics for the task's
> interaction with the external file storage system. These include: the file
> write rate, the latency of writing the file, and the latency of closing the
> file.
> Adding the above metrics on the Flink side has the following advantages: it
> makes it convenient to measure the E2E time of different tasks, and it does
> not need to distinguish the type of external storage system, since it can be
> unified in FsCheckpointStreamFactory.
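The storage-agnostic measurement described in the issue could be sketched as a plain stream wrapper, as below. This is an assumption-laden illustration, not Flink's implementation: the class and method names are hypothetical, and Flink's actual checkpoint streams are richer than `java.io.OutputStream`; only the three metrics (write rate, write latency, close latency) come from the description above.

```java
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch: wrap any OutputStream and accumulate the time spent in
// write() and close(). Because it only depends on the OutputStream interface,
// it works uniformly for any external storage system behind it.
class LatencyTrackingOutputStream extends OutputStream {

    private final OutputStream delegate;
    private long bytesWritten;
    private long writeNanos;
    private long closeNanos;

    LatencyTrackingOutputStream(OutputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(int b) throws IOException {
        long start = System.nanoTime();
        delegate.write(b);
        writeNanos += System.nanoTime() - start;
        bytesWritten++;
    }

    @Override
    public void close() throws IOException {
        long start = System.nanoTime();
        delegate.close();
        closeNanos += System.nanoTime() - start;
    }

    // Derived metric: write rate in bytes per second (0 if nothing measured).
    double writeRateBytesPerSecond() {
        return writeNanos == 0 ? 0.0 : bytesWritten / (writeNanos / 1e9);
    }

    long totalBytesWritten() { return bytesWritten; }
    long totalWriteNanos()   { return writeNanos; }
    long totalCloseNanos()   { return closeNanos; }
}
```

Placing such a wrapper at a single choke point (the factory that creates checkpoint streams) is what lets the metrics stay unified regardless of the concrete file system.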
--
This message was sent by Atlassian Jira
(v8.20.10#820010)