[ https://issues.apache.org/jira/browse/FLINK-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101355#comment-17101355 ]
Congxian Qiu(klion26) commented on FLINK-17479:
-----------------------------------------------
[~nobleyd] thanks for reporting this problem. What the pictures you gave show seems strange: if {{checkpointMetaData}} were null, how could the message {{"Could not perform checkpoint " + checkpointMetaData.getCheckpointId() + " for operator " + getName() + '.'}} [1] be printed? Building that error message calls {{getCheckpointId()}} on {{checkpointMetaData}}, so a null value would fail before the message exists. Could you please share the complete JobManager and TaskManager logs? A job that reproduces the problem would be even better. Thanks.
[1] https://github.com/apache/flink/blob/aa4eb8f0c9ce74e6b92c3d9be5dc8e8cb536239d/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L801
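A minimal sketch of the reasoning above, assuming a hypothetical stand-in class for {{CheckpointMetaData}} (not Flink's real class) and replacing {{getName()}} with a plain parameter: the quoted concatenation dereferences {{checkpointMetaData}}, so with a null value the NullPointerException is thrown while the message is being built, and that message can never reach the log. Note also that on JDK 8 (the reporter runs jdk1.8.0_60) a NullPointerException carries no detail message naming the null reference; that only arrived with helpful NullPointerExceptions (JEP 358) in JDK 14, so the stack trace alone does not prove which reference was null.

```java
/**
 * Minimal sketch: a hypothetical stand-in for CheckpointMetaData, used only to
 * show that building the quoted error message dereferences it.
 */
public class CheckpointMessageDemo {

    // Hypothetical stand-in class, NOT Flink's CheckpointMetaData.
    static final class CheckpointMetaData {
        private final long checkpointId;

        CheckpointMetaData(long checkpointId) {
            this.checkpointId = checkpointId;
        }

        long getCheckpointId() {
            return checkpointId;
        }
    }

    // Same concatenation as the quoted StreamTask message; getName() replaced by a parameter.
    static String buildMessage(CheckpointMetaData checkpointMetaData, String operatorName) {
        return "Could not perform checkpoint " + checkpointMetaData.getCheckpointId()
                + " for operator " + operatorName + '.';
    }

    // Returns true if building the message itself throws a NullPointerException.
    static boolean messageThrowsNpe(CheckpointMetaData checkpointMetaData) {
        try {
            buildMessage(checkpointMetaData, "demo-operator");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Non-null metadata: the message is built normally.
        System.out.println(buildMessage(new CheckpointMetaData(42L), "demo-operator"));
        // Null metadata: the NPE is thrown while building the message, before anything is logged.
        System.out.println(messageThrowsNpe(null));
    }
}
```

Running {{main}} prints the formatted message for checkpoint 42 and then {{true}}, i.e. a null {{checkpointMetaData}} fails inside the message construction itself.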
> Occasional checkpoint failure due to null pointer exception in Flink version 1.10
> ---------------------------------------------------------------------------------
>
> Key: FLINK-17479
> URL: https://issues.apache.org/jira/browse/FLINK-17479
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.10.0
> Environment: Flink1.10.0
> jdk1.8.0_60
> Reporter: nobleyd
> Priority: Major
> Attachments: image-2020-04-30-18-44-21-630.png,
> image-2020-04-30-18-55-53-779.png
>
>
> I upgraded our standalone cluster (3 machines) from Flink 1.9 to the latest
> Flink 1.10.0. My job ran normally on Flink 1.9 for about half a year, but on
> Flink 1.10.0 some jobs fail with a NullPointerException during
> checkpointing.
> Below is the exception log:
> !image-2020-04-30-18-55-53-779.png!
> I checked StreamTask line 882, shown below. As far as I can tell, the only way
> that line can throw a NullPointerException is if checkpointMetaData is null.
> !image-2020-04-30-18-44-21-630.png!
> I don't know why this happens; can anyone help? So far the problem only occurs
> on Flink 1.10.0, while the same job works well on Flink 1.9. Below is part of
> my configuration (the settings that differ from the defaults), in case the
> problem is caused by a configuration mistake.
> Configuration of my Flink 1.10.0 cluster (non-default settings):
>
> {code:java}
> taskmanager.memory.flink.size: 71680m
> taskmanager.memory.framework.heap.size: 512m
> taskmanager.memory.framework.off-heap.size: 512m
> taskmanager.memory.task.off-heap.size: 17920m
> taskmanager.memory.managed.size: 512m
> taskmanager.memory.jvm-metaspace.size: 512m
> taskmanager.memory.network.fraction: 0.1
> taskmanager.memory.network.min: 1024mb
> taskmanager.memory.network.max: 1536mb
> taskmanager.memory.segment-size: 128kb
> rest.port: 8682
> historyserver.web.port: 8782
> high-availability.jobmanager.port: 13141,13142,13143,13144
> blob.server.port: 13146,13147,13148,13149
> taskmanager.rpc.port: 13151,13152,13153,13154
> taskmanager.data.port: 13156
> metrics.internal.query-service.port: 13161,13162,13163,13164,13166,13167,13168,13169
> env.java.home: /usr/java/jdk1.8.0_60/bin/java
> env.pid.dir: /home/work/flink-1.10.0
> {code}
>
> I hope someone can help me solve it.
>
>
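For reference, a rough sketch of how the reported memory settings would break down, assuming the Flink 1.10 TaskManager memory model as documented: when {{taskmanager.memory.flink.size}} is set and task heap is not, network memory is {{network.fraction}} times total Flink memory clamped to {{[network.min, network.max]}}, and task heap is whatever remains after framework heap, framework off-heap, task off-heap, managed and network memory are subtracted. The class and method names are hypothetical, not Flink API, and the sketch ignores JVM metaspace, JVM overhead and the other configuration paths Flink handles.

```java
/**
 * Hypothetical sketch of the Flink 1.10 "total Flink memory" breakdown.
 * All sizes are in MiB except the segment size (KiB). Not Flink API.
 */
public class TaskManagerMemorySketch {

    // Network memory: fraction of total Flink memory, clamped to [min, max].
    static long networkMb(long totalFlinkMb, double fraction, long minMb, long maxMb) {
        long derived = (long) (totalFlinkMb * fraction);
        return Math.max(minMb, Math.min(maxMb, derived));
    }

    // Task heap: what is left of total Flink memory after the other components.
    static long taskHeapMb(long totalFlinkMb, long frameworkHeapMb, long frameworkOffHeapMb,
                           long taskOffHeapMb, long managedMb, long networkMb) {
        return totalFlinkMb - frameworkHeapMb - frameworkOffHeapMb
                - taskOffHeapMb - managedMb - networkMb;
    }

    // Number of network buffers, given the memory segment size in KiB.
    static long networkBuffers(long networkMb, long segmentKb) {
        return networkMb * 1024 / segmentKb;
    }

    public static void main(String[] args) {
        long totalFlinkMb = 71680; // taskmanager.memory.flink.size
        long network = networkMb(totalFlinkMb, 0.1, 1024, 1536);
        long taskHeap = taskHeapMb(totalFlinkMb, 512, 512, 17920, 512, network);
        System.out.println("network memory (MiB): " + network);
        System.out.println("task heap (MiB): " + taskHeap);
        System.out.println("network buffers: " + networkBuffers(network, 128));
    }
}
```

With these values, 10% of 71680m is 7168m, which is capped at 1536m of network memory; task heap comes out at 50688m, and 1536m of 128kb segments gives 12288 network buffers.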