[ 
https://issues.apache.org/jira/browse/FLINK-16489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liangji updated FLINK-16489:
----------------------------
    Description: 
1. Environment

a. flink-1.9.0

b. yarn version

Hadoop 2.6.0-cdh5.5.0
 Subversion [http://github.com/cloudera/hadoop] -r 
fd21232cef7b8c1f536965897ce20f50b83ee7b2
 Compiled by jenkins on 2015-11-09T20:39Z
 Compiled with protoc 2.5.0
 From source with checksum 98e07176d1787150a6a9c087627562c
 This command was run using 
/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/hadoop-common-2.6.0-cdh5.5.0.jar

c. Flink checkpointing is enabled with the default checkpoint configuration.

2. Steps to reproduce

a. Run the AM on node1.

b. Decommission the NM on node1.

3. Problem

!image-2020-03-09-18-06-59-710.png!

We can see from the picture above that the last AM saved chk-1522 at 
2020-03-04 14:12:48. The second AM then restarted with chk-1, and we later 
found that the data was incorrect. We then restarted the application manually 
from chk-1522 with the Flink CLI -s option and confirmed that the data was 
correct.

After the steps above, the AM restarts, but the Flink job does not restart from 
the saved checkpoint. Is this expected, or is there some configuration that I 
have not set?
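
As a rough sketch of the manual recovery described above (not a fix for the AM 
restart itself), the following Python helper picks the highest-numbered chk-N 
directory and builds a `flink run -s` command line. The checkpoint root, job 
ID, and jar name are hypothetical placeholders, and the helper names 
`latest_checkpoint` and `restore_command` are illustrative only.

```python
import re
import shlex


def latest_checkpoint(dir_names):
    """Return the chk-N name with the largest N, or None if none match."""
    best = None
    best_id = -1
    for name in dir_names:
        # Only names of the exact form chk-<digits> count; other entries
        # in the job's checkpoint directory are ignored.
        m = re.fullmatch(r"chk-(\d+)", name)
        if m and int(m.group(1)) > best_id:
            best_id = int(m.group(1))
            best = name
    return best


def restore_command(checkpoint_root, job_id, dir_names, jar):
    """Build a `flink run -s <path> <jar>` command for the newest checkpoint.

    Assumes the layout <checkpoint_root>/<job_id>/chk-N; all arguments are
    placeholders supplied by the caller.
    """
    chk = latest_checkpoint(dir_names)
    if chk is None:
        raise ValueError("no chk-N directory found")
    path = f"{checkpoint_root.rstrip('/')}/{job_id}/{chk}"
    return shlex.join(["flink", "run", "-s", path, jar])
```

For example, `restore_command("hdfs:///flink/checkpoints", "0123abcd", 
["chk-1", "chk-1522"], "job.jar")` yields 
`flink run -s hdfs:///flink/checkpoints/0123abcd/chk-1522 job.jar`; all four 
arguments here are made-up values.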

  was:
1. Environment

a. flink-1.9.0

b. yarn version

Hadoop 2.6.0-cdh5.5.0
Subversion [http://github.com/cloudera/hadoop] -r 
fd21232cef7b8c1f536965897ce20f50b83ee7b2
Compiled by jenkins on 2015-11-09T20:39Z
Compiled with protoc 2.5.0
From source with checksum 98e07176d1787150a6a9c087627562c
This command was run using 
/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/hadoop-common-2.6.0-cdh5.5.0.jar

c. Flink checkpointing is enabled with the default checkpoint configuration.

2. Steps to reproduce

a. Run the AM on node1.

b. Decommission the NM on node1.

3. Problem

After the steps above, the AM restarts, but the Flink job does not restart from 
the saved checkpoint. Is this expected, or is there some configuration that I 
have not set?


> Use flink on yarn,RM restart AM,but the flink job is not restart from the 
> saved checkpoint.
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-16489
>                 URL: https://issues.apache.org/jira/browse/FLINK-16489
>             Project: Flink
>          Issue Type: Bug
>            Reporter: liangji
>            Priority: Major
>         Attachments: image-2020-03-09-18-06-59-710.png
>
>
> 1. Environment
> a. flink-1.9.0
> b. yarn version
> Hadoop 2.6.0-cdh5.5.0
>  Subversion [http://github.com/cloudera/hadoop] -r 
> fd21232cef7b8c1f536965897ce20f50b83ee7b2
>  Compiled by jenkins on 2015-11-09T20:39Z
>  Compiled with protoc 2.5.0
>  From source with checksum 98e07176d1787150a6a9c087627562c
>  This command was run using 
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/hadoop-common-2.6.0-cdh5.5.0.jar
> c. Flink checkpointing is enabled with the default checkpoint configuration.
> 2. Steps to reproduce
> a. Run the AM on node1.
> b. Decommission the NM on node1.
> 3. Problem
> !image-2020-03-09-18-06-59-710.png!
> We can see from the picture above that the last AM saved chk-1522 at 
> 2020-03-04 14:12:48. The second AM then restarted with chk-1, and we later 
> found that the data was incorrect. We then restarted the application manually 
> from chk-1522 with the Flink CLI -s option and confirmed that the data was 
> correct.
> After the steps above, the AM restarts, but the Flink job does not restart 
> from the saved checkpoint. Is this expected, or is there some configuration 
> that I have not set?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
