This sounds like a bug in Flink. Could you share the logs of the cluster (ideally with TRACE log level) with us?
Cheers,
Till

On Tue, Aug 11, 2020 at 9:49 AM Fabian Paul <fabianp...@data-artisans.com> wrote:

> Hi Till,
>
> The problem is reproducible with a basic shell script doing the following
> operations.
>
> 1. Post request to /jobs/${JOB_ID}/savepoints with the payload
>    {"cancel-job": true,"target-directory": $(LOCATION)}
>    and store the trigger ID
>
> 2. Sleep 10 seconds
>
> 3. Get /jobs/${JOB_ID}/savepoints/$(TRIGGER_ID),
>    which results in a connect exception because the REST endpoint is shut down.
>
> Sorry if I misunderstood your previous answer, but I would expect that
> stopping the job with a savepoint is an asynchronous operation and should
> block the shutdown until the result is served.
> I can also confirm that the cluster is not shut down but the REST endpoint
> is, which makes it impossible to serve the asynchronous result.
>
> Best,
> Fabian
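
For anyone who wants to try the reproduction quoted above without the original shell script, the three steps could be sketched roughly as follows. This is not the script from the report, only a minimal approximation of it. The function names, the `base_url` parameter, and the 10 second default are illustrative assumptions; the `request-id` field is what the trigger response is expected to carry in the Flink REST API, so adjust it if your version differs.

```python
import json
import time
import urllib.request


def savepoint_trigger_request(base_url, job_id, target_directory):
    """Build the POST request for step 1 (savepoint with cancel-job)."""
    payload = {"cancel-job": True, "target-directory": target_directory}
    return urllib.request.Request(
        f"{base_url}/jobs/{job_id}/savepoints",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def savepoint_status_url(base_url, job_id, trigger_id):
    """Build the URL polled in step 3."""
    return f"{base_url}/jobs/{job_id}/savepoints/{trigger_id}"


def reproduce(base_url, job_id, target_directory, wait_seconds=10):
    """Run steps 1 to 3; hypothetical helper, not part of Flink."""
    # Step 1: trigger the savepoint and remember the trigger ID.
    request = savepoint_trigger_request(base_url, job_id, target_directory)
    with urllib.request.urlopen(request) as response:
        # Assumption: the trigger ID is returned under "request-id".
        trigger_id = json.load(response)["request-id"]

    # Step 2: give the job time to finish the savepoint and shut down.
    time.sleep(wait_seconds)

    # Step 3: ask for the asynchronous result. In the reported scenario
    # this call fails with a connection error (urllib.error.URLError)
    # because the REST endpoint is already gone.
    status_url = savepoint_status_url(base_url, job_id, trigger_id)
    with urllib.request.urlopen(status_url) as response:
        return json.load(response)
```

Calling `reproduce("http://localhost:8081", job_id, "/tmp/savepoints")` against a running cluster would exercise the sequence; the host, port, and directory here are placeholders, not values taken from the report.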