This sounds like a bug in Flink. Could you share the logs of the cluster
(ideally with TRACE log level) with us?

Cheers,
Till

On Tue, Aug 11, 2020 at 9:49 AM Fabian Paul <fabianp...@data-artisans.com>
wrote:

> Hi Till,
>
> The problem is reproducible with a basic shell script doing the following
> operations.
>
> 1. POST request to /jobs/${JOB_ID}/savepoints with the payload
>          {"cancel-job": true, "target-directory": "${LOCATION}"}
>         and store the trigger ID
>
> 2. Sleep 10 seconds
>
> 3. GET /jobs/${JOB_ID}/savepoints/${TRIGGER_ID}
>         results in a connect exception because the REST endpoint is shut down.
>
> Sorry if I misunderstood your previous answer, but I would expect that
> stopping the job with a savepoint is an asynchronous operation and
> should block the shutdown until the result is served.
> I can also confirm that the cluster is not shut down but the REST
> endpoint is, which makes it impossible to serve the asynchronous result.
>
> Best,
> Fabian
>
>
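
The reproduction steps quoted above could be sketched roughly as the following shell script. This is not the script from the original report: the function names, the FLINK_URL default (http://localhost:8081), and the sed-based JSON extraction are assumptions made for illustration. It relies on the trigger response carrying the ID in a "request-id" field.

```shell
#!/bin/sh
# Hypothetical sketch of the reproduction; FLINK_URL is an assumed default,
# and the job ID and target directory are supplied by the caller.
FLINK_URL="${FLINK_URL:-http://localhost:8081}"

# Step 1: trigger a savepoint with cancel-job=true; prints the raw JSON response.
trigger_savepoint() {
  local job_id="$1" location="$2"
  curl -s -X POST "${FLINK_URL}/jobs/${job_id}/savepoints" \
    -H 'Content-Type: application/json' \
    -d "{\"cancel-job\": true, \"target-directory\": \"${location}\"}"
}

# Pull the trigger ID out of a response such as {"request-id":"abc"}.
extract_trigger_id() {
  sed -n 's/.*"request-id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Step 3: ask for the status of the asynchronous savepoint operation.
get_savepoint_status() {
  local job_id="$1" trigger_id="$2"
  curl -s "${FLINK_URL}/jobs/${job_id}/savepoints/${trigger_id}"
}

# Runs steps 1 to 3 in order and reports when the status request fails.
reproduce() {
  local job_id="$1" location="$2" trigger_id
  trigger_id="$(trigger_savepoint "$job_id" "$location" | extract_trigger_id)"
  echo "trigger id: ${trigger_id}"
  sleep 10   # Step 2
  get_savepoint_status "$job_id" "$trigger_id" \
    || echo "status request failed (REST endpoint unreachable?)"
}
```

Calling `reproduce "$JOB_ID" "$LOCATION"` against a running cluster would exercise the sequence; if the REST endpoint has already gone away, curl exits with a non-zero status and the fallback message is printed.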
