Re: [MESOS-10007] random "Failed to get exit status for Command" for short-lived commands

2019-10-21 Thread Benjamin Mahler
Hi Charles, thanks for the thorough ticket and for surfacing it here for
attention. It didn't get spotted amongst the JIRA noise.

I replied on the ticket with a patch that should fix the issue; we can
discuss further in the ticket.

Ben

On Sat, Oct 19, 2019 at 7:35 AM Charles-François Natali wrote:

> Hi,
>
> I'm wondering if there's anything I could do to help
> https://issues.apache.org/jira/browse/MESOS-10007 move forward?
>
> Basically it's a race condition in libprocess/command executor causing
> spurious errors to be reported for short-lived tasks.
> I've got a detailed code path of the race and a repro; however, I'm not
> sure what the best way to fix it is. Any suggestions?
>
> Cheers,
>
> Charles
>


[MESOS-10007] random "Failed to get exit status for Command" for short-lived commands

2019-10-19 Thread Charles-François Natali
Hi,

I'm wondering if there's anything I could do to help
https://issues.apache.org/jira/browse/MESOS-10007 move forward?

Basically it's a race condition in libprocess/command executor causing
spurious errors to be reported for short-lived tasks.
I've got a detailed code path of the race and a repro; however, I'm not
sure what the best way to fix it is. Any suggestions?

Cheers,

Charles