Hey, responses inline!
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 21 August 2020 22:53, Sander Striker <[email protected]> wrote:

> +1 on retiring WSL1 runners. Do you anticipate any unique behavior in
> WSL2 that we should take into account?

The only unique behavior I can see is if users keep their BuildStream
elements / cache on the Windows filesystem. However:

1) The sharing is done through the Samba protocol, so we could test for
   network shares if we really needed to.
2) This setup is discouraged by Microsoft; their stance is that you should
   keep the data on your WSL filesystem.

So all in all I think we do not need to cater for this specific case.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 21 August 2020 23:57, Chandan Singh <[email protected]> wrote:

> Hi Ben,
>
> I'll let you decide what to do with WSL as I don't have enough context
> there :) But the plan generally seems good to me.

Seems good!

> > I believe that this is not enough of a reason to keep WSL1 tests, and
> > that, when we have moved to GitHub Actions, we should be able to have
> > Mac tests instead, which would bring better value.
>
> This is a bit of a sidetrack, but I tried quickly adding a Mac test
> environment using Actions and the GitHub-provided runners. Here are my
> observations.
>
> Python 3.8 (which is the default now) uses `spawn` as the default
> multiprocessing method. This wreaks havoc with our testsuite and
> everything hangs. An example of this behavior can be seen in this job
> that ran for 50 mins without doing anything:
> https://github.com/cs-shadow/buildstream/runs/1014073454.
>
> Although `fork` is technically considered unsafe on macOS (and
> Windows), it does work for the most part. At least more so than
> `spawn`. So, I wonder if we should force the multiprocessing method to
> `fork` in BuildStream?
>
> Things look better on Python 3.7. Here, most tests pass but about 10%
> of the tests fail where we fail to fork correctly. Here is an example
> of such a job: https://github.com/cs-shadow/buildstream/runs/1014350157.
> The failures look something like:
>
> BuildStream exited with code -1 for invocation:
> Program stderr was:
> [--:--:--][ ][ main:core activity ] START Build
> [--:--:--][ ][ main:core activity ] START Loading elements
> objc[4443]: +[__NSCFConstantString initialize] may have been in
> progress in another thread when fork() was called.
> objc[4443]: +[__NSCFConstantString initialize] may have been in
> progress in another thread when fork() was called. We cannot safely
> call it or ignore it in the fork() child process. Crashing instead.
> Set a breakpoint on objc_initializeAfterForkError to debug.
> BUG: Message handling out of sync, unable to retrieve failure message
> for element base.bst
> [--:--:--][ecf8572c][ main:base.bst ] ERROR Internal job process
> unexpectedly died with exit code -6
>
> Would any of our resident multiprocessing experts have any thoughts on
> how to handle this correctly?

I think the most correct solution would be to get rid of our multiprocessed
scheduler, which is something I have been trying to make work for more than
two months now. I'd advocate for postponing the `fork` decision a bit and
seeing how far a non-multiprocessed solution can go; if that works out, the
problem should simply go away. My branch with my current work is at [0].

Cheers!
Ben

[0]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1982
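
On point 1 above, if we ever did want to detect elements or a cache
directory sitting on a Windows drive, a rough sketch of such a check
follows. This is purely illustrative: `is_on_windows_mount` is a
hypothetical helper, not a BuildStream API, and it assumes Windows drives
show up in /proc/mounts with the `drvfs` (WSL1) or `9p` (WSL2) filesystem
type.

    import os

    def is_on_windows_mount(path):
        """Best-effort check: is `path` on a Windows drive mounted into WSL?"""
        path = os.path.realpath(path)
        longest_mount, fstype = "", ""
        try:
            with open("/proc/mounts", "r", encoding="utf-8") as mounts:
                for line in mounts:
                    fields = line.split()
                    if len(fields) < 3:
                        continue
                    mount_point, mount_fstype = fields[1], fields[2]
                    prefix = mount_point.rstrip("/") + "/"
                    inside = path == mount_point or path.startswith(prefix)
                    # Keep the most specific mount point containing the path.
                    if inside and len(mount_point) > len(longest_mount):
                        longest_mount, fstype = mount_point, mount_fstype
        except OSError:
            # Not a Linux-style system, or /proc is unavailable.
            return False
        return fstype in ("drvfs", "9p")

Something along these lines could gate a warning when the project directory
or cache root lives on such a mount, if we ever decide it is worth handling.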

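On the `spawn` vs `fork` question above, forcing the start method would be a
small change; a minimal sketch follows, with the caveat that `get_job_context`
is an invented name and only `multiprocessing.set_start_method` /
`multiprocessing.get_context` are real standard-library calls. For the objc
crash in the quoted log, one commonly cited workaround is exporting
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES in the runner's environment before
Python starts, though whether that is acceptable for CI is a separate
question.

    import multiprocessing
    import sys

    def get_job_context():
        """Return a multiprocessing context that uses fork()."""
        if sys.platform != "win32":
            try:
                # The global start method can only be set once per process;
                # if something already fixed it, fall back to a local context.
                multiprocessing.set_start_method("fork")
            except RuntimeError:
                pass
        # A "fork" context is independent of the global default; on Windows
        # this raises ValueError because fork() is unavailable there.
        return multiprocessing.get_context("fork")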