Hi Ian,

On Thursday, June 2, 2022 at 3:03:17 AM UTC+2 Ian Lance Taylor wrote:

> On Wed, Jun 1, 2022 at 10:45 AM TheDiveO wrote: 
> > What now? Terminating T0 doesn't look like a great idea at second look: 
> for instance, as I mentioned above, this causes some problems further down 
> the road, such as things in the procfs for this process becoming 
inaccessible. 
>
> Can you point to some documentation about this problem, or show a 
> program where it causes problems?
>

This happens, unfortunately, in a closed-source project. However, since I 
reuse existing open-source parts, I can at least link to the basic building 
blocks the closed project sits on top of. The service handlers involved 
fall into one of two types with respect to how they switch between network 
namespaces.

One type uses runtime.LockOSThread, then switches the thread's network 
namespace, does some things, then switches back to its "saved" original 
network namespace, and finally runtime.UnlockOSThread ... so that the 
underlying thread/task can be freely reused as it is untainted (again). The 
generic namespace-switching functionality can be found here: 
https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L172
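
To make the order of operations in this first type concrete, here is a 
minimal sketch. It is not the lxkns code: withSwitchedThread, enter, restore 
and errDemo are hypothetical names, and the two callbacks merely stand in 
for the actual setns(2) calls. The one assumption worth noting is that the 
thread is only unlocked after a successful restore, so a thread that might 
still be in the wrong namespace is not handed back to the scheduler.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// errDemo is a hypothetical error used only to demonstrate the failure path.
var errDemo = errors.New("demo failure")

// withSwitchedThread runs fn between enter and restore while the calling
// goroutine is locked to its OS thread. enter and restore are hypothetical
// callbacks standing in for the real namespace switches.
func withSwitchedThread(enter, restore func() error, fn func()) error {
	runtime.LockOSThread()
	if err := enter(); err != nil {
		// Assumption: a failed enter leaves the thread untouched,
		// so it can safely be released again.
		runtime.UnlockOSThread()
		return fmt.Errorf("cannot enter namespace: %w", err)
	}
	fn()
	if err := restore(); err != nil {
		// Stay locked on purpose: the thread may still be switched, and
		// the runtime terminates a thread whose goroutine exits while
		// still locked to it.
		return fmt.Errorf("cannot restore namespace: %w", err)
	}
	// The thread is back in its original namespace and may be reused.
	runtime.UnlockOSThread()
	return nil
}

func main() {
	noop := func() error { return nil }
	fmt.Println(withSwitchedThread(noop, noop, func() { fmt.Println("working") }))
	fmt.Println(withSwitchedThread(func() error { return errDemo }, noop, func() {}))
}
```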

The other type starts new goroutines for its tasks. Each of these calls 
runtime.LockOSThread, then switches its thread's network namespace, but 
never calls runtime.UnlockOSThread, and exits immediately after handing its 
results over a channel to the waiting service handler. The method of 
locking and never unlocking can be seen here: 
https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L144 
and here: 
https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L92.
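
A minimal sketch of this second type, again not taken from lxkns: 
runOnThrowawayThread is a hypothetical helper, and the namespace switch 
itself is only marked by a comment. It relies on the documented behavior of 
runtime.LockOSThread that a thread is terminated when its goroutine exits 
without unlocking it.

```go
package main

import (
	"fmt"
	"runtime"
)

// runOnThrowawayThread runs fn on a new goroutine that locks its OS thread
// and never unlocks it, then returns fn's result to the caller.
func runOnThrowawayThread[T any](fn func() T) T {
	// Buffered, so the goroutine can exit even if nobody receives.
	result := make(chan T, 1)
	go func() {
		// Intentionally no matching UnlockOSThread: when this goroutine
		// exits, the runtime discards the thread instead of reusing it.
		runtime.LockOSThread()
		// A real handler would switch the thread's namespace here.
		result <- fn()
	}()
	return <-result
}

func main() {
	fmt.Println(runOnThrowawayThread(func() string { return "done" }))
}
```

The calling service handler keeps running on whatever thread it was 
scheduled on; only the short-lived goroutine is ever tied to the switched 
thread.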

These basic namespace-switching building blocks are instrumented with test 
cases that additionally check that all threads/tasks have been properly 
restored at the end of the tests, using https://github.com/thediveo/namspill. 
So far, these tests have not triggered the situation I see in the closed 
project. I've also added namspill checks to the closed product's unit 
tests, but they never trigger the behavior that shows up in production. 
While I can reproducibly trigger the invalid situation in the product, I 
have not yet managed to come up with a corresponding simplified test.
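
For illustration, a check of this kind can be sketched by reading the 
per-task namespace links from procfs. This is not namspill's implementation; 
netnsByTask is a hypothetical helper, and it assumes Linux with /proc 
mounted. It groups the tasks of the current process by the network 
namespace they are attached to, so more than one group means that some task 
has not been restored.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// netnsByTask groups the task IDs of the current process by the network
// namespace link that procfs reports for each task.
func netnsByTask() (map[string][]int, error) {
	entries, err := os.ReadDir("/proc/self/task")
	if err != nil {
		return nil, err
	}
	byNS := map[string][]int{}
	for _, e := range entries {
		tid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a task directory
		}
		ns, err := os.Readlink(filepath.Join("/proc/self/task", e.Name(), "ns", "net"))
		if os.IsNotExist(err) {
			continue // the task is gone, for example it exited during the scan
		}
		if err != nil {
			return nil, err
		}
		byNS[ns] = append(byNS[ns], tid)
	}
	return byNS, nil
}

func main() {
	byNS, err := netnsByTask()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot inspect tasks:", err)
		os.Exit(1)
	}
	for ns, tids := range byNS {
		fmt.Println(ns, tids)
	}
	if len(byNS) > 1 {
		fmt.Println("tasks are attached to different network namespaces")
	}
}
```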

I don't suspect a Go runtime bug, but I want to understand what is 
happening in order to take appropriate measures to avoid the situation 
where the Go service's initial thread/task T0 ends up with a switched 
network namespace, whereas this should never happen (=famous last 
words).
 

> > Is there a way to trick(?) a non-main goroutine onto T0 as an 
> experiment? 
>
> I'm not sure what a "non-main goroutine" is. All goroutines are 
> basically equivalent. Assuming you mean the initial goroutine, you 
> can lock that to the initial thread by calling runtime.LockOSThread in 
> an init function. That should wind up calling the main function with 
> a goroutine locked to the initial thread. Then you could, for 
> example, start a new goroutine and then let the initial goroutine 
> exit.
>

You're correct: I was thinking about the initial goroutine.

So the initial goroutine is the one that calls main, but it may run on any 
thread/task. Is that correct so far?

From the perspective of Linux, the leader task represents the process and 
thus the process-related (as opposed to task-related) OS-managed resources. 
I would therefore assume that terminating the leader task should be 
avoided: what is the process exit code when the leader task terminates and 
leaves other tasks running? The exit code of the last task standing?

So I'm wondering how Go's scheduler can keep all threads/tasks fully 
symmetrical when the Linux kernel enforces a certain asymmetry upon tasks. 
Might there be a rule that in the end keeps T0 from being terminated even 
though it is locked to a non-initial goroutine and that goroutine 
terminates?
