On Thu, Jun 2, 2022 at 6:20 AM TheDiveO <harald.albre...@gmx.net> wrote:
>
> On Thursday, June 2, 2022 at 3:03:17 AM UTC+2 Ian Lance Taylor wrote:
>>
>> On Wed, Jun 1, 2022 at 10:45 AM TheDiveO wrote:
>> > What now? Terminating T0 doesn't look like a great idea on second look:
>> > for instance, as I mentioned above, this causes some problems further down
>> > the road, such as things in the procfs for this process becoming
>> > inaccessible.
>>
>> Can you point to some documentation about this problem, or show a
>> program where it causes problems.
>
>
> This happens, unfortunately, in a closed-source project. However, as I reuse
> existing open source parts, I can at least link to the basic elements the
> closed project sits on top of. What happens is that after triggering some
> service handlers, these handlers fall into one of two types with respect to
> switching between different network namespaces.
>
> One type uses runtime.LockOSThread, then switches the thread's network
> namespace, does some things, then switches back to its "saved" original
> network namespace, and finally calls runtime.UnlockOSThread ... so that the
> underlying thread/task can be freely reused as it is untainted (again). The
> generic namespace-switching functionality can be found here:
> https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L172
>
> Another type uses new goroutines for its tasks, which in turn call
> runtime.LockOSThread, then switch their threads' network namespaces, but
> never call runtime.UnlockOSThread and immediately exit after handing over
> their results via channels to the waiting service handler. The locking and
> never-unlocking method can be seen here:
> https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L144
> and
> https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L92, respectively.
>
> These basic namespace-switching building blocks are instrumented with test
> cases that additionally check that all threads/tasks have been properly
> restored at the end of the tests, using https://github.com/thediveo/namspill.
> So far, these tests could not trigger the situation I see in the closed
> project. I've additionally instrumented tests for the closed product, also
> using the namspill checker in unit tests -- unfortunately, I could never
> trigger in the existing tests the behavior that gets triggered in production.
> While I can reproducibly trigger the invalid situation in the product, I've
> unfortunately so far not managed to come up with a corresponding simplified
> test.
>
> I'm not suspecting that there is a Go runtime bug, but I want to understand
> what is happening in order to take appropriate measures to correctly avoid
> the situation where the Go service's initial thread/task T0 ends up with a
> switched network namespace, whereas this should never happen (=famous last
> words).
Thanks. Can you point to any documentation about the problem? Earlier you mentioned that there was something in the procfs man page. I didn't see it, but as the procfs man page is very large I'm sure I just missed it.

>> > Is there a way to trick(?) a non-main goroutine onto T0 as an experiment?
>>
>> I'm not sure what a "non-main goroutine" is. All goroutines are
>> basically equivalent. Assuming you mean the initial goroutine, you
>> can lock that to the initial thread by calling runtime.LockOSThread in
>> an init function. That should wind up calling the main function with
>> a goroutine locked to the initial thread. Then you could, for
>> example, start a new goroutine and then let the initial goroutine
>> exit.
>
>
> You're correct: I was thinking about the initial goroutine.
>
> The initial goroutine also calls main, but this can happen on any thread/task
> ... is that correct so far?

If you call runtime.LockOSThread in an init function, as I mentioned above, then the initial goroutine will be the one that calls the main function.

> From the perspective of Linux, the leader task represents a process and
> thus the process-related (and not task-related) OS-managed resources. Thus I
> would assume that terminating the leader task should be avoided: what is the
> process return code when terminating the leader task and leaving other tasks
> running? The exit code of the last task standing?
>
> Thus, I'm wondering how Go's scheduler can keep all threads/tasks fully
> symmetrical when the Linux kernel enforces a certain asymmetry upon tasks.
> Might there be a rule that in the end causes T0 not to be terminated
> when it is locked to a non-initial goroutine and that goroutine
> terminates?

I am still trying to understand whether there really is an asymmetry. It's true that the Go scheduler assumes that there is no such asymmetry. If that is incorrect, then we need to fix it. But first we need a test case or at least some documentation.
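The first half of the suggested experiment, locking the initial goroutine to the initial thread from an init function, might look like this minimal sketch. It is Linux-only, and `onInitialThread` is a hypothetical helper that relies on the initial thread's TID being equal to the PID; the second half (letting the initial goroutine exit) is left out.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// Locking the OS thread in an init function keeps the initial goroutine on
// the startup thread, so main is invoked on that thread (T0).
func init() {
	runtime.LockOSThread()
}

// onInitialThread reports whether the caller runs on the process's initial
// thread; on Linux that thread's TID equals the PID (thread group ID).
func onInitialThread() bool {
	return syscall.Gettid() == syscall.Getpid()
}

func main() {
	fmt.Println("main on T0:", onInitialThread())

	// While the initial goroutine stays locked to T0, no other goroutine may
	// run on T0, so a newly started goroutine reports false.
	done := make(chan bool)
	go func() { done <- onInitialThread() }()
	fmt.Println("new goroutine on T0:", <-done)
}
```

On Linux this should print `true` for main and `false` for the new goroutine, since the runtime.LockOSThread documentation guarantees that no other goroutine executes on a locked thread.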
Ian