On Thursday, 19 April 2018 at 18:45:41 UTC, kinke wrote:
On Thursday, 19 April 2018 at 17:01:48 UTC, Matthias Klumpp wrote:
Something that may be relevant, though: I occasionally get the following SIGABRT crash in the tool on the same machines that show the SIGSEGV crash:
```
Thread 53 "appstream-gener" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fdfe98d4700 (LWP 7326)]
0x00007ffff5040428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5040428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff504202a in __GI_abort () at abort.c:89
#2  0x0000000000780ae0 in core.thread.Fiber.allocStack(ulong, ulong) (this=0x7fde0758a680, guardPageSize=4096, sz=20480) at src/core/thread.d:4606
#3  0x00000000007807fc in _D4core6thread5Fiber6__ctorMFNbDFZvmmZCQBlQBjQBf (this=0x7fde0758a680, guardPageSize=4096, sz=16384, dg=...)
    at src/core/thread.d:4134
#4  0x00000000006f9b31 in _D3std11concurrency__T9GeneratorTAyaZQp6__ctorMFDFZvZCQCaQBz__TQBpTQBiZQBx (this=0x7fde0758a680, dg=...) at /home/ubuntu/dtc/dmd/generated/linux/debug/64/../../../../../druntime/import/core/thread.d:4126
#5  0x00000000006e9467 in _D5asgen8handlers11iconhandler5Theme21matchingIconFilenamesMFAyaSQCl5utils9ImageSizebZC3std11concurrency__T9GeneratorTQCfZQp (this=0x7fdea2747800, relaxedScalingRules=true, size=..., iname=...) at ../src/asgen/handlers/iconhandler.d:196
#6  0x00000000006ea75a in _D5asgen8handlers11iconhandler11IconHandler21possibleIconFilenamesMFAyaSQCs5utils9ImageSizebZ9__lambda4MFZv (this=0x7fde0752bd00)
    at ../src/asgen/handlers/iconhandler.d:392
#7  0x000000000082fdfa in core.thread.Fiber.run() (this=0x7fde07528580) at src/core/thread.d:4436
#8  0x000000000082fd5d in fiber_entryPoint () at src/core/thread.d:3665
#9  0x0000000000000000 in  ()
```

You've probably already figured out that the new Fiber seems to be allocating its 16 KB stack, plus an additional 4 KB guard page at its bottom, via a single 20 KB mmap() call. The abort seems to be triggered by mprotect() returning -1, i.e., a failure to disallow all access to the guard page, so checking `errno` should help.

Yup, I did that already; it just took a really long time to run, because when I made the change to print errno I also enabled detailed GC profiling (via the PRINTF* debug options). The GC's INVARIANT debug option is completely broken, by the way: I forced it to compile by casting to shared, and the result was the GC locking up forever at program start.

Anyway, I think that for a change I actually produced some useful information via the GC debug options.
Given the following crash:
```
#0 0x00000000007f1d94 in _D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., ptop=0x7fdfce7fc010, pbot=0x7fdfcdbfc010)
    at src/gc/impl/conservative/gc.d:1990
        p1 = 0x7fdfcdbfc010
        p2 = 0x7fdfce7fc010
        stackPos = 0
[...]
```
The scanned range seemed fairly odd to me, so I searched for it in the (very verbose!) GC debug output, which yielded:
```
235.244445: 0xc4f090.Gcx::addRange(0x8264230, 0x8264270)
235.244460: 0xc4f090.Gcx::addRange(0x7fdfcdbfc010, 0x7fdfce7fc010)
235.253861: 0xc4f090.Gcx::addRange(0x8264300, 0x8264340)
235.253873: 0xc4f090.Gcx::addRange(0x8264390, 0x82643d0)
```
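That search can be scripted. Below is a minimal Python sketch, assuming every entry follows the `<timestamp>: <gcx>.Gcx::addRange(<pbot>, <ptop>)` format shown above; the helper `find_add_range` is hypothetical. It picks out the addRange call whose bounds match pbot/ptop from the crashing `mark()` frame and reports its size. That size is 0x7fdfce7fc010 - 0x7fdfcdbfc010 = 0xC00000 bytes (12 MiB), while the neighbouring ranges are 0x40 (64) bytes each.

```python
import re

# Matches lines such as
# "235.244460: 0xc4f090.Gcx::addRange(0x7fdfcdbfc010, 0x7fdfce7fc010)"
ADD_RANGE = re.compile(
    r"^(?P<ts>[\d.]+): (?P<gc>0x[0-9a-f]+)\.Gcx::addRange\("
    r"(?P<pbot>0x[0-9a-f]+), (?P<ptop>0x[0-9a-f]+)\)")

# Excerpt of the GC debug output quoted above
LOG = """\
235.244445: 0xc4f090.Gcx::addRange(0x8264230, 0x8264270)
235.244460: 0xc4f090.Gcx::addRange(0x7fdfcdbfc010, 0x7fdfce7fc010)
235.253861: 0xc4f090.Gcx::addRange(0x8264300, 0x8264340)
235.253873: 0xc4f090.Gcx::addRange(0x8264390, 0x82643d0)
"""


def find_add_range(log_lines, pbot, ptop):
    """Yield (timestamp, size_in_bytes) for every addRange entry whose
    bounds are exactly [pbot, ptop)."""
    for line in log_lines:
        m = ADD_RANGE.match(line.strip())
        if not m:
            continue  # skip all the other (very verbose) GC output
        bot, top = int(m["pbot"], 16), int(m["ptop"], 16)
        if (bot, top) == (pbot, ptop):
            yield m["ts"], top - bot


if __name__ == "__main__":
    # pbot/ptop as reported by gdb for frame #0 (Gcx.mark)
    for ts, size in find_add_range(LOG.splitlines(),
                                   0x7fdfcdbfc010, 0x7fdfce7fc010):
        print(f"{ts}: addRange of {size:#x} bytes ({size // 2**20} MiB)")
```

Feeding it the full debug log instead of the excerpt would show whether that range was registered more than once and when.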
So, something is calling addRange explicitly there, causing the GC to scan a range that it shouldn't scan. My own code doesn't add ranges to the GC, and the code generated by girtod/GtkD looks fine to me, so I am currently looking into EMSI containers[1] as the possible culprit. That library being the issue would also make perfect sense, because this crash only started appearing this frequently after the containers were added (there was a GC-related crash before, but that might have been a different one).

So, I will look into that addRange call next.

[1]: https://github.com/dlang-community/containers
