Package: llama.cpp
Version: 8064+dfsg-2
Severity: normal
X-Debbugs-Cc: [email protected]

Dear Maintainer,

I am trying to use a Qwen3-Coder model with llama.cpp and opencode. However, I
see frequent crashes of llama-server; one example can be found below. I suspect
that this is upstream llama.cpp bug #19304
(https://github.com/ggml-org/llama.cpp/issues/19304), which was fixed about a
week after the release of
the currently packaged version, though the bug reports there used different
Qwen-Coder variants.
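For context, frame #10 of the backtrace below (llama_grammar_accept_impl) is
only reached when the server samples under a GBNF grammar, which opencode
presumably triggers via its tool-call requests. The following is a minimal
sketch of a request that exercises the same code path against the server from
the log; the prompt and grammar are illustrative placeholders, not what
opencode actually sends:

```shell
# Hypothetical reproduction sketch: llama-server's /completion endpoint
# accepts a GBNF "grammar" field, which enables the grammar-constrained
# sampling path that crashes in the backtrace below.
cat > /tmp/grammar-request.json <<'EOF'
{
  "prompt": "List three fruits:",
  "n_predict": 64,
  "grammar": "root ::= line+\nline ::= \"- \" [a-z]+ \"\\n\""
}
EOF
# Validate the payload locally before posting it:
python3 -m json.tool /tmp/grammar-request.json > /dev/null && echo payload-ok
# Then, with the server from the log listening on port 8081:
# curl -s http://localhost:8081/completion -d @/tmp/grammar-request.json
```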

[…]
slot update_slots: id  3 | task 32108 | prompt done, n_tokens = 38850,
batch.n_tokens = 17
slot init_sampler: id  3 | task 32108 | init sampler, took 5.85 ms, tokens:
text = 38850, total = 38850
slot update_slots: id  3 | task 32108 | created context checkpoint 3 of 8
(pos_min = 38832, pos_max = 38832, size = 75.376 MiB)
[New LWP 3895376]
[New LWP 3895375]
[New LWP 3895374]
[New LWP 3895373]
[New LWP 3895372]
[New LWP 3895371]
[New LWP 3895370]
[New LWP 3895369]
[New LWP 3895368]
[New LWP 3895367]
[New LWP 3895366]
[New LWP 3895365]
[New LWP 3895364]
[New LWP 3895363]
[New LWP 3895362]
[New LWP 3895128]
[New LWP 3895127]
[New LWP 3895126]
[New LWP 3895125]
[New LWP 3895124]
[New LWP 3895123]
[New LWP 3895122]
[New LWP 3895121]
[New LWP 3895120]
[New LWP 3895119]
[New LWP 3895118]
[New LWP 3895117]
[New LWP 3895116]
[New LWP 3895115]
[New LWP 3895114]
[New LWP 3895113]
[New LWP 3895112]
[New LWP 3895111]
[New LWP 3895110]
[New LWP 3895109]
[New LWP 3895108]
[New LWP 3895107]
[New LWP 3895106]
[New LWP 3895105]
[New LWP 3895104]
[New LWP 3895103]
[New LWP 3895102]
[New LWP 3895101]
[New LWP 3895100]
[New LWP 3895099]
[New LWP 3895098]
[New LWP 3895097]
[New LWP 3895096]
[New LWP 3895094]
[New LWP 3895093]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.debian.net>
Enable debuginfod for this session? (y or [n]) [answered N; input not from
terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
__syscall_cancel_arch () at
../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
warning: 56  ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such file
or directory
#0  __syscall_cancel_arch () at
../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
56      in ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S
#1  0x00007f9b9dc9be64 in __internal_syscall_cancel (a1=<optimized out>,
a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0,
a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:49
warning: 49  ./nptl/cancellation.c: No such file or directory
#2  0x00007f9b9dc9bead in __syscall_cancel (a1=<optimized out>, a2=<optimized
out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0,
nr=61) at ./nptl/cancellation.c:75
75      in ./nptl/cancellation.c
#3  0x00007f9b9dd07c07 in __GI___wait4 (pid=<optimized out>,
stat_loc=<optimized out>, options=<optimized out>, usage=<optimized out>) at
../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30  ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#4  0x00007f9b9ec6aa53 in ggml_print_backtrace () from /usr/lib/x86_64-linux-
gnu/libggml-base.so.0
#5  0x00007f9b9ec79c3f in ?? () from /usr/lib/x86_64-linux-gnu/libggml-
base.so.0
#6  0x00007f9b9debb5fa in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f9b9dea7749 in std::terminate() () from /usr/lib/x86_64-linux-
gnu/libstdc++.so.6
#8  0x00007f9b9debb898 in __cxa_throw () from /usr/lib/x86_64-linux-
gnu/libstdc++.so.6
#9  0x00007f9b9e2796ea in ?? () from /usr/lib/x86_64-linux-
gnu/llama/libllama.so.0
#10 0x00007f9b9e2cd05c in llama_grammar_accept_impl(llama_grammar&, int) ()
from /usr/lib/x86_64-linux-gnu/llama/libllama.so.0
#11 0x000055c375409ba8 in ?? ()
#12 0x000055c375277c94 in ?? ()
#13 0x000055c3752c125e in ?? ()
#14 0x000055c3751d5209 in ?? ()
#15 0x00007f9b9dc33f75 in __libc_start_call_main
(main=main@entry=0x55c3751d1390, argc=argc@entry=19,
argv=argv@entry=0x7ffc39c3f588) at ../sysdeps/nptl/libc_start_call_main.h:58
warning: 58  ../sysdeps/nptl/libc_start_call_main.h: No such file or directory
#16 0x00007f9b9dc34027 in __libc_start_main_impl (main=0x55c3751d1390, argc=19,
argv=0x7ffc39c3f588, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7ffc39c3f578) at ../csu/libc-start.c:360
warning: 360 ../csu/libc-start.c: No such file or directory
#17 0x000055c3751da7f1 in ?? ()
[Inferior 1 (process 3895092) detached]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unexpected empty grammar stack after accepting piece: =list (40972)
./server-example: line 11: 3895092 Aborted                 llama-server
--model models/Qwen3-Coder-Next-Q4_K_S.gguf --ctx-size 131072 --alias
"Qwen3-Coder-Next" --seed 3407 --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40
--port 8081


-- System Information:
Debian Release: forky/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.19.8+deb14-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages llama.cpp depends on:
ii  llama.cpp-tools  8064+dfsg-2

Versions of packages llama.cpp recommends:
ii  llama.cpp-tools-extra  8064+dfsg-2
ii  python3-gguf           8064+dfsg-2

Versions of packages llama.cpp suggests:
pn  llama.cpp-examples  <none>

-- no debconf information
