Hi,
On 2026-03-10 15:28, Matt Corallo wrote:
> While RPC is still labeled experimental, it does work, at least for CPU,
> and in some cases it can be a substantial win (e.g. in my case I don't
> have 60 GB of available memory in one server to run gpt-oss 120B, but
> across two I do, giving a 10x speedup vs. running with 10% of the model
> paged from the filesystem).
>
> I did local builds with -DGGML_RPC=ON in both ggml's and llama.cpp's
> rules files, renamed rpc-server to llama-rpc-server in
> llama.cpp-tools-extra.install, and added
> usr/lib/${DEB_HOST_MULTIARCH}/ggml/backends0/libggml-rpc.so
> to libggml0.install, and it seems to work great.
This also worked on my end.
I've enabled the RPC backend in ggml in the most recent upload to
experimental.
Regarding rpc-server, I agree that it should be called
'llama-rpc-server', and I have requested a rename upstream [1].
'rpc-server' alone is too generic a name for /usr/bin.
@Mathieu: I think this could be a good candidate for another systemd
service?
Best,
Christian
[1]: https://github.com/ggml-org/llama.cpp/pull/25045