Hi,

On 2026-03-10 15:28, Matt Corallo wrote:
> While RPC is still labeled experimental, at least for CPU it does work,
> and in some cases can be a substantial win (eg in my case I don't have
> 60G available memory in one server to run gpt-oss 120B but across two I
> do, giving a 10x speedup vs running with the model 10% in the filesystem).
> 
> I did local builds with -DGGML_RPC=ON in both ggml and llama.cpp's rules
> file, renamed rpc-server to llama-rpc-server in llama.cpp-tools-
> extra.install and added usr/lib/${DEB_HOST_MULTIARCH}/ggml/backends0/
> libggml-rpc.so to libggml0.install and it seems to work great.

This also worked on my end.

I've enabled the RPC backend in ggml in the most recent upload to experimental.

Regarding rpc-server, I agree that this should be called
'llama-rpc-server', and requested a rename upstream [1]. 'rpc-server'
alone is too generic for /usr/bin.

@Mathieu: this could be a good candidate for another systemd service, I
think?

Best,
Christian

[1]: https://github.com/ggml-org/llama.cpp/pull/25045
