Package: src:ggml
Version: 0.9.7-2

While RPC is still labeled experimental, it does work, at least for the CPU backend, and in some cases it is a substantial win. In my case no single server has the ~60 GB of memory needed to run gpt-oss 120B, but two servers together do, giving roughly a 10x speedup over running with 10% of the model paged in from the filesystem.
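For reference, splitting a model across two machines looks something like this. This is a sketch based on the upstream llama.cpp RPC example; hostnames and the model path are placeholders, and flag spellings should be checked against --help on the packaged binaries:

```shell
# On each remote machine, start the RPC backend server
# (upstream calls the binary rpc-server):
rpc-server --host 0.0.0.0 --port 50052

# On the machine driving inference, point llama.cpp at the
# remote backends with --rpc (comma-separated host:port list):
llama-cli -m gpt-oss-120b.gguf \
    --rpc server1:50052,server2:50052 \
    -p "Hello"
```

Note that the RPC protocol is unauthenticated, so the servers should only listen on a trusted network.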

To get this working I did local builds with -DGGML_RPC=ON added to the rules files of both ggml and llama.cpp, renamed rpc-server to llama-rpc-server in llama.cpp-tools-extra.install, and added usr/lib/${DEB_HOST_MULTIARCH}/ggml/backends0/libggml-rpc.so to libggml0.install. With those changes it seems to work great.
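Concretely, the local changes amounted to something like the following sketch. The exact variable name in debian/rules depends on how each package passes flags to CMake, and the rename assumes dh-exec's "src => dest" syntax in the install file:

```
# debian/rules of both src:ggml and src:llama.cpp:
# enable the RPC backend in the CMake invocation
    -DGGML_RPC=ON

# debian/llama.cpp-tools-extra.install (with a #!/usr/bin/dh-exec
# shebang so the rename is applied):
usr/bin/rpc-server => usr/bin/llama-rpc-server

# debian/libggml0.install: ship the backend library
usr/lib/${DEB_HOST_MULTIARCH}/ggml/backends0/libggml-rpc.so
```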
