Package: src:ggml
Version: 0.9.7-2
While RPC is still labeled experimental, it does work, at least for the CPU backend, and in some
cases is a substantial win. In my case, no single server has the ~60G of memory needed to run
gpt-oss 120B, but two servers together do, giving roughly a 10x speedup versus running with 10%
of the model left in the filesystem.
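For reference, the two-machine setup looks roughly like this (host names and port are
placeholders; I believe upstream's rpc-server defaults to port 50052, but check --help):

```shell
# On each worker machine: start the RPC backend server (CPU backend),
# listening on all interfaces. Upstream ships this as rpc-server; in my
# local packaging it is renamed to llama-rpc-server.
rpc-server --host 0.0.0.0 --port 50052

# On the machine driving inference: offload layers to the RPC workers
# via --rpc, splitting the model across both servers.
llama-cli -m gpt-oss-120b.gguf \
  --rpc worker1:50052,worker2:50052 \
  -p "Hello"
```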
I did local builds with -DGGML_RPC=ON in both ggml's and llama.cpp's rules files, renamed
rpc-server to llama-rpc-server in llama.cpp-tools-extra.install, and added
usr/lib/${DEB_HOST_MULTIARCH}/ggml/backends0/libggml-rpc.so to libggml0.install, and it seems
to work great.
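Concretely, the local changes were along these lines (a sketch from memory; the exact flag
variable in debian/rules may differ, and the rename line assumes dh-exec's "src => dest" syntax
is available in the install file):

```
# debian/rules in both src:ggml and src:llama.cpp: enable the RPC backend
  -DGGML_RPC=ON

# llama.cpp-tools-extra.install: ship the server under a namespaced name
usr/bin/rpc-server => usr/bin/llama-rpc-server

# libggml0.install: ship the RPC backend library
usr/lib/${DEB_HOST_MULTIARCH}/ggml/backends0/libggml-rpc.so
```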