branch: externals/minuet
commit 9cbd2fcaddf1526cdb63db464dbaf585aef5ebc7
Author: Milan Glacier <d...@milanglacier.com>
Commit: Milan Glacier <d...@milanglacier.com>
doc: add recipes for launching llama.cpp server.
---
 README.md  | 15 ++++++++++---
 recipes.md | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index bfff623c6b..46b7514686 100644
--- a/README.md
+++ b/README.md
@@ -196,6 +196,9 @@ llama-server \
   (minuet-set-optional-options minuet-openai-fim-compatible-options :max_tokens 56))
 ```
 
+For additional example bash scripts to run llama.cpp according to your local
+computing power, please refer to [recipes.md](./recipes.md).
+
 </details>
 
 # API Keys
@@ -540,7 +543,8 @@ text completion API, not chat completion, so system prompts and few-shot
 examples are not applicable.
 
 For example, you can set the `end_point` to
-`http://localhost:11434/v1/completions` to use `ollama`.
+`http://localhost:11434/v1/completions` to use `ollama`, or set it to
+`http://localhost:8012/v1/completions` to use `llama.cpp`.
 
 <details>
@@ -573,6 +577,11 @@ request timeout from outputing too many tokens.
 (minuet-set-optional-options minuet-openai-fim-compatible-options :top_p 0.9)
 ```
 
+For example bash scripts to run llama.cpp according to your local
+computing power, please refer to [recipes.md](./recipes.md). Note
+that the model for `llama.cpp` must be selected when you launch the
+`llama.cpp` server and cannot be changed thereafter.
+
 </details>
 
 # Troubleshooting
@@ -594,7 +603,7 @@ To diagnose issues, examine the buffer content from `*minuet*`.
 
 # Acknowledgement
 
-- [continue.dev](https://www.continue.dev): not a emacs plugin, but I find a
-  lot LLM models from here.
+- [continue.dev](https://www.continue.dev): not an emacs plugin, but I find a
+  lot of LLM models there.
 - [llama.vim](https://github.com/ggml-org/llama.vim): Reference for CLI parameters used to launch the llama-cpp server.

diff --git a/recipes.md b/recipes.md
new file mode 100644
index 0000000000..8e11099309
--- /dev/null
+++ b/recipes.md
@@ -0,0 +1,73 @@
+# Launching the llama.cpp Server: Example Scripts
+
+This guide provides several configuration variants for the `qwen2.5-coder`
+model, chosen according to your local computing power, specifically the
+available VRAM.
+
+### **For Systems with More Than 16GB VRAM**
+
+```bash
+llama-server \
+    -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
+    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
+    --ctx-size 0 --cache-reuse 256
+```
+
+### **For Systems with Less Than 16GB VRAM**
+
+```bash
+llama-server \
+    -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF \
+    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
+    --ctx-size 0 --cache-reuse 256
+```
+
+### **For Systems with Less Than 8GB VRAM**
+
+```bash
+llama-server \
+    -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
+    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
+    --ctx-size 0 --cache-reuse 256
+```
+
+## Example minuet config
+
+```elisp
+(use-package minuet
+  :config
+  (setq minuet-provider 'openai-fim-compatible)
+  (setq minuet-n-completions 1) ; recommended for local LLMs to conserve resources
+  ;; I recommend beginning with a small context window and expanding it
+  ;; incrementally, depending on your local computing power. A context
+  ;; window of 512 serves as a good starting point for estimating your
+  ;; computing power. Once you have a reliable estimate, adjust the
+  ;; context window to a larger value.
+  (setq minuet-context-window 512)
+  (plist-put minuet-openai-fim-compatible-options :end-point "http://localhost:8012/v1/completions")
+  (plist-put minuet-openai-fim-compatible-options :name "Llama.cpp")
+  ;; An arbitrary non-null environment variable serves as a placeholder,
+  ;; since llama.cpp requires no API key.
+  (plist-put minuet-openai-fim-compatible-options :api-key "TERM")
+  ;; The model is set by the llama-cpp server and cannot be altered
+  ;; post-launch.
+  (plist-put minuet-openai-fim-compatible-options :model "PLACEHOLDER")
+
+  ;; Llama.cpp does not support the `suffix` option in FIM completion.
+  ;; Therefore, we must disable it and manually populate the special
+  ;; tokens required for FIM completion.
+  (minuet-set-optional-options minuet-openai-fim-compatible-options :suffix nil :template)
+  (minuet-set-optional-options
+   minuet-openai-fim-compatible-options
+   :prompt
+   (defun minuet-llama-cpp-fim-qwen-prompt-function (ctx)
+     (format "<|fim_prefix|>%s\n%s<|fim_suffix|>%s<|fim_middle|>"
+             (plist-get ctx :language-and-tab)
+             (plist-get ctx :before-cursor)
+             (plist-get ctx :after-cursor)))
+   :template)
+
+  (minuet-set-optional-options minuet-openai-fim-compatible-options :max_tokens 56))
+```
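+
+To see what this prompt function actually sends, you can evaluate it by hand
+after loading the config above. The snippet below is an illustrative sketch:
+the context plist values are hypothetical stand-ins, but the keys
+(`:language-and-tab`, `:before-cursor`, `:after-cursor`) are the ones the
+prompt function reads.
+
+```elisp
+;; Hypothetical context values, standing in for what minuet collects
+;; around the cursor.
+(minuet-llama-cpp-fim-qwen-prompt-function
+ '(:language-and-tab "language: python"
+   :before-cursor "def fib(n):\n    "
+   :after-cursor "\nprint(fib(10))"))
+;; => "<|fim_prefix|>language: python\ndef fib(n):\n    <|fim_suffix|>\nprint(fib(10))<|fim_middle|>"
+```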
+
+## **Acknowledgment**
+
+- [llama.vim](https://github.com/ggml-org/llama.vim): A reference for CLI
+  parameters used in launching the `llama.cpp` server.
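+
+## Quick Sanity Check
+
+Before pointing minuet at the server, you may want to confirm that the
+endpoint is reachable. The request below is a minimal sketch; the prompt
+text and `max_tokens` value are arbitrary placeholders:
+
+```bash
+# Query the OpenAI-compatible completions endpoint on the port used above.
+curl http://localhost:8012/v1/completions \
+    -H "Content-Type: application/json" \
+    -d '{"prompt": "def fib(n):", "max_tokens": 32}'
+```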