branch: externals/minuet
commit 9cbd2fcaddf1526cdb63db464dbaf585aef5ebc7
Author: Milan Glacier <d...@milanglacier.com>
Commit: Milan Glacier <d...@milanglacier.com>
doc: add recipes for launching llama.cpp server.
---
 README.md  | 15 ++++++++++---
 recipes.md | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index bfff623c6b..46b7514686 100644
--- a/README.md
+++ b/README.md
@@ -196,6 +196,9 @@ llama-server \
   (minuet-set-optional-options minuet-openai-fim-compatible-options :max_tokens 56))
 ```
 
+For additional example bash scripts to run llama.cpp according to your local
+computing power, please refer to [recipes.md](./recipes.md).
+
 </details>
 
 # API Keys
@@ -540,7 +543,8 @@ text completion API, not chat completion, so system prompts and few-shot
 examples are not applicable.
 
 For example, you can set the `end_point` to
-`http://localhost:11434/v1/completions` to use `ollama`.
+`http://localhost:11434/v1/completions` to use `ollama`, or set it to
+`http://localhost:8012/v1/completions` to use `llama.cpp`.
 
 <details>
@@ -573,6 +577,11 @@ request timeout from outputing too many tokens.
 (minuet-set-optional-options minuet-openai-fim-compatible-options :top_p 0.9)
 ```
 
+For example bash scripts to run llama.cpp according to your local
+computing power, please refer to [recipes.md](./recipes.md). Note
+that the model for `llama.cpp` must be selected when you launch the
+`llama.cpp` server and cannot be changed thereafter.
+
 </details>
 
 # Troubleshooting
@@ -594,7 +603,7 @@ To diagnose issues, examine the buffer content from `*minuet*`.
 
 # Acknowledgement
 
-- [continue.dev](https://www.continue.dev): not a emacs plugin, but I find a
-  lot LLM models from here.
+- [continue.dev](https://www.continue.dev): not an emacs plugin, but I find a
+  lot of LLM models there.
 - [llama.vim](https://github.com/ggml-org/llama.vim): Reference for CLI parameters used to launch the llama-cpp server.

diff --git a/recipes.md b/recipes.md
new file mode 100644
index 0000000000..8e11099309
--- /dev/null
+++ b/recipes.md
@@ -0,0 +1,73 @@
+# Launching the llama.cpp Server: Example Scripts
+
+This guide provides several configuration variants for the `qwen2.5-coder`
+model, chosen according to your local computing power, specifically the
+available VRAM.
+
+### **For Systems with More Than 16GB VRAM**
+
+```bash
+llama-server \
+    -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
+    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
+    --ctx-size 0 --cache-reuse 256
+```
+
+### **For Systems with Less Than 16GB VRAM**
+
+```bash
+llama-server \
+    -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF \
+    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
+    --ctx-size 0 --cache-reuse 256
+```
+
+### **For Systems with Less Than 8GB VRAM**
+
+```bash
+llama-server \
+    -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
+    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
+    --ctx-size 0 --cache-reuse 256
+```
+
+## Example minuet config
+
+```elisp
+(use-package minuet
+  :config
+  (setq minuet-provider 'openai-fim-compatible)
+  (setq minuet-n-completions 1) ; recommended for local LLMs to conserve resources
+  ;; I recommend beginning with a small context window and expanding it
+  ;; incrementally, depending on your local computing power. A context
+  ;; window of 512 serves as a good starting point for estimating your
+  ;; computing power. Once you have a reliable estimate, adjust the
+  ;; context window to a larger value.
+  (setq minuet-context-window 512)
+  (plist-put minuet-openai-fim-compatible-options :end-point "http://localhost:8012/v1/completions")
+  (plist-put minuet-openai-fim-compatible-options :name "Llama.cpp")
+  ;; An arbitrary non-null environment variable serves as a placeholder,
+  ;; since llama.cpp requires no API key.
+  (plist-put minuet-openai-fim-compatible-options :api-key "TERM")
+  ;; The model is set by the llama-cpp server and cannot be altered
+  ;; post-launch.
+  (plist-put minuet-openai-fim-compatible-options :model "PLACEHOLDER")
+
+  ;; Llama.cpp does not support the `suffix` option in FIM completion.
+  ;; Therefore, we must disable it and manually populate the special
+  ;; tokens required for FIM completion.
+  (minuet-set-optional-options minuet-openai-fim-compatible-options :suffix nil :template)
+  (minuet-set-optional-options
+   minuet-openai-fim-compatible-options
+   :prompt
+   (defun minuet-llama-cpp-fim-qwen-prompt-function (ctx)
+     (format "<|fim_prefix|>%s\n%s<|fim_suffix|>%s<|fim_middle|>"
+             (plist-get ctx :language-and-tab)
+             (plist-get ctx :before-cursor)
+             (plist-get ctx :after-cursor)))
+   :template)
+
+  (minuet-set-optional-options minuet-openai-fim-compatible-options :max_tokens 56))
+```
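+
+To see what this prompt function actually sends, you can evaluate it by hand
+after loading the config above. The snippet below is an illustrative sketch:
+the context plist values are hypothetical stand-ins, but the keys
+(`:language-and-tab`, `:before-cursor`, `:after-cursor`) are the ones the
+prompt function reads.
+
+```elisp
+;; Hypothetical context values, standing in for what minuet collects
+;; around the cursor.
+(minuet-llama-cpp-fim-qwen-prompt-function
+ '(:language-and-tab "language: python"
+   :before-cursor "def fib(n):\n    "
+   :after-cursor "\nprint(fib(10))"))
+;; => "<|fim_prefix|>language: python\ndef fib(n):\n    <|fim_suffix|>\nprint(fib(10))<|fim_middle|>"
+```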
+
+## **Acknowledgment**
+
+- [llama.vim](https://github.com/ggml-org/llama.vim): A reference for CLI
+  parameters used in launching the `llama.cpp` server.
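+
+## Quick Sanity Check
+
+Before pointing minuet at the server, you may want to confirm that the
+endpoint is reachable. The request below is a minimal sketch; the prompt
+text and `max_tokens` value are arbitrary placeholders:
+
+```bash
+# Query the OpenAI-compatible completions endpoint on the port used above.
+curl http://localhost:8012/v1/completions \
+    -H "Content-Type: application/json" \
+    -d '{"prompt": "def fib(n):", "max_tokens": 32}'
+```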