* Gideon Silberman Moro <gerardomor...@gmail.com> [2025-02-05 09:39]:
> Hi everyone,
>
> I'm looking for a way to automatically link notes in Zetteldeft using AI.
> Ideally, I'd like an approach that analyzes the content of my notes and
> suggests or creates links between relevant ones.
>
> Has anyone experimented with integrating AI (e.g., LLMs, embeddings, or
> external tools like OpenAI or local models) to automate or enhance
> Zetteldeft's linking process? Are there existing Emacs packages or
> workflows that could help with this (without the need of an API)?
Hi!

You can automate linking in Zetteldeft using AI by leveraging local models or embeddings. Here's a quick approach:

1. **Embeddings**: Use a local model (e.g., Sentence Transformers) to generate embeddings for your notes. Compare embeddings to find semantic similarities and suggest links. Tools like `transformers` or `gensim` can help.

   Personally, I work with a Dynamic Knowledge Repository, which in turn encompasses Org documents and all other kinds of documents, so my information is ordered in a PostgreSQL database. Using LangChain and similar tools for chunking is necessary in that sense, as chunks make it possible to augment the context better and to find relevant documents more reliably. RAG could be used as well; it depends, of course, on how much data you have. My "Meta" Org holds about 70,000 documents, and they are all hyperlinks, but what about hyperlinks within hyperlinks? That purpose of hyperlinking automatically is possible by using either RAG or embeddings. Providing RAG or embeddings is rather easy when a database is involved, considering that a vector type already exists for PostgreSQL.

2. **LLMs**: Run a local LLM (e.g., IBM Granite, or Microsoft Phi as fully free software) to analyze note content and suggest links. You can script this in Emacs Lisp on top of Python.

3. **Emacs Packages**: I would not recommend any at this moment; your request is very specific. I am making my own LLM functions. Here is one of them that works and can be adjusted:

(defun rcd-llm-llamafile (prompt &optional memory rcd-llm-model)
  "Send PROMPT to Llama file.  Optional MEMORY and MODEL may be used."
  (let* ((rcd-llm-model (cond ((boundp 'rcd-llm-model) rcd-llm-model)
                              (t "LLaMA_CPP")))
         (memory (cond ((and memory rcd-llm-use-users-llm-memory)
                        (concat "Following is user's memory, until the"
                                " END-OF-MEMORY-TAG: \n\n"
                                memory
                                "\n\n END-OF-MEMORY-TAG\n\n"))))
         (prompt (cond (memory (concat memory "\n\n" prompt))
                       (t prompt)))
         (temperature 0.8)
         (max-tokens -1)
         (top-p 0.95)
         (stream :json-false)
         (buffer
          (let ((url-request-method "POST")
                (url-request-extra-headers
                 '(("Content-Type" . "application/json")
                   ("Authorization" . "Bearer no-key")))
                (prompt (encode-coding-string prompt 'utf-8))
                (url-request-data
                 (encode-coding-string
                  (setq rcd-llm-last-json
                        (json-encode
                         `((model . ,rcd-llm-model)
                           (messages . [((role . "system")
                                         (content . "You are a helpful assistant. Answer short."))
                                        ((role . "user")
                                         (content . ,prompt))])
                           (temperature . ,temperature)
                           (max_tokens . ,max-tokens)
                           (top_p . ,top-p)
                           (stream . ,stream))))
                  'utf-8)))
            (url-retrieve-synchronously
             ;; "http://127.0.0.1:8080/v1/chat/completions"
             "http://192.168.188.140:8080/v1/chat/completions"))))
    (rcd-llm-response buffer)))

As you can read, it uses some memory if necessary, and that memory can also be the list of links which you would like to insert. So the solution could be a simple function whose context or system message contains the summaries and the list of links; a simple prompt could then instruct the LLM to hyperlink it all. Additionally, you could use a grammar instruction from llama.cpp.

No API needed if you stick to local models!

Your idea is great. Let me say it this way: the solution to your problem is much closer than we think. It is just there; it requires some tuning, and it can already work. It requires planning of the knowledge. I don't want all links hyperlinked just because they match; the 70,000 documents are there, but I don't want them all hyperlinked. I want specific hyperlinks hyperlinked. Many of them are also ranked, as I have worked with many, so I would like it by rank too.
You have to plan first how to sort the information, which information, and so on. Then you provide it to embeddings, but how? Where are you going to store the vectors? Or RAG? Using PostgreSQL and its vector type is a good way to go.

-- 
Jean Louis