Re: Querying cached parse trees without opening files

chris Tue, 27 May 2025 15:00:54 -0700

On Tuesday, 27 May 2025 18:44:19 UTC Martin Edström wrote:
> Well-explained! Thank you, Kristoffer :)
> 
> On Mon, 26 May 2025 16:02:30 -0500, Kristoffer Balintona <krisbalint...@gmail.com> wrote:
> > On Mon, May 26, 2025 at 12:02 PM chris <inkbottle...@gmail.com> wrote:
> > > Org-node seems very interesting! I noticed that your
> > > [parser.el](https://github.com/meedstrom/org-mem/blob/main/org-mem-parser.el)
> > > is only about 600 lines long, whereas Org-mode’s parser seems larger and
> > > possibly more scattered? Are they roughly equivalent in scope/intent, or
> > > is your version focused on a different subset of Org features?
> > 
> > Hi,
> > 
> > I am not Martin, but I’ll share a bit about what I’ve gathered about the
> > project after having used org-node for a few months.
> > 
> > As far as I can tell, the org-mem parser is a parser specially tailored
> > for a specific end, namely, speed. What sets org-node apart from
> > org-roam is that it does not need anything on-disk; it maintains hash
> > tables inside Emacs for all its data. (Additionally, and in line with
> > org-node’s mission for performance, it does not end up needing to load
> > org at all, since its parser is an implementation independent of it.) It
> > > can get away with this because the parser is very fast and leverages
> > > el-job’s[1] asynchronous processing of lists.
> > 
> > Of course, the trade off for parsing speed is completeness: org-mem must
> > implement its own regexps to find the data it needs. Everything else is
> > ignored. So if org-mem wants to collect e.g. timestamp data, it must do
> > so without any help from org (as was recently implemented). Org also
> > does a lot to process things like org keywords in files. And, of course,
> > this approach is susceptible to mismatches with what org’s parser
> > actually recognizes since org-mem’s parser is bespoke.
> > 
> > I’m guessing part of Martin’s motivation to ask his original question is
> > related to how tenable maintaining a parser independent from org is. It
> > would be much easier to rely on the definitive org parser if possible. And
> > if I would speculate further, I think what he has in mind is to store
> > the parse trees on disk and read from those (potentially caching those
> > on-disk parse trees if necessary) rather than the user’s files. This way,
> > performance is still fast since the user’s org files are already parsed
> > (which is the expensive part).
> > 
> > Martin can chime in and correct me if I’m wrong.


Is the idea to memoize the output of `org-element-parse-buffer` in a file,
using a modification time or checksum to verify that the content hasn't
changed, so that the result can be reused later without parsing the Org file
again, as in the minimal naive example below?

#+begin_src emacs-lisp :lexical t :wrap example :results raw
;; Tested with Org mode at commit "83a55c6fe"; the example might not
;; work with earlier versions.
(let* ((org-content
        ;; nil elements contribute empty strings, i.e. blank lines.
        (mapconcat #'identity
                   '("# I'm a comment" nil
                     ":PROPERTIES:"
                     ":ID:       463e4d2b-65d7-40ea-ad2d-80abd9edbeff"
                     ":special_property: cool special property"
                     ":END:"
                     "#+title: cool title" nil
                     "* hello"
                     "hello" nil)
                   "\n"))

       ;; Parse the Org text into an AST.
       (ast (with-temp-buffer
              (insert org-content)
              (org-mode)
              (org-element-parse-buffer)))

       ;; Print shared structure as #N= / #N# so that `read' can rebuild it.
       (print-circle t)

       (as-a-string (prin1-to-string ast))

       ;; The AST still refers to the temp buffer, which is dead by now and
       ;; has no readable printed form; swap in a placeholder string.
       (cleaned-string
        (replace-regexp-in-string
         "#<killed buffer>" "\"quux\"" as-a-string))

       ;; Here, you can save the string to a file.
       ;; Then, you can reuse the string to convert it back into an AST
       ;; without having to parse the org-file again.

       (ast-out (car (read-from-string cleaned-string)))

       (new-content (org-element-interpret-data ast-out)))

  (princ new-content))
#+end_src

#+RESULTS:
#+begin_example
# I'm a comment

:PROPERTIES:
:ID:       463e4d2b-65d7-40ea-ad2d-80abd9edbeff
:special_property: cool special property
:END:
,#+title: cool title

,* hello
hello
#+end_example
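
To make the "verify the content hasn't changed" part concrete, here is a rough
sketch of the memoization step on its own. It is only an illustration under
assumptions: the names `my-ast-cache-hash` and `my-ast-cache-get` are made up,
the cache is one file per Org file, and PARSE-FN stands in for whatever
produces the data. For a real Org AST, PARSE-FN would still have to return
something that `read' can reconstruct, for instance by applying the
"#<killed buffer>" replacement from the example above.

#+begin_src emacs-lisp :lexical t
;; Hypothetical helper names; not part of Org, org-mem or org-node.
(defun my-ast-cache-hash (file)
  "Return the SHA-256 hex digest of FILE's contents."
  (with-temp-buffer
    (insert-file-contents-literally file)
    (secure-hash 'sha256 (current-buffer))))

(defun my-ast-cache-get (file cache-file parse-fn)
  "Return PARSE-FN's result for FILE, reusing CACHE-FILE if FILE is unchanged.
PARSE-FN is called with FILE and must return a value whose printed
form can be read back with `read'."
  (let* ((hash (my-ast-cache-hash file))
         ;; The cache holds a single cons cell: (HASH . VALUE).
         (cached (when (file-readable-p cache-file)
                   (with-temp-buffer
                     (let ((coding-system-for-read 'utf-8))
                       (insert-file-contents cache-file))
                     ;; A corrupt or unreadable cache counts as a miss.
                     (ignore-errors (read (current-buffer)))))))
    (if (equal (car-safe cached) hash)
        (cdr cached)
      ;; Miss or stale entry: parse again and overwrite the cache.
      (let ((value (funcall parse-fn file))
            (print-circle t)
            (print-length nil)
            (print-level nil)
            (coding-system-for-write 'utf-8-unix))
        (with-temp-file cache-file
          (prin1 (cons hash value) (current-buffer)))
        value))))
#+end_src

Hashing still reads the whole file, so this only pays off if parsing is much
more expensive than hashing. Comparing modification times instead would avoid
reading the file at all, at the cost of trusting the timestamps.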

Chris
