hi everybody,

I would like to discuss a performance issue I have been experiencing with
org-lint, specifically with the org-lint-invalid-id-link checker, after
upgrading to Emacs 31 and org 9.8.

I use org-roam and I have more than 1000 org files.

Every time a file is saved and flycheck triggers org-lint, the function
org-lint-invalid-id-link calls org-id-update-id-locations, which scans all
org files on disk to rebuild the ID location cache. In my case this means
scanning 1382 files on every single save, causing noticeable sluggishness.

Looking at the code in org-lint.el (function org-lint-invalid-id-link):

  (let ((id-locations-updated nil))
    (org-element-map ast 'link
      (lambda (link)
        (unless id-locations-updated
          (org-id-update-id-locations nil t)
          (setq id-locations-updated t))
        ...
          )))

The id-locations-updated flag correctly prevents scanning more than once
per lint run. However, since it is a local variable, it is reset to nil
on every call -- meaning a full scan of all org files happens on every
flycheck-triggered lint run, i.e., on every save of any org file
containing at least one id: link.

As far as I understand, org-id-locations is already an in-memory hash table
cache mapping IDs to files, loaded from org-id-locations-file at startup. The
function org-id-find-id-file, called immediately after in the same checker,
already trusts this cache. So org-lint-invalid-id-link is ignoring the cache
for the update step but relying on it for the lookup step.

I think there are two complementary improvements, and I would like to
discuss a potential fix, in 2 parts:

1. In org-lint-invalid-id-link: only call org-id-update-id-locations if
   org-id-locations is empty or not yet loaded, rather than unconditionally
   on every lint run:

     (unless id-locations-updated
       (when (or (not org-id-locations)
                 (zerop (hash-table-count org-id-locations)))
         (org-id-update-id-locations nil t))
       (setq id-locations-updated t))

   This way the scan only happens once per session (when the cache is
   genuinely empty), not on every save.

2. Keep org-id-locations incrementally up to date on save: when a file
   is saved, scan just that file for IDs and call org-id-add-location
   for each one. It would mean the cache is always fresh without ever
   needing a full rescan during linting.

The combination of both would mean: the cache is kept current incrementally
on save, and org-lint trusts it rather than rebuilding it from scratch.

I may be missing something -- perhaps there is a reason the full rescan is
necessary before declaring a link invalid -- so we wanted to raise this here
before proposing a patch.

Thank you.

--daniel

--
Daniel M. German                  "The vitality of language lies in
                                   its ability to limn the actual,
                                   imagined and possible lives
    Toni Morrison ->               of its speakers, readers, writers."
http://turingmachine.org/
http://silvernegative.com/
dmg (at) uvic (dot) ca
replace (at) with @ and (dot) with .

Reply via email to