dmg <[email protected]> writes: > Every time a file is saved and flycheck triggers org-lint, the function > org-lint-invalid-id-link calls org-id-update-id-locations, which scans all > org files on disk to rebuild the ID location cache. In my case this means > scanning 1382 files on every single save, causing noticeable sluggishness. > > Looking at the code in org-lint.el (function org-lint-invalid-id-link): > > (let ((id-locations-updated nil)) > (org-element-map ast 'link > (lambda (link) > (unless id-locations-updated > (org-id-update-id-locations nil t) > (setq id-locations-updated t)) > ... > ))) > > The id-locations-updated flag correctly prevents scanning more than once > per lint run. However, since it is a local variable, it is reset to nil > on every call -- meaning a full scan of all org files happens on every > flycheck-triggered lint run, i.e., on every save of any org file > containing at least one id: link.
That's on purpose. The idea is that it is unlikely that the ids get updated *during* the linting, but they can definitely change (including outside Emacs) between org-lint runs. > As far as I understand, org-id-locations is already an in-memory hash table > cache mapping IDs to files, loaded from org-id-locations-file at startup. The > function org-id-find-id-file, called immediately after in the same checker, > already trusts this cache. So org-lint-invalid-id-link is ignoring the cache > for the update step but relying on it for the lookup step. org-id-find-id-file indeed trusts the cache. However, its main caller - `org-id-find', falls back to `org-id-update-id-locations' when org-id-find-id-file fails. > I think there are two complementary improvements, and I would like to > discuss a potential fix, in 2 parts: > > 1. In org-lint-invalid-id-link: only call org-id-update-id-locations if > org-id-locations is empty or not yet loaded, rather than unconditionally > on every lint run: > > (unless id-locations-updated > (when (or (not org-id-locations) > (zerop (hash-table-count org-id-locations))) > (org-id-update-id-locations nil t)) > (setq id-locations-updated t)) > > This way the scan only happens once per session (when the cache is > genuinely empty), not on every save. That could cause extra false-positives if files are changed on disk. > 2. Keep org-id-locations incrementally up to date on save: when a file > is saved, scan just that file for IDs and call org-id-add-location > for each one. It would mean the cache is always fresh without ever > needing a full rescan during linting. That would not work very well on large files. Saving is something done much more frequently compared to looking up ids. > I may be missing something -- perhaps there is a reason the full rescan is > necessary before declaring a link invalid -- so we wanted to raise this here > before proposing a patch. We have a relevant optimization that tries not to re-parse everything every time. Check out `org-id--locations-checksum'. However, in your specific case, that checksum is changes upon save. I logical improvement would be not treating change in *one* file as a trigger for full re-parsing, but do it more granularly in `org-id-update-id-locations', reusing cache from the files that did not change. -- Ihor Radchenko // yantar92, Org mode maintainer, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92>
