All the time since
https://list.orgmode.org/orgmode/ch3pr84mb3424dc5a13943a4a500fee3cc5...@ch3pr84mb3424.namprd84.prod.outlook.com/
where the clock sum code has been rewritten to use org-element API, I
have been hitting numerous errors, ranging from AVL-tree not maintaining
its ordering, to plain infinite loops.
Even though it seems to be triggered by the patch, it is not the patch
implementation that is introducing the bugs. The patch merely exposed
problems in org-element.

The first is the known problem with org-element-cache-map going into
infinite loop. The reproducer is known -
https://orgmode.org/list/87seeoznr6.fsf@localhost, but it is not the
first problem with that function. The underlying cause is complexity of
managing constantly rebalancing AVL-tree and various edge cases in the
org-element-cache-map code itself. It does not help that
org-element-cache-map should be able to handle AVL tree being modified
under the feet when the buffer changes, or we need to parse new items.

The second is AVL-tree data structure itself. When used for clock sums,
it tends to accumulate large number of CLOCK elements into cache - one
after others. Each insertion into AVL tree is a log N
operation. Since avl-tree.el does not have any locality optimizations,
we end up with significant slowdowns.

The third, is the ordering dance we have to do when updating the cache
partially (`org-element-cache-get-key'). Normally, the cache is ordered
by :begin values, but :begin values are not always correct while the
cache is in the middle of updating as the user inserts/deletes text. So,
Org has to roll out intermediate key that are only relevant during cache
updates. That makes AVL tree rebalancing tricky and leads to subtle bugs.

To properly fix the above, I am thinking to abandon AVL tree entirely,
and switch to skip lists that do not have to re-balance. Lists are also
much easier to traverse, even in the presence of changes.
https://git.sr.ht/~yantar92/org-mode/tree/feature/cache-skip-list/item/lisp/org-skip-list.el

But such a switch is not at all trivial. So, for the time being, I will
revert the changes to org-clock.el. More drastic changes will have to
wait until after the next Org release. Otherwise, the next release will
have to be postponed indefinitely.

I now temporarily reverted the org-clock-sum implementation using
org-element API.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=cde2b1e99
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=af587bcc5
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=e85ecd813

-- 
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

Reply via email to