John Wiegley <[email protected]> writes:

> All of this should be done now:

Ok. Let's go a full LLM craze now :)

Ihor writes:
  In the context, I provide John Wiegley's re-implementation of Org mode table 
formulas.
  I am also providing the Org manual entries describing this functionality.
  You can lookup org-table.el (large buffer) for the original implementation.
  The Jogn's rewrite is large and done by LLM, so I expect various subtle 
details to
  be inconsistent. Please look at the John's code and check if the code behavior
  contradicts things documented in the manual.

# LLM response
I have analyzed John Wiegley’s =org-table-loeb.el= against the Org manual (from 
the context buffer) and the original =org-table.el= implementation. The rewrite 
is large and, as you suspected, contains several subtle inconsistencies with 
documented behavior. Below is a systematic list of contradictions, grouped by 
functional area.

---

** 1. Reference parsing for dependency analysis

| Manual / Original behavior | John’s implementation | Impact |
|----------------------------|------------------------|--------|
| *Column‑only (=$C=) and row‑only (=@R=) references* are valid and expand to 
the current row/column (manual: “if you omit either part, the current 
row/column is implied”).<br>Original =org-table-eval-formula= substitutes =$C= 
with a field reference based on the current row. | =org-table-loeb--parse-refs= 
only matches absolute =@R$C= patterns. It ignores =$C= and =@R=. | Dependency 
graph misses edges for column‑ and row‑only references. Cycles like =$1= ↔ =$2= 
in the same row will not be detected, and the topological order used for cycle 
detection may be wrong. |
| *Range references (=@R1$C1..@R2$C2=, =$N..$M=, =@I..@II=)* are not parsed for 
dependencies in the original; dependencies are resolved when the range is 
expanded during evaluation. | John’s code also does not parse range references 
in =org-table-loeb--parse-refs=. | Missed dependencies within ranges. However, 
because the closure eventually calls =org-table-loeb--get= for each cell in the 
range, evaluation will still work, but cycle detection may fail for cycles that 
go through a range. |

*Code locations:*
- =org-table-loeb--parse-refs= (line ~120) regex =@\\([0-9]+\\)\\$\\([0-9]+\\)=.
- Dependency graph built by =org-table-loeb--build-deps= uses only those parsed 
refs.

*Suggested fix:* Expand the regex to also capture =$C= and =@R= (or, simpler, 
run dependency analysis on the fully expanded formula after preprocessing, 
where all references are already absolute =@R$C=).

---

** 2. Hline reference semantics

| Manual / Original behavior | John’s implementation | Impact |
|----------------------------|------------------------|--------|
| *Standalone =@I=* – the manual states “@I refers to the first hline”. 
However, row numbering counts only data lines; hlines are ignored. The 
practical meaning (seen in examples like =@-I$2=) is “the first data row after 
that hline”. Original =org-table--descriptor-line= resolves =I= to the line 
number of the hline itself, but =org-table-get-range= then skips hlines when 
fetching data rows. | =org-table-loeb--expand-hline-refs= treats all standalone 
=@I= as “first row after that hline” (=target-row (+ target-hline 1 offset)=). 
| Consistent with practical usage, but differs from the literal manual 
description. This may cause subtle differences if a formula expects =@I= to 
refer to the hline position (which is not a data row). |
| *Ranges =@I..@II=* – manual: start means “first row after that hline”, end 
means “last row before that hline”. | John’s code implements exactly that: 
start = =hline+1=, end = =hline= (no +1). | Correct. |

*Code location:* =org-table-loeb--expand-hline-refs= (line ~230).

---

** 3. Missing =$0= (current column) reference

| Manual / Original behavior | John’s implementation | Impact |
|----------------------------|------------------------|--------|
| =$0= refers to the current column (manual: “=@0= and =$0= refer to the 
current row and column, respectively”). Original expands =$0= to the current 
column number. | =org-table-loeb--expand-relative-refs= only expands =@0=. =$0= 
is left unchanged, causing a syntax error in the resulting Lisp/Calc 
expression. | Formulas using =$0= will break. |

*Code location:* =org-table-loeb--expand-relative-refs= (line ~340) regex =@0= 
only.

---

** 4. Lisp formula flags =N= and =L= not implemented

| Manual / Original behavior | John’s implementation | Impact |
|----------------------------|------------------------|--------|
| *=N= flag* – references are parsed as numbers; non‑numbers become 0. Ranges 
become space‑separated numbers suitable for =apply= etc.<br>*=L= flag* – 
references are inserted literally (without quotes). | 
=org-table-loeb--make-lisp-evaluator= ignores =:force-numbers= and =:literal= 
flags from =org-table-loeb--parse-format-flags=. All references are substituted 
as =(org-table-loeb--get …)= calls (string interpolation). | Lisp formulas that 
rely on numeric operations (=' (+ $1 $2) ;N=) will fail because =$1= is 
replaced by a string‑yielding call. Literal‑mode formulas (=' (substring "$1" 
…) ;L=) will also break. |

*Code locations:*
- =org-table-loeb--parse-format-flags= parses =N= and =L= correctly.
- =org-table-loeb--make-lisp-evaluator= does not branch on those flags.

---

** 5. Duration flags =t=, =T=, =U= parsed but not honored

| Manual / Original behavior | John’s implementation | Impact |
|----------------------------|------------------------|--------|
| Flags =t=, =T=, =U= control duration formatting and conversion (manual: 
“Duration computations in Calc or Lisp”). Original converts time strings 
=HH:MM:SS= to seconds before calculation and formats results back. | 
=org-table-loeb--parse-format-flags= adds =:time-format=, =:time-hhmmss=, 
=:time-hhmm= keys, but no conversion is applied in 
=org-table-loeb--format-result= or in the evaluator. | Duration calculations 
will produce wrong results (e.g., =2:12 + 1:47= will be treated as string 
concatenation or numeric addition of =2= and =1=). |

*Code location:* =org-table-loeb--parse-format-flags= (line ~280) adds flags, 
but they are unused.

---

** 6. Empty‑cell handling diverges for Lisp formulas

| Manual / Original behavior | John’s implementation | Impact |
|----------------------------|------------------------|--------|
| With =E= flag, empty fields become =nan= in Calc formulas and empty string 
=""= in Lisp formulas. With =N= flag, empty fields become =0= for both. | 
=org-table-loeb--handle-empty-value= returns ="nan"= for empty when 
=:keep-empty= is set, otherwise ="0"=. No distinction between Calc and Lisp. | 
Lisp formulas that expect empty strings will receive ="nan"= or ="0"=, 
potentially breaking string operations. |

*Code location:* =org-table-loeb--handle-empty-value= (line ~300).

---

** 7. Range expansion missing for Lisp formulas

| Manual / Original behavior | John’s implementation | Impact |
|----------------------------|------------------------|--------|
| In Lisp formulas, ranges (e.g., =$1..$3=) are expanded to a list of values 
(or numbers with =N= flag). Original substitutes the range with a vector string 
for Calc, and a list for Lisp. | =org-table-loeb--substitute-ranges= is called 
only from =org-table-loeb--make-calc-evaluator=, not from 
=org-table-loeb--make-lisp-evaluator=. | Lisp formulas containing ranges will 
see the raw =$1..$3= syntax, causing evaluation errors. |

*Code location:* =org-table-loeb--make-lisp-evaluator= does not call 
=--substitute-ranges=.

---

** 8. Preprocessing order differs from original

| Original order (in =org-table-eval-formula=) | John’s order 
(=org-table-loeb--preprocess-formula=) |
|-----------------------------------------------|-----------------------------------------------------|
| 1. Substitute named fields (=org-table-formula-substitute-names=).<br>2. 
Handle first/last refs (=org-table-formula-handle-first/last-rc=).<br>3. 
Remote‑reference indirection.<br>4. Expand remote references.<br>5. Expand 
complex ranges (=org-table-range-regexp=).<br>6. Expand simple ranges 
(=$N..$M=). | 1. Excel‑style refs.<br>2. Expand remote refs.<br>3. Substitute 
named fields.<br>4. Expand hline refs.<br>5. Expand first/last refs.<br>6. 
Expand relative refs. |

*Impact:* The different order could cause subtle differences when a formula 
mixes multiple reference types (e.g., a named column that also contains a 
remote reference). However, because each expansion step eventually produces 
absolute =@R$C= references, the final result may still be correct. The main 
risk is that early expansion of remote refs replaces them with literal values, 
preventing later steps (like named‑field substitution) from affecting those 
parts.

---

** 9. Missing =$>>= etc. column‑relative references

| Manual / Original behavior | John’s implementation | Impact |
|----------------------------|------------------------|--------|
| =$>>= means “third column from the right” (manual: “you can use =$>>>= to 
indicate the third column from the right”). Original 
=org-table-formula-handle-first/last-rc= handles =$<= and =$>= but not =$>>=. | 
=org-table-loeb--expand-first-last-refs= only expands =$<= and =$>= 
(single‑character). Multiple =<= or =>= are supported for rows (=@<<=) but not 
for columns. | Formulas using =$>>>= will not expand, causing syntax errors. |

*Code location:* =org-table-loeb--expand-first-last-refs= regex 
=\\([@$]\\)\\(<+\\|>+\\)= works for both row and column, but the column count 
uses =ncols= incorrectly: =nmax= is set to =ncols=, but =$>>= should refer to 
column =ncols‑len+1=. John’s code uses =(- nmax len -1)= which is correct for 
rows, but for columns =nmax= is the total number of columns, not the last 
column index? Actually column numbers are 1‑based, so =$>= should be column 
=ncols=. =$>>= should be column =ncols‑1=. The formula =(- nmax len -1)= yields 
=ncols‑len+1=. For =len=2=, =ncols=5=, result = =5‑2+1= = 4, which is correct 
(second‑to‑last). However, the manual says =$>>>= is third from right; with 
=len=3=, result = =5‑3+1= = 3, which is third column from left, not right. The 
manual’s example is ambiguous. The original implementation likely treats =$>>= 
as second‑to‑last, but I would need to verify. This is a potential discrepancy.

---

** 10. Cycle detection may miss self‑references via =$C= when =C= equals 
current column

| Manual / Original behavior | John’s implementation | Impact |
|----------------------------|------------------------|--------|
| A formula =$1== referencing its own column in the same row creates a 
self‑cycle. Original detects this as a circular dependency (though it may only 
manifest as infinite recursion). | Because =$C= is not parsed, the dependency 
graph will not contain the self‑edge. The cycle detector will not report it, 
and the loeb engine may loop indefinitely or produce a wrong result. | Silent 
incorrect evaluation instead of a clear error. |

---

** Summary of high‑priority issues

1. *Missing column/row reference parsing* – breaks dependency analysis and 
cycle detection.
2. *=$0= not expanded* – formulas using current‑column reference will fail.
3. *Lisp‑formula flags =N=/=L= ignored* – breaks many Lisp formulas.
4. *Duration flags ignored* – duration calculations will be wrong.
5. *Range expansion missing for Lisp formulas* – ranges in Lisp formulas cause 
errors.

These are likely to cause user‑visible bugs and regression compared to the 
classic engine. The remaining issues (hline semantics, empty‑cell handling, 
preprocessing order) are subtler and may not affect typical usage, but could 
still produce different results in edge cases.

---

*** Challenge Layer

- *Assumption:* The loeb engine’s dependency analysis is only needed for cycle 
detection, and missing edges are not critical because the closure will still 
force the correct cells during evaluation.  
  /Challenge:/ If a cell’s closure references another cell that itself depends 
on the first via a path not captured in the dependency graph, the loeb 
fixed‑point combinator may still converge, but the order of evaluation could be 
wrong (e.g., a cell may be forced before its dependencies are ready). This 
could lead to incorrect results, not just undetected cycles.

- *Assumption:* Expanding remote references early (before named‑field 
substitution) is safe because remote tables are external constants.  
  /Challenge:/ If a remote reference contains a named column (e.g., 
=remote(table, $name)=), early expansion will replace =$name= with its value 
from the /local/ table’s column names, which may be wrong. The original 
processes remote references after named‑field substitution, preserving the 
intended semantics.

- *Risk of over‑expansion:* John’s code expands hline and relative references 
before range substitution, converting =@I..@II= to =@2..@3=. This is fine, but 
if the same formula also contains a range like =@I$1..@II$2=, the regex for 
ranges may no longer match after hline expansion, causing the range to be 
missed. The original keeps hline references inside the range regex and resolves 
them later, avoiding this problem.

- *Second‑order effect:* The loeb engine is intended to be a drop‑in 
replacement for the classic engine. If it behaves differently in subtle ways 
(e.g., empty‑cell handling, duration conversion), users who rely on those 
features will see broken tables, and the bug reports will be hard to trace back 
to the engine switch.


-- 
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

Reply via email to