John Wiegley <[email protected]> writes: > All of this should be done now:
Ok. Let's go a full LLM craze now :) Ihor writes: In the context, I provide John Wiegley's re-implementation of Org mode table formulas. I am also providing the Org manual entries describing this functionality. You can lookup org-table.el (large buffer) for the original implementation. The Jogn's rewrite is large and done by LLM, so I expect various subtle details to be inconsistent. Please look at the John's code and check if the code behavior contradicts things documented in the manual. # LLM response I have analyzed John Wiegley’s =org-table-loeb.el= against the Org manual (from the context buffer) and the original =org-table.el= implementation. The rewrite is large and, as you suspected, contains several subtle inconsistencies with documented behavior. Below is a systematic list of contradictions, grouped by functional area. --- ** 1. Reference parsing for dependency analysis | Manual / Original behavior | John’s implementation | Impact | |----------------------------|------------------------|--------| | *Column‑only (=$C=) and row‑only (=@R=) references* are valid and expand to the current row/column (manual: “if you omit either part, the current row/column is implied”).<br>Original =org-table-eval-formula= substitutes =$C= with a field reference based on the current row. | =org-table-loeb--parse-refs= only matches absolute =@R$C= patterns. It ignores =$C= and =@R=. | Dependency graph misses edges for column‑ and row‑only references. Cycles like =$1= ↔ =$2= in the same row will not be detected, and the topological order used for cycle detection may be wrong. | | *Range references (=@R1$C1..@R2$C2=, =$N..$M=, =@I..@II=)* are not parsed for dependencies in the original; dependencies are resolved when the range is expanded during evaluation. | John’s code also does not parse range references in =org-table-loeb--parse-refs=. | Missed dependencies within ranges. However, because the closure eventually calls =org-table-loeb--get= for each cell in the range, evaluation will still work, but cycle detection may fail for cycles that go through a range. | *Code locations:* - =org-table-loeb--parse-refs= (line ~120) regex =@\\([0-9]+\\)\\$\\([0-9]+\\)=. - Dependency graph built by =org-table-loeb--build-deps= uses only those parsed refs. *Suggested fix:* Expand the regex to also capture =$C= and =@R= (or, simpler, run dependency analysis on the fully expanded formula after preprocessing, where all references are already absolute =@R$C=). --- ** 2. Hline reference semantics | Manual / Original behavior | John’s implementation | Impact | |----------------------------|------------------------|--------| | *Standalone =@I=* – the manual states “@I refers to the first hline”. However, row numbering counts only data lines; hlines are ignored. The practical meaning (seen in examples like =@-I$2=) is “the first data row after that hline”. Original =org-table--descriptor-line= resolves =I= to the line number of the hline itself, but =org-table-get-range= then skips hlines when fetching data rows. | =org-table-loeb--expand-hline-refs= treats all standalone =@I= as “first row after that hline” (=target-row (+ target-hline 1 offset)=). | Consistent with practical usage, but differs from the literal manual description. This may cause subtle differences if a formula expects =@I= to refer to the hline position (which is not a data row). | | *Ranges =@I..@II=* – manual: start means “first row after that hline”, end means “last row before that hline”. | John’s code implements exactly that: start = =hline+1=, end = =hline= (no +1). | Correct. | *Code location:* =org-table-loeb--expand-hline-refs= (line ~230). --- ** 3. Missing =$0= (current column) reference | Manual / Original behavior | John’s implementation | Impact | |----------------------------|------------------------|--------| | =$0= refers to the current column (manual: “=@0= and =$0= refer to the current row and column, respectively”). Original expands =$0= to the current column number. | =org-table-loeb--expand-relative-refs= only expands =@0=. =$0= is left unchanged, causing a syntax error in the resulting Lisp/Calc expression. | Formulas using =$0= will break. | *Code location:* =org-table-loeb--expand-relative-refs= (line ~340) regex =@0= only. --- ** 4. Lisp formula flags =N= and =L= not implemented | Manual / Original behavior | John’s implementation | Impact | |----------------------------|------------------------|--------| | *=N= flag* – references are parsed as numbers; non‑numbers become 0. Ranges become space‑separated numbers suitable for =apply= etc.<br>*=L= flag* – references are inserted literally (without quotes). | =org-table-loeb--make-lisp-evaluator= ignores =:force-numbers= and =:literal= flags from =org-table-loeb--parse-format-flags=. All references are substituted as =(org-table-loeb--get …)= calls (string interpolation). | Lisp formulas that rely on numeric operations (=' (+ $1 $2) ;N=) will fail because =$1= is replaced by a string‑yielding call. Literal‑mode formulas (=' (substring "$1" …) ;L=) will also break. | *Code locations:* - =org-table-loeb--parse-format-flags= parses =N= and =L= correctly. - =org-table-loeb--make-lisp-evaluator= does not branch on those flags. --- ** 5. Duration flags =t=, =T=, =U= parsed but not honored | Manual / Original behavior | John’s implementation | Impact | |----------------------------|------------------------|--------| | Flags =t=, =T=, =U= control duration formatting and conversion (manual: “Duration computations in Calc or Lisp”). Original converts time strings =HH:MM:SS= to seconds before calculation and formats results back. | =org-table-loeb--parse-format-flags= adds =:time-format=, =:time-hhmmss=, =:time-hhmm= keys, but no conversion is applied in =org-table-loeb--format-result= or in the evaluator. | Duration calculations will produce wrong results (e.g., =2:12 + 1:47= will be treated as string concatenation or numeric addition of =2= and =1=). | *Code location:* =org-table-loeb--parse-format-flags= (line ~280) adds flags, but they are unused. --- ** 6. Empty‑cell handling diverges for Lisp formulas | Manual / Original behavior | John’s implementation | Impact | |----------------------------|------------------------|--------| | With =E= flag, empty fields become =nan= in Calc formulas and empty string =""= in Lisp formulas. With =N= flag, empty fields become =0= for both. | =org-table-loeb--handle-empty-value= returns ="nan"= for empty when =:keep-empty= is set, otherwise ="0"=. No distinction between Calc and Lisp. | Lisp formulas that expect empty strings will receive ="nan"= or ="0"=, potentially breaking string operations. | *Code location:* =org-table-loeb--handle-empty-value= (line ~300). --- ** 7. Range expansion missing for Lisp formulas | Manual / Original behavior | John’s implementation | Impact | |----------------------------|------------------------|--------| | In Lisp formulas, ranges (e.g., =$1..$3=) are expanded to a list of values (or numbers with =N= flag). Original substitutes the range with a vector string for Calc, and a list for Lisp. | =org-table-loeb--substitute-ranges= is called only from =org-table-loeb--make-calc-evaluator=, not from =org-table-loeb--make-lisp-evaluator=. | Lisp formulas containing ranges will see the raw =$1..$3= syntax, causing evaluation errors. | *Code location:* =org-table-loeb--make-lisp-evaluator= does not call =--substitute-ranges=. --- ** 8. Preprocessing order differs from original | Original order (in =org-table-eval-formula=) | John’s order (=org-table-loeb--preprocess-formula=) | |-----------------------------------------------|-----------------------------------------------------| | 1. Substitute named fields (=org-table-formula-substitute-names=).<br>2. Handle first/last refs (=org-table-formula-handle-first/last-rc=).<br>3. Remote‑reference indirection.<br>4. Expand remote references.<br>5. Expand complex ranges (=org-table-range-regexp=).<br>6. Expand simple ranges (=$N..$M=). | 1. Excel‑style refs.<br>2. Expand remote refs.<br>3. Substitute named fields.<br>4. Expand hline refs.<br>5. Expand first/last refs.<br>6. Expand relative refs. | *Impact:* The different order could cause subtle differences when a formula mixes multiple reference types (e.g., a named column that also contains a remote reference). However, because each expansion step eventually produces absolute =@R$C= references, the final result may still be correct. The main risk is that early expansion of remote refs replaces them with literal values, preventing later steps (like named‑field substitution) from affecting those parts. --- ** 9. Missing =$>>= etc. column‑relative references | Manual / Original behavior | John’s implementation | Impact | |----------------------------|------------------------|--------| | =$>>= means “third column from the right” (manual: “you can use =$>>>= to indicate the third column from the right”). Original =org-table-formula-handle-first/last-rc= handles =$<= and =$>= but not =$>>=. | =org-table-loeb--expand-first-last-refs= only expands =$<= and =$>= (single‑character). Multiple =<= or =>= are supported for rows (=@<<=) but not for columns. | Formulas using =$>>>= will not expand, causing syntax errors. | *Code location:* =org-table-loeb--expand-first-last-refs= regex =\\([@$]\\)\\(<+\\|>+\\)= works for both row and column, but the column count uses =ncols= incorrectly: =nmax= is set to =ncols=, but =$>>= should refer to column =ncols‑len+1=. John’s code uses =(- nmax len -1)= which is correct for rows, but for columns =nmax= is the total number of columns, not the last column index? Actually column numbers are 1‑based, so =$>= should be column =ncols=. =$>>= should be column =ncols‑1=. The formula =(- nmax len -1)= yields =ncols‑len+1=. For =len=2=, =ncols=5=, result = =5‑2+1= = 4, which is correct (second‑to‑last). However, the manual says =$>>>= is third from right; with =len=3=, result = =5‑3+1= = 3, which is third column from left, not right. The manual’s example is ambiguous. The original implementation likely treats =$>>= as second‑to‑last, but I would need to verify. This is a potential discrepancy. --- ** 10. Cycle detection may miss self‑references via =$C= when =C= equals current column | Manual / Original behavior | John’s implementation | Impact | |----------------------------|------------------------|--------| | A formula =$1== referencing its own column in the same row creates a self‑cycle. Original detects this as a circular dependency (though it may only manifest as infinite recursion). | Because =$C= is not parsed, the dependency graph will not contain the self‑edge. The cycle detector will not report it, and the loeb engine may loop indefinitely or produce a wrong result. | Silent incorrect evaluation instead of a clear error. | --- ** Summary of high‑priority issues 1. *Missing column/row reference parsing* – breaks dependency analysis and cycle detection. 2. *=$0= not expanded* – formulas using current‑column reference will fail. 3. *Lisp‑formula flags =N=/=L= ignored* – breaks many Lisp formulas. 4. *Duration flags ignored* – duration calculations will be wrong. 5. *Range expansion missing for Lisp formulas* – ranges in Lisp formulas cause errors. These are likely to cause user‑visible bugs and regression compared to the classic engine. The remaining issues (hline semantics, empty‑cell handling, preprocessing order) are subtler and may not affect typical usage, but could still produce different results in edge cases. --- *** Challenge Layer - *Assumption:* The loeb engine’s dependency analysis is only needed for cycle detection, and missing edges are not critical because the closure will still force the correct cells during evaluation. /Challenge:/ If a cell’s closure references another cell that itself depends on the first via a path not captured in the dependency graph, the loeb fixed‑point combinator may still converge, but the order of evaluation could be wrong (e.g., a cell may be forced before its dependencies are ready). This could lead to incorrect results, not just undetected cycles. - *Assumption:* Expanding remote references early (before named‑field substitution) is safe because remote tables are external constants. /Challenge:/ If a remote reference contains a named column (e.g., =remote(table, $name)=), early expansion will replace =$name= with its value from the /local/ table’s column names, which may be wrong. The original processes remote references after named‑field substitution, preserving the intended semantics. - *Risk of over‑expansion:* John’s code expands hline and relative references before range substitution, converting =@I..@II= to =@2..@3=. This is fine, but if the same formula also contains a range like =@I$1..@II$2=, the regex for ranges may no longer match after hline expansion, causing the range to be missed. The original keeps hline references inside the range regex and resolves them later, avoiding this problem. - *Second‑order effect:* The loeb engine is intended to be a drop‑in replacement for the classic engine. If it behaves differently in subtle ways (e.g., empty‑cell handling, duration conversion), users who rely on those features will see broken tables, and the bug reports will be hard to trace back to the engine switch. -- Ihor Radchenko // yantar92, Org mode maintainer, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92>
