This is an automated email from the ASF dual-hosted git repository.

leginee pushed a commit to branch bazel-migration
in repository https://gitbox.apache.org/repos/asf/openoffice.git

commit 8cae16e13e49a4423286743774d19ba9d1248314
Author: Peter Kovacs <[email protected]>
AuthorDate: Mon Jun 15 23:25:09 2026 +0200

    document code bug: Calc crash-on-open AV — latent UAF, 
debug-CRT-deterministic
---
 bug-readme.md | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 84 insertions(+), 14 deletions(-)

diff --git a/bug-readme.md b/bug-readme.md
index 91a77a071a..080ccc27fa 100644
--- a/bug-readme.md
+++ b/bug-readme.md
@@ -612,7 +612,7 @@ defined in `fundamental.ini`, so the imported values 
expand. Rejected alternativ
 (`DISABLE_EXTENSION_SYNCHRONIZATION=1`) — keeps sync working for future 
extension modules.
 **Verified:** fatal gone; office boots into documents.
 
-## 14. Calc crash-on-open AV — debug-CRT-only latent UAF (NOT a migration bug)
+## 14. Calc crash-on-open AV — latent UAF, debug-CRT-deterministic (NOT a 
migration bug)
 
 **Component:** `sc` view init — **stock AOO defect**, source byte-identical to 
upstream.
 **Root cause (CONFIRMED):** `ScViewData::ReadUserDataSequence`
@@ -624,21 +624,91 @@ refreshes `pThisTab`** (which pointed at 
`pTabData[nTabNo]`). Back in
 `sc!ScTabView::SetTabNo` `mov ecx,[eax+edx*4+0x664]` with **edx=0xDDDDDDDD**
 (`pGridWin[0xdddddddd]`).
 
-**Why only in this build:** debug-CRT artifact, *not* a regression. Release 
allocator reuses
-the just-freed same-size block, so the immediate `new` returns the **same** 
address and
-`pThisTab` stays valid. The debug CRT (`MSVCR90D`) poison-fills freed blocks 
with `0xDD`
-and **delays** their reuse, so `new` returns a *different* block → `pThisTab` 
dangles.
+**Why it shows here but not in normal use — and why "release is fine" is too 
strong.**
+This is a genuine latent **UAF** (reading a dangling pointer is UB); it is 
*masked* in
+release by **unspecified allocator behaviour**, not made correct. The `delete` 
and `new`
+are same-thread, adjacent, same size class, with nothing between them (the
+`ScViewDataTable` ctor runs *after* `operator new`). Mainstream allocators 
keep per-size
+free lists with MRU/LIFO reuse, so `new` almost always returns the 
**just-freed block** —
+and when it does, `pThisTab == pTabData[nTabNo]` again and points at the 
*live, freshly
+constructed* object (genuinely correct, not stale-luck). That is why it has 
survived ~20
+years in the field.
+
+But it is **not guaranteed**, and can surface non-deterministically even in 
release:
+- Windows **LFH randomizes** allocation slots (Win8+ exploit mitigation), so 
`new` may
+  return a *different* slot. In release the freed block isn't poisoned, so the 
immediate
+  read usually still sees plausible old bytes — but the window here is **not** 
two
+  instructions: the rest of `ReadUserDataSequence` parses *all other sheets* 
(lots of
+  allocation), any of which can reclaim that block and overwrite the 
`eWhichActive` offset
+  with a large value → `pGridWin[garbage]` → a **rare, timing-dependent 
release crash**.
+- Hardened/diagnostic allocators (PageHeap, Application Verifier, ASan, a 
custom
+  `operator new`) don't do MRU reuse at all → they crash release too.
+- The debug CRT (`MSVCR90D`) is just the *deterministic* trigger: it 
poison-fills freed
+  blocks with `0xDD` and **delays** reuse, so `new` returns a different block 
every time.
+
+> NOT a factor: **multiple apps in parallel.** Each process has its own heap 
and address
+> space; another `soffice.exe`'s allocations can't touch this process's free 
list, and ASLR
+> randomizes base addresses, not intra-heap reuse. The variable is *this* 
process's
+> allocator policy + the interleaving inside the parse loop.
 
 cdb proof (`.frame 0; dv /t` + `?? &this->aViewData`): `eOldActive = 
0xDDDDDDDD`, but
 `this`/`pDoc`/`pViewShell` all valid, and crucially
 `pThisTab = 0x09a5f118` (old/low region, freed) **≠** `pTabData[nTabNo=0] = 
0x13deca08`
 (fresh/valid) — the new table object is fine; only the stale pointer dangles. 
**Contrast
-with §1 (ICU):** there, release *also* failed (missing data = real staging 
gap); here
-release opens Calc fine.
-
-**Disposition:** out of migration scope ("source is not changed"). The 1-line 
upstream fix
-would be `pThisTab = pTabData[nTabNo];` at the end of `ReadUserDataSequence`. 
**Triage rule:**
-a debug-build `0xDD`/`0xFEEE` AV is a migration bug only if an upstream 
*failure* feeds it
-(a throw, a missing staged file/data, as in §1); if every object is valid and 
only one
-pointer dangles across a `delete`/`new`, it's a debug-CRT-exposed latent UAF → 
confirm by
-opening the same document in the **release** build. Do feature testing in 
release.
+with §1 (ICU):** there the root was missing *data* (a real staging gap) and 
release failed
+deterministically; here the source is byte-identical to upstream and release 
fails only
+rarely (allocator-dependent), which is why it's not a *migration* regression.
+
+**Disposition:** out of migration scope ("source is not changed"), but "masked 
by the
+allocator" ≠ "correct" — it's a latent UAF whose release manifestation is rare 
and
+non-deterministic (the worst kind: invisible in testing, occasionally lethal 
in the field),
+so the 1-line upstream fix is warranted: `pThisTab = pTabData[nTabNo];` at the 
end of
+`ReadUserDataSequence` 
([viewdata.cxx:2955-2960](main/sc/source/ui/view/viewdata.cxx#L2955)),
+mirroring `ScViewData::SetTabNo` line 1502 — it removes the dependence on 
allocator luck
+entirely. **Triage rule:** a debug-build `0xDD`/`0xFEEE` AV is a *migration* 
bug only if an
+upstream *failure* feeds it (a throw, a missing staged file/data, as in §1); 
if every object
+is valid and only one pointer dangles across a `delete`/`new`, it's a 
debug-CRT-exposed
+latent UAF. Reproducing in release tells you whether the allocator is masking 
it — a *clean*
+release run is **not** proof of correctness, only that MRU reuse held that 
time.
+
+### 14.1 Two fixes — the patch vs. the correct model
+
+**(a) Minimal coherence patch (what to do under the build-only rule).** 
Restore the broken
+invariant at the one site that violated it — end of `ReadUserDataSequence`
+([viewdata.cxx:2955-2960](main/sc/source/ui/view/viewdata.cxx#L2955)), before
+`pDoc->SetViewOptions(...)`:
+```cpp
+    // pTabData[nTabNo] was delete'd + re-new'd in the loop above; pThisTab is 
stale.
+    pThisTab = pTabData[nTabNo];     // same line ScViewData::SetTabNo (1502) 
already does
+```
+One line; fixes the symptom. But it leaves the underlying hazard in place: the 
invariant
+`pThisTab == pTabData[nTabNo]` is still maintained *by convention* at every 
site that
+touches `nTabNo` or `pTabData[]` — the next such site can forget again.
+
+**(b) Correct model — don't cache a cheap-to-derive value; encapsulate it.** 
The real defect
+is that `pThisTab` is a **stored cache** of `pTabData[nTabNo]` whose coherence 
is the
+caller's job. `nTabNo` is already the source of truth; the cached pointer only 
saves one
+indexed load. Remove the member and derive it, so the invariant is true *by 
construction*
+and can never desync:
+```cpp
+private:
+    ScViewDataTable* ThisTab() const { return pTabData[nTabNo]; }   // always 
correct, by construction
+```
+Then every `pThisTab->x` becomes `ThisTab()->x` (e.g. `GetActivePart()` →
+`return ThisTab()->eWhichActive;`). There is no stored state to refresh, so no 
site can ever
+leave it stale — the whole class of bug is gone, not just this instance. Cost 
is one extra
+indexed load per access (the accessor inlines to the same `base + 
nTabNo*sizeof(ptr)` the
+cache was avoiding) — negligible, and on modern CPUs often *faster* than 
chasing a pointer
+into possibly-cold memory. If profiling ever shows a hot path, cache it in a 
**local** for
+that loop's duration (`ScViewDataTable* t = ThisTab();`), which has zero 
lifetime coupling —
+never as a long-lived member.
+
+*(Variant, if the per-tab object were expensive to re-derive: keep the member 
but make
+`nTabNo`/`pTabData[]` private and route ALL mutation through one 
`ReplaceTabData(n, p)` /
+`SetActiveTab(n)` that refreshes the cache — single enforced update point. For 
a plain array
+index that's overkill; option (b) is the right shape here.)*
+
+The principle generalizes: **a cache must be an encapsulated implementation 
detail with a
+single enforced point of update — not an invariant the type trusts every 
caller to
+preserve.** Same lesson the vcl-lifetime-redesign track is applying in `vcl`; 
this is the
+`sc` instance of it.

Reply via email to