This is an automated email from the ASF dual-hosted git repository.

leginee pushed a commit to branch bazel-migration
in repository https://gitbox.apache.org/repos/asf/openoffice.git
commit 75ffacfae45f3a4c82802ebea72fdbebb4833d54
Author: Peter Kovacs <[email protected]>
AuthorDate: Thu Jun 11 08:50:37 2026 +0200

    fixed ICU

    the data object has been delivered as a stub, which caused icu to fail.
---
 bug-readme.md                              | 503 ++++++++++++++-------
 .../modules/icu/49.1.2/overlay/BUILD.bazel |  56 ++-
 main/postprocess/BUILD.bazel               |  29 +-
 3 files changed, 410 insertions(+), 178 deletions(-)

diff --git a/bug-readme.md b/bug-readme.md
index 6742bdd031..7082d8bec6 100644
--- a/bug-readme.md
+++ b/bug-readme.md
@@ -1,33 +1,85 @@
-# Bug report — Use-after-free in sfx2 Sidebar on document-frame startup
+# Bug report — Writer document fails to survive load at startup; office self-terminates, and queued async events then crash (use-after-free)
 
-**Component:** `sfx2` (Sidebar) / `vcl` (Window)
-**Crash site:** `vcl!Window::ImplInsertWindow` — [main/vcl/source/window/window.cxx:1048](main/vcl/source/window/window.cxx#L1048)
-**Type:** Use-after-free (dangling VCL frame window), timing-dependent
-**Severity:** High in debug builds (all document editing dead); latent/masked in release builds
-**Status:** Root-caused; workaround available; fix is out of scope for the Bazel migration (no source changes)
+**Component:** `ext_libraries/icu` data packaging (true root cause); `i18npool` break iterator (throw site); `desktop`/`sfx2`/`vcl` (downstream self-terminate + crash sites)
+**Root cause (CONFIRMED):** the staged `icudata.dll` is ICU's **stub** (empty) and the real `icudt49l.dat` is never staged/loaded, so ICU has **no break-iterator data**. The first text layout during the Writer load calls `BreakIterator_Unicode::loadICUBreakIterator`, ICU's `createLineInstance` fails for lack of data, and it `throw RuntimeException()` (empty message). That escapes the document load → the frame is closed → `DispatchWatcher` sees no open frame → `Desktop::terminate()` → `Sf [...]
+**Type:** Missing ICU runtime data → load failure → startup self-termination → use-after-free (async timing decides *which* AV you see) +**Severity:** High — Writer never opens; office exits or AV-crashes on every launch. +**Status:** **Root-caused and confirmed** (cdb throw-stack walk + ICU build inspection). Fix is **staging-only, ICU unchanged**: stage the prebuilt `icudt49l.dat` and point ICU at it via `ICU_DATA` (§9). The sidebar-hidden workaround (§8) only suppresses one downstream AV signature and is now secondary. --- ## 1. Summary -When a document view frame (Writer/Calc/Impress/…) is created at startup, the sfx2 -Sidebar's `SidebarController` posts an **asynchronous** `UpdateConfigurations` call -(via `Application::PostUserEvent`). That call later runs `TabBar::SetDecks`, which -constructs the deck tab buttons (`ImageRadioButton`s). During construction, -`Window::ImplInsertWindow` dereferences the parent TabBar's **frame window** -(`mpWindowImpl->mpFrameWindow`) — but by the time the async call fires, that -**document frame window has already been destroyed**. The result is a read from freed -heap memory. - -- In a **debug** build the freed memory is poison-filled, so it faults immediately - (clean AV under a debugger) or corrupts the window tree and **deadlocks** when run - normally (symptom: "can't click anything, must hard-kill"). -- In a **release** build the same latent bug is normally masked by timing (the - reference installed OpenOffice 4 does not exhibit it). - -This is a **latent lifetime race in AOO**, *exposed* by the debug build's timing and -heap poisoning — not introduced by the Bazel build, and not a missing/misconfigured -staged file. +**The full chain, innermost cause first:** + +1. 
**ICU has no runtime data.** The staged `icudata.dll` is built from `stubdata.c` + ([ext_libraries icu overlay/BUILD.bazel:277-278](ext_libraries/modules/icu/49.1.2/overlay/BUILD.bazel#L278)) + — an empty placeholder — and the real `icudt49l.dat` is neither baked into it nor staged, and nothing + calls `udata_setCommonData`/`u_setDataDirectory`. So ICU's `brkitr` (break-iterator) data is unavailable. +2. **The Writer load needs a line break iterator.** During the load, the first real text layout calls + `BreakIterator_Unicode::loadICUBreakIterator` + ([breakiterator_unicode.cxx:95](main/i18npool/source/breakiterator/breakiterator_unicode.cxx#L95)). + The custom `OpenOffice_dat` package has no `line` rule (intentionally omitted — see §5.1), so it falls + back to ICU's own `icu::BreakIterator::createLineInstance` ([line 178](main/i18npool/source/breakiterator/breakiterator_unicode.cxx#L178)), + which **also** fails for lack of data → `throw RuntimeException()` (empty message) + ([line 183](main/i18npool/source/breakiterator/breakiterator_unicode.cxx#L183)). +3. **The load aborts and closes the document.** That exception escapes `SfxFrameLoader_Impl::load`'s + `catch` ([frmload.cxx:632](main/sfx2/source/view/frmload.cxx#L632)); with `bLoadSuccess == false` the + loader closes the model ([frmload.cxx:645](main/sfx2/source/view/frmload.cxx#L645)) — **no frame survives.** +4. **The office self-terminates,** and posted async events race the teardown and AV (the originally-reported + crashes). Mechanism below. + +Why the office *starts* fine but Writer dies: AOO uses its own `localedata_*.dll` for locale formatting +(not ICU data), and break iterators load **on demand** +([line 94](main/i18npool/source/breakiterator/breakiterator_unicode.cxx#L94)) — Start Center never needs +word/line/sentence breaking, so the empty ICU data is first hit only when a document lays out text. 
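The "starts fine, fails on first use" behaviour described in the paragraph above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `LazyBreakIterator`, `dataAvailable` and `firstUseSucceeds` are hypothetical names invented for illustration, not AOO or ICU code, and the break logic itself is a placeholder.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of an on-demand loader: construction never touches
// the data, so a missing data set is only noticed on first use.
class LazyBreakIterator {
public:
    explicit LazyBreakIterator(bool dataAvailable)
        : dataAvailable_(dataAvailable), loaded_(false) {}

    // Models the first text layout asking for a break position.
    // Throws an empty-message exception when no data can be loaded.
    int nextBreak(int pos) {
        if (!loaded_) {
            if (!dataAvailable_)
                throw std::runtime_error("");
            loaded_ = true;
        }
        return pos + 1; // placeholder result; real break rules are out of scope
    }

    bool loaded() const { return loaded_; }

private:
    bool dataAvailable_;
    bool loaded_;
};

// Returns true if the first use succeeds, false if it throws.
bool firstUseSucceeds(LazyBreakIterator& it) {
    try {
        it.nextBreak(0);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

In this model, a caller that never invokes `nextBreak` (the Start Center case) never observes the missing data; the first caller that does (the Writer load) receives the exception.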
+ +### The self-terminate + async-AV mechanism (downstream of the load failure) + +`soffice -writer` dispatches `private:factory/swriter` **synchronously** +([dispatchwatcher.cxx:333](main/desktop/source/app/dispatchwatcher.cxx#L333)). When the dispatch +returns (with the doc already closed by step 3), +[DispatchWatcher::executeDispatchRequests](main/desktop/source/app/dispatchwatcher.cxx#L432-L453) +asks the Desktop for its open frames — **the list is empty** — and therefore shuts the office down: + +```cpp +bool bEmpty = (m_nRequestCount == 0); +if ( bEmpty && !bNoTerminate ) +{ + Reference< XFramesSupplier > xTasksSupplier( xDesktop, UNO_QUERY ); + Reference< XElementAccess > xList( xTasksSupplier->getFrames(), UNO_QUERY ); + if ( !xList->hasElements() ) // ← NO open frames + { + Reference< XDesktop > xDesktop2( xTasksSupplier, UNO_QUERY ); + if ( xDesktop2.is() ) + return xDesktop2->terminate(); // ← the teardown + } +} +``` + +So **the Writer document does not survive its own load** — no frame is left for the office to show. + +`Desktop::terminate()` runs the SFX terminate listener, whose `notifyTermination` does +`delete pApp` ([appinit.cxx:148](main/sfx2/source/appl/appinit.cxx#L148)) → `~SfxApplication` +([app.cxx:379-383](main/sfx2/source/appl/app.cxx#L379)) → `Deinitialize()` +([appquit.cxx:90](main/sfx2/source/appl/appquit.cxx#L90)) → `delete pSlotPool`, whose destructor +`delete`s every `SfxInterface` ([msgpool.cxx:91-92](main/sfx2/source/control/msgpool.cxx#L91)). The +process-static `Class::pInterface` pointers behind the `SFX_IMPL_INTERFACE` macro are **left dangling** +(never reset to 0). + +The office is mid-`Application::Yield`, and one or more **already-posted async user-events** still sit +in the VCL event queue. They fire in the same loop and **race the teardown** — that is where the visible +crash comes from. 
Which crash you get depends on what was queued: + +| Build | Queued event that fires after terminate | Crash signature | +|---|---|---| +| sidebar **on** | `SidebarController::UpdateConfigurations` (posted) | **Sig. B** — `vcl!Window::ImplInsertWindow` AV: sidebar TabBar parents a tab button into the **freed frame window** | +| sidebar **off** | `SfxDispatcher::DispatcherUpdate_Impl` (posted via `svtools::AsynchronLink`) | **Sig. A** — `sfx!SfxInterface::Register` AV: `SfxGetpApp()->GetOrCreate()` **re-creates** `SfxApplication`; `GetStaticInterface()` returns the **dangling** static interface | + +**Both AVs are the same defect**, separated only by which posted event wins the race after the office +has decided to terminate. Hiding the sidebar (§8) removed event B but exposed event A; it did **not** +make Writer open. This supersedes the earlier framing of this file (which treated the sidebar UAF as +*the* bug — it is a symptom). --- @@ -38,216 +90,335 @@ staged file. | Product | Apache OpenOffice (Bazel-migration branch `bazel-migration`) | | OS | Windows 11 (10.0.26200), x86 / 32-bit process | | Toolchain | MSVC VS2008, **debug CRT** (`MSVCR90D.dll`, `MSVCP90D.dll`) | -| Entry binary | `soffice.exe` (7,680-byte loader) → `sofficeapp.dll` (in-process) | -| Reference | Installed **OpenOffice 4** (release build) — works correctly | -| Debugger | cdb / WinDbg 10.0.26100.7705; TTD (tttracer) | +| Entry binary | `soffice.exe` (loader) → `sofficeapp.dll` (in-process) | +| Reference | Installed **OpenOffice 4** (release build) — opens Writer correctly | +| Debugger | cdb / WinDbg 10.0.26100.7705 | --- ## 3. Steps to reproduce ``` -soffice.exe -env:UserInstallation=file:///C:/temp/ooo_nosync4 -norestore -writer +soffice.exe -env:UserInstallation=file:///C:/temp/ooo_nosyncN -norestore -writer ``` -(fresh user profile, no extensions). The Start Center appears fine; opening any -document frame triggers the failure. +Fresh user profile, no extensions. 
Start Center is unaffected (it never dispatches a document factory and +has no sfx2 Sidebar). The failure is specific to opening a document frame. ### Observed -- Document window paints (chrome/toolbars visible) but **does not respond to mouse or - keyboard**; process must be hard-killed. -- **Start Center is unaffected** — it has no sfx2 Sidebar, so it never runs this path. +- Office **self-terminates** during startup (no Writer window), or AVs in cdb (sig. A or B). ### Expected -- Document frame is fully interactive. +- A blank, interactive Writer document window that stays open. --- -## 4. Symptom matrix (why it looks different under different tools) +## 4. Symptom matrix (one defect, several faces) | How run | Result | Why | |---|---|---| -| Debug build, normal launch | **Hang** → hard-kill | Dangling read returns stale data → corrupt window tree → deadlock | -| Debug build under cdb | **Access violation** in `ImplInsertWindow` | Debug heap poisons freed memory → immediate fault | -| Debug build under cdb + full page heap | AV at freed page (decommitted) | Page heap confirms the block is freed | -| Debug build under **TTD** | **No crash** — `exit code 0` after 151 s | TTD perturbs timing → the race is not lost | -| Release build (reference install) | **Works** | Timing masks the latent race | +| sidebar on, normal launch | Hang → hard-kill | Posted `UpdateConfigurations` reads freed frame → corrupt window tree → deadlock | +| sidebar on, under cdb | AV in `vcl!Window::ImplInsertWindow` (sig. B) | Debug heap poisons freed frame → immediate fault | +| sidebar off, under cdb | AV in `sfx!SfxInterface::Register` (sig. 
A) | Posted `DispatcherUpdate` re-creates `SfxApplication`; dangling static `SfxInterface` | +| sidebar off, normal launch | Office exits with no window | `getFrames()` empty → `Desktop::terminate()`; the async race may or may not crash first | +| under **TTD** | exit code 0 after ~150 s | TTD perturbs timing so a posted event runs while state is still alive; masks the race, not the root | +| installed **OO4** reference | Works | Different binary; its `swriter` load produces a surviving frame | -The nondeterminism across runs (see §6) is the defining evidence that this is a -**timing race**, not a deterministic build/data defect. +The poison value seen at the AV varies by run (`0xFEEEFEEE` OS-heap free-fill vs `0xDDDDDDDD` debug-CRT +`delete`-fill) — consistent with reading **freed** memory whose exact filler depends on the allocator path. --- ## 5. Root cause -### 5.1 The faulting code -[main/vcl/source/window/window.cxx:1039-1051](main/vcl/source/window/window.cxx#L1039-L1051): +### 5.1 The disease: ICU has no break-iterator data → the line break iterator throws + +ICU's data is missing at runtime: + +- `icudata.dll` is built from `source/stubdata/stubdata.c` + ([icu overlay/BUILD.bazel:277-294](ext_libraries/modules/icu/49.1.2/overlay/BUILD.bazel#L276)) — ICU's + **stub** (zero resources). The real `icudt49l.dat` exists in the tree + (`source/data/in/icudt49l.dat`, filegroup `icudt49l_dat`) but is used only as a **build-tool input** + (`-d` for `genbrk`), never baked into the DLL or staged. +- Nothing makes the data available at runtime: a grep of `main/staging` finds no `icudt49l.dat`, no + `ICU_DATA`, no `udata_setCommonData`; AOO never calls `u_setDataDirectory` (it normally relies on the + *real* data DLL's entry point, which here is the stub). 
+ +So when the Writer load triggers the first text layout, `BreakIterator_Unicode::loadICUBreakIterator` +([breakiterator_unicode.cxx:95-194](main/i18npool/source/breakiterator/breakiterator_unicode.cxx#L95)) +runs for `LOAD_LINE_BREAKITERATOR`: + +- The custom `OpenOffice_dat` package (compiled into `i18npool` from the `*.txt` rules) has **no `line` + rule** — `line.txt`/`sent.txt` were deliberately omitted in the migration + ([i18npool/pool/BUILD.bazel:86-90](main/i18npool/pool/BUILD.bazel#L86)) on the assumption that ICU's own + line break would serve as a fallback. +- The code *does* have that fallback — `icu::BreakIterator::createLineInstance` + ([line 178](main/i18npool/source/breakiterator/breakiterator_unicode.cxx#L178)) — **but it also fails** + (`!U_SUCCESS(status)`) because ICU has no `brkitr` data, so the function does + `throw RuntimeException()` ([line 183](main/i18npool/source/breakiterator/breakiterator_unicode.cxx#L183)). + `#define ERROR ::com::sun::star::uno::RuntimeException()` — an **empty-message** RuntimeException, + exactly the type/message we decoded at the loader's catch. + +So the omission of `line.txt` is harmless *on its own*; the killer is that the ICU fallback it relies on +has no data either. Either source of data (custom OOo `line` rule **or** ICU `brkitr` data) would prevent +the throw; the migration removed the first and never wired up the second. + +**Confirmed**, not inferred: the captured fatal throw stack is +`i18npool!BreakIterator_Unicode::loadICUBreakIterator+0x87e` ← `getLineBreak` ← `OutputDevice::ImplGetTextLines` +← `MessBox` layout ← `ErrorHandler::HandleError` ← `sb!SfxLibraryContainer::init` (Appendix A.4), and it is +the **last first-chance C++ exception before `SfxFrameLoader_Impl::load`'s catch** (`getCaughtException` +at `load+0xa7e`). Every other throw in the trace (encryption check, basic-library probing, ucb folder +probing) is caught internally and benign. 
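The two-source fallback described in this section (custom `line` rule first, ICU's own data second, throw only if both are missing) can be sketched as a standalone model. The names `DataSources`, `Source`, `loadLineBreakIterator` and `loadThrows` are hypothetical and exist only for this illustration; the real logic lives in `BreakIterator_Unicode::loadICUBreakIterator` and is not reproduced here.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical description of which data sources exist at runtime.
struct DataSources {
    bool customLineRule; // a `line` rule inside the custom OpenOffice_dat package
    bool icuBrkitrData;  // ICU's own brkitr data, reachable at runtime
};

enum class Source { Custom, IcuFallback };

// Try the custom rule first, then the ICU fallback; throw an
// empty-message exception only when both are unavailable.
Source loadLineBreakIterator(const DataSources& s) {
    if (s.customLineRule)
        return Source::Custom;
    if (s.icuBrkitrData)
        return Source::IcuFallback;
    throw std::runtime_error("");
}

// Returns true if loading throws, false if either source served the request.
bool loadThrows(const DataSources& s) {
    try {
        loadLineBreakIterator(s);
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}
```

Under this model, `DataSources{false, false}` corresponds to the state the report describes (no custom rule, stub ICU data), and restoring either flag is enough to avoid the throw.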
+ +### 5.1a The load aborts and closes the document + +`SfxFrameLoader_Impl::load` ([frmload.cxx:577-653](main/sfx2/source/view/frmload.cxx#L577)) wraps view +creation + `connectController` in a `try`; the RuntimeException escapes `impl_createDocumentView` (which +has no internal `catch`), so `bLoadSuccess` stays false, the `catch` runs +`impl_handleCaughtError_nothrow`, and then `xCloseable->close(sal_True)` closes the model +([frmload.cxx:640-645](main/sfx2/source/view/frmload.cxx#L645)). `SwXTextDocument::close` → +`SfxBaseModel::close` tears the frame down → `getFrames()` is now empty. + +`sw.dll`, `svx.dll`, `swd.dll` all load and the SwView is built before this — the frame is **created then +disposed**, which is why the earlier traces looked like "frame torn down during construction." + +### 5.2 The teardown path (terminate → freed SFX statics) -```cpp -void Window::ImplInsertWindow( Window* pParent ) -{ - mpWindowImpl->mpParent = pParent; - mpWindowImpl->mpRealParent = pParent; - if ( pParent && !mpWindowImpl->mbFrame ) - { - Window* pFrameParent = pParent->mpWindowImpl->mpFrameWindow; // 1047 - mpWindowImpl->mpFrameData = pFrameParent->mpWindowImpl->mpFrameData; // 1048 <-- AV - ... +``` +sfx!SfxTerminateListener_Impl::notifyTermination ; delete pApp (appinit.cxx:148) +fwk!framework::Desktop::terminate +sofficeapp!desktop::DispatchWatcher::executeDispatchRequests ; getFrames() empty → terminate +sofficeapp!desktop::OfficeIPCThread::ExecuteCmdLineRequests +sofficeapp!desktop::Desktop::OpenDefault +sofficeapp!desktop::Desktop::OpenClients +... vcl!Application::Yield ... ; first message-loop yield +``` + +`delete pApp` → `~SfxApplication` → `Deinitialize()` → `delete pSlotPool` → +`SfxSlotPool::~SfxSlotPool` runs `for (pIF = FirstInterface(); pIF; ...) delete pIF;` +([msgpool.cxx:91-92](main/sfx2/source/control/msgpool.cxx#L91)). 
Each `~SfxInterface` +([objface.cxx:304-317](main/sfx2/source/control/objface.cxx#L304)) does `delete pImpData` (debug-CRT +poison-fills it `0xDD`). `~SfxApplication` sets `pApp = 0`. **But the per-class process-static +`Class::pInterface` is never reset** — `~SfxInterface` has no access to it; the static lives in the +`SFX_IMPL_INTERFACE` macro ([shell.hxx:341-358](main/sfx2/inc/sfx2/shell.hxx#L341)). + +### 5.3 Signature A — `SfxInterface::Register` AV (sidebar off) + +A `DispatcherUpdate_Impl` posted earlier via `svtools::AsynchronLink` fires in the same `Yield`, +**after** `pApp` was deleted: + +``` +sfx!SfxInterface::Register+0xd ; mov byte ptr [ecx+3Ch],1 ; ecx = pImpData = 0xDDDDDDDD +sfx!SfxApplication::RegisterInterface +sfx!SfxApplication::Registrations_Impl +sfx!SfxApplication::Initialize_Impl +sfx!SfxApplication::GetOrCreate ; pApp == 0 → builds a NEW SfxApplication +sfx!SfxGetpApp +sfx!SfxDispatcher::Update_Impl +sfx!DispatcherUpdate_Impl ; posted async user-event +svt!svtools::AsynchronLink::Call_Impl +vcl!ImplHandleUserEvent ... vcl!Application::Execute ``` -- `pParent` is the **TabBar** — alive and valid (`mbInDtor == 0`). -- `pFrameParent` = `pParent->mpWindowImpl->mpFrameWindow` = the document's **top-level - frame window** — **freed**. `pFrameParent->mpWindowImpl` reads back as the heap - poison value, and the next dereference faults. +`GetOrCreate()` ([app.cxx:287](main/sfx2/source/appl/app.cxx#L287)) sees `pApp == 0` and builds a fresh +`SfxApplication` → `Initialize_Impl` → `Registrations_Impl` → `SfxApplication::RegisterInterface()` → +`GetStaticInterface()`. Because `SfxApplication::pInterface` is **non-NULL but dangling**, +`GetStaticInterface()` returns it *without* re-allocating +([shell.hxx:345-358](main/sfx2/inc/sfx2/shell.hxx#L345)); `Register()` then writes +`pImpData->bRegistered = sal_True` ([objface.cxx:147-150](main/sfx2/source/control/objface.cxx#L147)) +through the freed impl → AV, `ecx = 0xDDDDDDDD`. 
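The destroy-then-recreate hazard behind signature A can be modelled without touching freed memory by tracking two flags instead of a real pointer. This is a sketch, not sfx2 code: `StaticInterfaceSlot`, `getStaticInterface`, `teardownLeavingDangling`, `teardownAndReset` and `registerInterface` are hypothetical names, and the "hardened" teardown is an assumption about what a source-level fix could look like, not something the commit implements.

```cpp
#include <cassert>

// Hypothetical stand-in for a per-class static such as Class::pInterface.
// Flags are used instead of a real pointer so the sketch has no undefined behaviour.
struct StaticInterfaceSlot {
    bool allocated = false; // models "pInterface != 0"
    bool alive = false;     // models "the pointee has not been deleted"
};

// Lazy allocation: only allocates when the static still reads as null.
StaticInterfaceSlot& getStaticInterface(StaticInterfaceSlot& slot) {
    if (!slot.allocated) {
        slot.allocated = true;
        slot.alive = true;
    }
    return slot;
}

// Models the reported teardown: the object is deleted, the static is not reset.
void teardownLeavingDangling(StaticInterfaceSlot& slot) {
    slot.alive = false;
}

// Assumed hardening: delete the object and also reset the static.
void teardownAndReset(StaticInterfaceSlot& slot) {
    slot.alive = false;
    slot.allocated = false;
}

// Returns true if registering would write to a live object,
// false if it would write through a dangling static.
bool registerInterface(StaticInterfaceSlot& slot) {
    return getStaticInterface(slot).alive;
}
```

In the model, the first registration is always safe; only a registration after `teardownLeavingDangling` reports a write through a dead object, which mirrors why the AV implies a destroy-then-recreate sequence.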
-### 5.2 The path that gets there (cdb `kb`, abridged) +> The `SFX_IMPL_INTERFACE` static-interface design assumes `SfxApplication` is a true **process singleton +> that is never destroyed and recreated**. A first-ever clean `GetOrCreate` allocates fresh impls and does +> not crash; the AV is therefore positive proof that a **destroy-then-recreate** of `SfxApplication` +> happened — i.e. that §5.1's terminate ran. + +### 5.4 Signature B — `vcl!Window::ImplInsertWindow` AV (sidebar on) + +With the sidebar visible, the queued event is `SidebarController::UpdateConfigurations`, which builds deck +tab buttons (`ImageRadioButton`s). `Window::ImplInsertWindow` +([window.cxx:1039-1051](main/vcl/source/window/window.cxx#L1039)) dereferences the parent TabBar's +**frame window** (`mpWindowImpl->mpFrameWindow`) — already freed by the same terminate → AV. ``` -vcl!Window::ImplInsertWindow <-- deref freed frame window -vcl!Window::ImplInit +vcl!Window::ImplInsertWindow ; deref freed frame window vcl!RadioButton::ImplInit -vcl!RadioButton::RadioButton -vcl!ImageRadioButton::ImageRadioButton sfx!sfx2::sidebar::TabItem::TabItem sfx!sfx2::sidebar::ControlFactory::CreateTabItem -sfx!sfx2::sidebar::TabBar::CreateTabItem sfx!sfx2::sidebar::TabBar::SetDecks sfx!sfx2::sidebar::SidebarController::UpdateConfigurations -sfx!...boost bind... -sfx!sfx2::sidebar::AsynchronousCall::HandleUserCall <-- *** posted user event *** -tl!Link::Call -vcl!ImplHandleUserEvent -vcl!ImplWindowFrameProc -... -vcl!Application::Execute -sofficeapp!desktop::Desktop::Main -``` - -### 5.3 The lifetime bug - -The async plumbing itself is **correct**: -- `AsynchronousCall::~AsynchronousCall()` → `CancelRequest()` → `RemoveUserEvent()` - ([main/sfx2/source/sidebar/AsynchronousCall.cxx:49-86](main/sfx2/source/sidebar/AsynchronousCall.cxx#L49-L86)). -- The controller owns the call as a member - ([main/sfx2/source/sidebar/SidebarController.cxx:125](main/sfx2/source/sidebar/SidebarController.cxx#L125)). 
- -So a pending call *is* cancelled when the controller is disposed. The defect is **one -level up**: the ref-counted `SidebarController` and its `TabBar` (a VCL window) -**outlive the document frame window that hosts them**. The TabBar survives with a -dangling `mpFrameWindow`, the controller is *not* disposed when the frame is -destroyed, and so its already-posted `UpdateConfigurations` fires against the dead -frame. - -This is consistent with the Start-Center → first-document frame transition during -startup, where an interim frame window is torn down while the sidebar controller it -spawned (and its posted call) survive. +sfx!sfx2::sidebar::AsynchronousCall::HandleUserCall ; posted async user-event +``` --- -## 6. Evidence this is a latent race, not a build/staging defect +## 6. Evidence -1. **Nondeterministic poison value** — the freed frame window read back as - `0xFEEEFEEE` (OS-heap free fill) in one run and `0xDDDDDDDD` (debug-CRT `delete` - fill) in another. A deterministic defect (e.g. a missing component) fails - identically every run. -2. **No repro under TTD** — recording with tttracer changed timing enough that the - process ran to a clean `exit code 0`. The bug is suppressed by instrumentation - (Heisenbug). -3. **Reference release build works** — same source, release timing wins the race. -4. **Config default is correct** (see §7) — the sidebar is *supposed* to be visible. +**The root is a staging gap (ICU data):** +1. `icudata.dll` links `stubdata.c` ([§5.1](#51-the-disease-icu-has-no-break-iterator-data--the-line-break-iterator-throws)); no `icudt49l.dat`/`ICU_DATA`/`udata_setCommonData` anywhere in staging or source. +2. The captured fatal throw is `loadICUBreakIterator` for the **line** break (Appendix A.4) — the on-demand + data consumer — which is why **Start Center works but Writer dies** (break iterators load lazily; + Start Center needs none). +3. 
The caught exception is a `com.sun.star.uno.RuntimeException` with an **empty message** — exactly + `#define ERROR RuntimeException()` at the throw site, decoded from the `Any` at the loader's catch. + +**The AV crashes themselves are a latent race downstream of the load failure** (so they vary run-to-run): +4. **The terminate is deliberate and reached cleanly** — captured `kb` shows `Desktop::terminate` from + `DispatchWatcher::executeDispatchRequests` after a `getFrames()` check, no exception unwinding. +5. **Sig. A proves a destroy+recreate of `SfxApplication`** (§5.3) — impossible on a clean first init; + only the terminate path frees those statics. +6. **Nondeterministic poison value** (`0xFEEEFEEE` vs `0xDDDDDDDD`) and **no repro under TTD** — classic + freed-memory race (Heisenbug); the *root* (missing data) is deterministic, the *AV* is the race. --- ## 7. Ruled out: sidebar-visibility config mismatch -Checked whether the staging wrongly enables the sidebar. It does not — the default is -the standard upstream value in -[main/officecfg/registry/data/org/openoffice/Office/Views.xcu:33-38](main/officecfg/registry/data/org/openoffice/Office/Views.xcu#L33-L38): +The staging does not wrongly enable the sidebar — the default is the standard upstream value in +[Views.xcu:33-38](main/officecfg/registry/data/org/openoffice/Office/Views.xcu#L33-L38) +(`SID_SIDEBAR` = 10336, `Visible=true`), identical to the reference release. Not a staging gap. (The +sidebar is merely *one* async event that races the teardown — see §8.) -```xml -<!-- show Sidebar child window by default - oor:name == SID_SIDEBAR --> -<node oor:name="10336" oor:op="replace"> - <prop oor:name="Visible" oor:type="xs:boolean"><value>true</value></prop> -</node> -``` +--- -`SID_SIDEBAR` (= 10336) is `Visible=true` by default in both the migration build and -the reference release. Not a staging gap. +## 8. 
Workaround status — sidebar-hidden is NOT sufficient ---- +### 8.1 What the sidebar-hidden patch does (and doesn't) -## 8. Workaround (no rebuild) +The build-layer patch flips `SID_SIDEBAR` (10336) `Visible` true→false at stage time so the +`SidebarController` (and its async `UpdateConfigurations`) is never constructed: -Force the sidebar window hidden at startup so the `SidebarController` (and its async -call) is never created. Add to the user profile's -`<UserInstallation>/user/registrymodifications.xcu`: +- [main/postprocess/BUILD.bazel](main/postprocess/BUILD.bazel): genrule `sidebar_off_views_xcu` rewrites + `Office/Views.xcu`; `main_xcd` packs the patched copy. Mirrors `forcedefault_linguistic_xcu`. -```xml -<?xml version="1.0" encoding="UTF-8"?> -<oor:items xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> - <item oor:path="/org.openoffice.Office.Views/Windows/org.openoffice.Office.Views:WindowType['10336']"> - <prop oor:name="Visible" oor:op="fuse"> - <value>false</value> - </prop> - </item> -</oor:items> -``` +This removes **signature B** only. Because the underlying defect (§5.1, ICU has no break data → the Writer +load throws and closes the doc) is unchanged, the office **still self-terminates**, and the queued +`DispatcherUpdate` now produces **signature A** instead. **Net: hiding the sidebar does not yield an +interactive Writer — only the ICU-data fix (§9) does.** Keep the patch (it removes one crash and is +harmless), but do not treat it as the fix. + +### 8.2 Use a fresh `UserInstallation` -(Path syntax verified against a `registrymodifications.xcu` written by the app itself.) +`main.xcd` defaults only seed a *new* profile, and a fresh path drops stale `.lock` / +`registrymodifications.xcu` from prior hung runs. Do not hand-edit +`registrymodifications.xcu` (PowerShell writes UTF-16 LE BOM, which configmgr rejects as a profile-access +error). --- -## 9. 
Suggested next steps +## 9. Fix — build `icudata.dll` with the REAL data (no stub) — IMPLEMENTED + +Chosen approach: stop shipping a stub `icudata.dll`; bake the prebuilt `icudt49l.dat` into it so ICU loads +data from the DLL's entry point exactly as it does in a normal ICU build. No `ICU_DATA` env var, no +loose data file, no source change. Implemented entirely in the ICU overlay BUILD — +[ext_libraries/modules/icu/49.1.2/overlay/BUILD.bazel](ext_libraries/modules/icu/49.1.2/overlay/BUILD.bazel): + +1. **`genrule :icudt49_dat_obj`** runs the in-tree `:genccode` tool to convert `icudt49l.dat` → a COFF + object `icudt49_dat.obj`: + ``` + genccode -o -e icudt49 -f icudt49_dat -d $(RULEDIR) source/data/in/icudt49l.dat + ``` + - `-o` emits a COFF object directly (`pkg_genc.h` defines `CAN_GENERATE_OBJECTS`); with no `-m` + match-arch file, `getArchitecture()` defaults to `IMAGE_FILE_MACHINE_I386` — correct for the x86 build. + - `-e icudt49` → symbol `icudt49_dat` = `U_DEF2_ICUDATA_ENTRY_POINT(49,…)` = `U_ICUDATA_ENTRY_POINT` + (the same symbol the stub exported). + - `-f icudt49_dat` → predictable output name for the genrule `outs`. +2. **`cc_binary :icudata`** now links that object instead of `stubdata.c`, as a pure data DLL: + `linkshared` + `/NOENTRY` (no DllMain/CRT, like ICU's own `icudt49.dll`) + + `/EXPORT:icudt49_dat` (a hand-built COFF object has no `__declspec(dllexport)`, so the export is explicit). + `icuuc.dll` keeps importing `icudt49_dat` via `:icudata_implib` — unchanged, just real content now. + +The build **tools** (`genbrk`/`gencmn`/`genccode`) keep using the stub `:icudata_static` — they only need +enough ICU to run, and `icudt49_dat.obj` depends on `:genccode`, so the runtime data DLL must not feed back +into the tools (no cycle). Staging is unchanged: it already stages `:icudata`; that DLL now carries data. 
+ +**Build & verify:** +``` +bazel build //main/staging:install +bazel-bin\main\staging\program\soffice.exe -env:UserInstallation=file:///C:/temp/ooo_icu -norestore -writer +``` +Expected: `loadICUBreakIterator` no longer throws → Writer window opens. -1. **Build/run a release (`-c opt`) staging** — the decisive, apples-to-apples test - against the working reference install. Expected: the document frame opens cleanly, - confirming this is a debug-only timing artifact. -2. If a real fix is ever pursued **upstream** (out of scope for the migration): ensure - the `SidebarController` is disposed — and thus its pending `AsynchronousCall` - cancelled — when its host frame window is destroyed, so a posted - `UpdateConfigurations` cannot run against a freed frame. -3. Track as a **known debug-build limitation**, analogous to the documented - `_HAS_ITERATOR_DEBUGGING` debug-only ABI issue. +**If the link complains** (e.g. about a missing entry point because the toolchain still expects one): drop +`/NOENTRY` and instead add a tiny anchor TU (a one-line `DllMain` `.c`) to `:icudata` `srcs` alongside the +object — the CRT then supplies the entry point and the object supplies the export. Keep `/EXPORT:icudt49_dat` +either way. + +### 9.3 Note the misleading source comment +[i18npool/pool/BUILD.bazel:86-90](main/i18npool/pool/BUILD.bazel#L86) states `line.txt`/`sent.txt` were +omitted because `loadICUBreakIterator` "falls back to ICU's own" — true only when ICU *has* data. With §9's +real-data DLL the comment is accurate again; until then it was the trail that hid this bug. + +### 9.4 Downstream items (now secondary, not required for Writer to open) +Once ICU has data the load succeeds, no frame is closed, no terminate, no AV. The latent lifetime issues +remain *real* but are no longer triggered: (a) `SFX_IMPL_INTERFACE` statics aren't reset on +`SfxApplication` teardown (§5.3); (b) posted async events aren't cancelled on terminate (§5.4/§5.5). 
These +are source-level hardening, out of migration scope, and only bite if the office ever destroys+recreates +`SfxApplication` again. --- ## Appendix A — raw debugger output -### A.1 First-chance AV (no page heap) -``` -vcl!Window::ImplInsertWindow+0x6f: -8b5208 mov edx,dword ptr [edx+8] ds:002b:feeefef6=???????? -edx=feeefeee -``` - -### A.2 Full page heap — block confirmed freed +### A.1 Sig. A — `SfxInterface::Register` AV (sidebar off), with `kp` ``` -vcl!Window::ImplInsertWindow+0x60: -mov edx,dword ptr [ecx+10Ch] ds:002b:604c0fc4=???????? -ecx=604c0eb8 - -!address 0x604c0eb8 -> - Usage: PageHeap - State: MEM_RESERVE - Protect: PAGE_NOACCESS (i.e. freed/decommitted) +sfx!SfxInterface::Register+0xd ; mov byte ptr [ecx+3Ch],1 ds:002b:ddddde19=?? ; ecx=0xDDDDDDDD +sfx!SfxApplication::RegisterInterface(SfxModule* pMod = 0)+0x13 +sfx!SfxApplication::Registrations_Impl+0xe +sfx!SfxApplication::Initialize_Impl+0x82d +sfx!SfxApplication::GetOrCreate+0xf3 +sfx!SfxGetpApp+0x8 +sfx!SfxDispatcher::Update_Impl(bForce=1)+0x5e +sfx!DispatcherUpdate_Impl+0xd +svt!svtools::AsynchronLink::Call_Impl+0x2a +vcl!ImplHandleUserEvent ... vcl!Application::Execute+0x2b +sofficeapp!desktop::Desktop::Main+0x2225 ``` -### A.3 Second no-page-heap run — different poison +### A.2 The culprit — `Desktop::terminate` from DispatchWatcher (`kb` at `notifyTermination`) ``` -vcl!Window::ImplInsertWindow+0x6f: -8b5208 mov edx,dword ptr [edx+8] ds:002b:dddddde5=???????? -edx=dddddddd +sfx!SfxTerminateListener_Impl::notifyTermination +fwk!framework::Desktop::terminate+0x4c2 +sofficeapp!desktop::DispatchWatcher::executeDispatchRequests+0x1d08 ; getFrames() empty → terminate +sofficeapp!desktop::OfficeIPCThread::ExecuteCmdLineRequests+0x260 +sofficeapp!desktop::Desktop::OpenDefault+0x50b +sofficeapp!desktop::Desktop::OpenClients+0x158c +sofficeapp!desktop::Desktop::OpenClients_Impl+0x4b +... vcl!Application::Yield ... 
vcl!Application::Execute+0x2b +sofficeapp!desktop::Desktop::Main+0x2225 ``` +(`queryTermination` fires first, then `notifyTermination` → `delete pApp`.) -### A.4 TTD recording +### A.3 Sig. B — `Window::ImplInsertWindow` AV (sidebar on) ``` -soffice.exe(x86) (PID:21652): Process exited with exit code 0 after 151375ms - Full trace dumped to C:\temp\ooo.run +vcl!Window::ImplInsertWindow+0x6f ; mov edx,dword ptr [edx+8] ; edx=0xDDDDDDDD (or 0xFEEEFEEE) +vcl!RadioButton::ImplInit +sfx!sfx2::sidebar::TabItem::TabItem+0x39 +sfx!sfx2::sidebar::ControlFactory::CreateTabItem +sfx!sfx2::sidebar::TabBar::SetDecks +sfx!sfx2::sidebar::SidebarController::UpdateConfigurations +sfx!sfx2::sidebar::AsynchronousCall::HandleUserCall ``` -### A.5 Parent (TabBar) is alive +### A.4 The TRUE root — fatal throw: ICU line break iterator has no data +The last first-chance C++ exception before `SfxFrameLoader_Impl::load`'s catch (`getCaughtException` at +`load+0xa7e`). `throw RuntimeException()` = empty-message — matches the `Any` decoded at the handler. ``` -pParent = 0x08890a20 - mpWindowImpl = 0x08890bd8 (contiguous, freshly allocated) - mbInDtor = 0 (not in destructor) +i18npool!BreakIterator_Unicode::loadICUBreakIterator+0x87e ; throw ERROR (line 183): ICU createLineInstance failed — no data +i18npool!BreakIterator_Unicode::getLineBreak+0xa7 +i18npool!BreakIteratorImpl::getLineBreak+0x8c +vcl!OutputDevice::ImplGetTextLines+0x3cb +vcl!OutputDevice::GetTextRect+0x114 +vcl!MessBox::ImplPosControls+0x669 ; laying out an error message box... +vcl!Dialog::Execute+0x30 +svt!aWndFunc+0x4d8 +tl!ErrorHandler::HandleError+0x48 ; ...shown by Basic library container init +sb!basic::SfxLibraryContainer::init_Impl+0x116f +sb!basic::SfxLibraryContainer::init+0x28 + ... 
unwinds through SfxObjectShell::CheckSecurityOnLoading_Impl → SfxBaseModel::connectController + → sw!SwXTextDocument::connectController → sfx!SfxFrameLoader_Impl::impl_createDocumentView+0x132 + → sfx!SfxFrameLoader_Impl::load (catch) → close(sal_True) → terminate → §5.3/§5.5 AV ``` +ICU build proof: `icudata` `cc_binary` `srcs = ["source/stubdata/stubdata.c"]` +([icu overlay/BUILD.bazel:277](ext_libraries/modules/icu/49.1.2/overlay/BUILD.bazel#L277)); no +`icudt49l.dat` / `ICU_DATA` / `udata_setCommonData` staged or set. diff --git a/ext_libraries/modules/icu/49.1.2/overlay/BUILD.bazel b/ext_libraries/modules/icu/49.1.2/overlay/BUILD.bazel index 029ef81c7c..8af1ea7457 100644 --- a/ext_libraries/modules/icu/49.1.2/overlay/BUILD.bazel +++ b/ext_libraries/modules/icu/49.1.2/overlay/BUILD.bazel @@ -203,6 +203,34 @@ filegroup( visibility = ["//visibility:public"], ) +# ── Real ICU data as a linkable object ──────────────────────────── +# Convert the prebuilt icudt49l.dat into a COFF object so the runtime +# icudata.dll carries the REAL data (break iterators, collation, …) +# instead of the empty stub. Without this, ICU has no brkitr data and +# i18npool's BreakIterator_Unicode::loadICUBreakIterator throws an +# (empty-message) RuntimeException on the first Writer text layout, +# which aborts document load → office self-terminates. See repo +# bug-readme.md. +# +# genccode -o emits a COFF object directly (pkg_genc.h CAN_GENERATE_OBJECTS); +# with no -m match-arch file getArchitecture() defaults to IMAGE_FILE_MACHINE_I386, +# matching the VS2008 x86 build. +# -e icudt49 → C symbol icudt49_dat (== U_ICUDATA_ENTRY_POINT for ICU 49, +# i.e. 
the same symbol the stub used to provide) +# -f icudt49_dat → output file icudt49_dat.obj (predictable name for outs) +# -d $(RULEDIR) → write into the rule's output dir +genrule( + name = "icudt49_dat_obj", + srcs = ["source/data/in/icudt49l.dat"], + outs = ["icudt49_dat.obj"], + tools = [":genccode"], + cmd_bat = ( + "$(execpath :genccode) -o -e icudt49 -f icudt49_dat " + + "-d $(RULEDIR) $(execpath source/data/in/icudt49l.dat)" + ), + visibility = ["//visibility:public"], +) + # ── ICU shared DLLs (for i18npool.dll / i18nsearch.dll) ────────── # These are the DLLs that ship with OpenOffice and are loaded at runtime. # Consumers get __declspec(dllimport) via ICU's own macros. @@ -273,23 +301,29 @@ filegroup( visibility = ["//visibility:public"], ) +# Runtime ICU data DLL — carries the REAL data (NOT the stub). +# The data object (icudt49_dat.obj) is self-contained: it defines exactly one +# symbol, icudt49_dat, and references nothing. So this is a pure data DLL: +# /NOENTRY — no DllMain/CRT startup needed (matches ICU's own icudt49.dll) +# /EXPORT:icudt49_dat — export the entry point (the stub used U_EXPORT in C +# source; a hand-built COFF object has no dllexport flag, +# so we export it explicitly here). This is what icuuc.dll +# imports via :icudata_implib. +# NOTE: the build tools above (genbrk/gencmn/genccode) keep using the STUB +# :icudata_static — they only need enough ICU to run, and making icudt49_dat.obj +# depends on :genccode, so the runtime data DLL must not feed back into the tools. 
cc_binary( name = "icudata", - srcs = ["source/stubdata/stubdata.c"], - local_defines = select({ - "@platforms//os:windows": _WIN_DEFINES, - "//conditions:default": [], - }), - copts = select({ - "@platforms//os:windows": _WIN_DLL_COPTS, - "//conditions:default": ["-w"], - }), + srcs = [":icudt49_dat_obj"], linkshared = True, linkopts = select({ - "@platforms//os:windows": ["/MANIFEST:NO"], + "@platforms//os:windows": [ + "/NOENTRY", + "/EXPORT:icudt49_dat", + "/MANIFEST:NO", + ], "//conditions:default": [], }), - deps = [":icuuc_headers"], visibility = ["//visibility:public"], ) diff --git a/main/postprocess/BUILD.bazel b/main/postprocess/BUILD.bazel index 8afa3fde35..d00a874c0d 100644 --- a/main/postprocess/BUILD.bazel +++ b/main/postprocess/BUILD.bazel @@ -929,7 +929,13 @@ _MAIN_XCU = [ oc_xcu("Office/UI/StartModuleCommands.xcu"), oc_xcu("Office/UI/StartModuleWindowState.xcu"), oc_xcu("Office/UI.xcu"), - oc_xcu("Office/Views.xcu"), + # Sidebar hidden by default: ship a stage-time-patched Views.xcu that flips + # SID_SIDEBAR (10336) Visible true->false. This stops the sfx2 + # SidebarController (and its async UpdateConfigurations) from ever being + # constructed on a document frame, which avoids the use-after-free hang + # documented in bug-readme.md (latent AOO lifetime race; not masked by the + # release CRT in this build). See :sidebar_off_views_xcu genrule below. + ":sidebar_off_views_xcu", oc_xcu("Office/WebWizard.xcu"), oc_xcu("Office/Writer.xcu"), oc_xcu("Setup.xcu"), @@ -972,6 +978,27 @@ _MAIN_XCU = [ oc_mod("TypeDetection/UISort.xcu", "writer"), ] +# Stage-time copy of Office/Views.xcu with the Sidebar (SID_SIDEBAR == 10336) +# default flipped from Visible=true to Visible=false. Views.xcu contains exactly +# one <value>true</value> (the sidebar; node 5539 is already false), so a single +# string replace is unambiguous. 
Output is consumed by main_xcd's packer (not by +# configmgr at runtime), so the UTF-8 BOM that Set-Content emits is harmless here +# — unlike a hand-written profile registrymodifications.xcu, which configmgr +# rejects as a profile access error. Mirrors the forcedefault_linguistic_xcu +# substitution pattern. See bug-readme.md for the underlying Sidebar UAF. +genrule( + name = "sidebar_off_views_xcu", + srcs = [oc_xcu("Office/Views.xcu")], + outs = ["SidebarOff/Office/Views.xcu"], + cmd_bat = ( + "powershell -NoProfile -Command " + + "\"(Get-Content -Raw '$(location " + + oc_xcu("Office/Views.xcu") + + ")').Replace('<value>true</value>', '<value>false</value>') " + + "| Set-Content -Encoding UTF8 '$(OUTS)'\"" + ), +) + pack_registry( name = "main_xcd", out = "main.xcd",
