Hi everyone,

This week marks a critical turning point in our Object Browser investigation. We've completed a comprehensive four-phase analysis that has fundamentally changed our understanding of the crash patterns. While we've made significant progress in stabilizing the IDE, we've also uncovered a new crash scenario that requires immediate attention.
*Gerrit Patches*

*Patch 28 (Week 10): Thread-Safe Initialization System*
https://gerrit.libreoffice.org/c/core/+/186822/28

*The Problem: Slaying the Initialization Hydra, a Multi-Headed Beast*

Our previous initialization system was fundamentally flawed. Multiple threads could trigger initialization simultaneously, creating a race condition that showed up in the log as several initialization threads starting at once:

info:basctl:96852:430736002:basctl/source/basicide/idedataprovider.cxx:60: UnoHierarchyInitThread starting
info:basctl:96852:430736003:basctl/source/basicide/idedataprovider.cxx:60: UnoHierarchyInitThread starting
info:basctl:96852:430736014:basctl/source/basicide/idedataprovider.cxx:60: UnoHierarchyInitThread starting

*This chaotic initialization caused:*
- Severe performance degradation (6+ second startup times)
- Resource conflicts between competing threads
- IDE freezing during startup

*The Solution: A Coordinated State Machine*

We implemented a thread-safe initialization system using modern C++ concurrency primitives:

// New Architecture: Double-Checked Locking Pattern
enum class InitState
{
    NotInitialized,
    Initializing,
    Initialized,
    Failed,
    Disposed
};

void ObjectBrowser::Initialize()
{
    // Fast lock-free check first
    InitState currentState = m_eInitState.load();
    if (currentState == InitState::Initialized
        || currentState == InitState::Initializing)
        return;

    // Acquire lock for definitive check
    std::unique_lock<std::mutex> lock(m_InitMutex);
    currentState = m_eInitState.load();
    if (currentState == InitState::Initialized
        || currentState == InitState::Initializing)
        return;

    // Set state while holding lock, then release for long operation
    m_eInitState.store(InitState::Initializing);
    lock.unlock();

    // ... safe initialization ...
}

void IdeDataProvider::AsyncInitialize(...)
{
    // Atomic compare-and-swap ensures single initialization
    bool expected = false;
    if (!m_bInitializationInProgress.compare_exchange_strong(expected, true))
        return; // Only the first thread succeeds

    // ...
}

*The Result: Order from Chaos*

After Patch 28, initialization is clean and sequential:

info:basctl:79942:495124713:basctl/source/basicide/idedataprovider.cxx:60: UnoHierarchyInitThread starting
info:basctl:79942:495124973:basctl/source/basicide/idedataprovider.cxx:141: UnoHierarchyInitThread completed in 1162 ms

Performance transformation:
- 80% reduction in initialization time (6+ seconds → ~1.2 seconds)
- Single, controlled initialization thread
- Eliminated resource conflicts and race conditions

- -------------------------------------------------------------------------------

*Patch 29 (Week 10-11): Deadlock Fix in Data Provider Callback*
https://gerrit.libreoffice.org/c/core/+/186822/29

*Thread 0x1f10d90f (Main Thread):*
basctl::ObjectBrowser::RefreshUI(bool)
basctl::IdeDataProvider::GetTopLevelNodes()
std::__1::lock_guard<std::__1::mutex>::lock_guard

*Thread 0x1f1147f9 (Background Thread):*
basctl::IdeDataProvider::UnoHierarchyInitThread::run()
basctl::ScriptDocument::getLibraryNames()
basic::SfxLibraryContainer::getElementNames()
basic::SfxLibraryContainer::enterMethod()
comphelper::SolarMutex::acquire()

*Problem: Subtle deadlock in IdeDataProvider::AsyncInitialize*

The main thread is blocked on the data provider's std::mutex inside GetTopLevelNodes(), while the background thread is blocked acquiring the SolarMutex, so neither thread can make progress.

Solution: Improved thread synchronization with atomic compare-and-swap:

void basctl::IdeDataProvider::AsyncInitialize(...)
{
    m_pThreadController = pController;

    bool expected = false;
    // Atomic compare-and-swap ensures only one thread starts initialization
    if (!m_bInitializationInProgress.compare_exchange_strong(expected, true))
    {
        // If initialization already completed, post the callback right away
        if (m_bInitialized)
            Application::PostUserEvent(rFinishCallback);
        return;
    }

    // Create and start initialization thread
}

- -------------------------------------------------------------------------------

*Patch 30 (Week 11): Enhanced Disposal Order*
https://gerrit.libreoffice.org/c/core/+/186822/30

*Problem: TaskPaneList registration failures and incorrect disposal order*

Solution: A comprehensive disposal sequence:

void ObjectBrowser::dispose()
{
    // 1: Atomic guard to prevent re-entry
    bool expected = false;
    if (!m_bDisposed.compare_exchange_strong(expected, true))
        return;

    // 2: Check parent hierarchy validity
    if (!GetParent() || !GetParent()->GetSystemWindow())
    {
        // Minimal cleanup if parent is gone
        DockingWindow::dispose();
        return;
    }

    // 3: Remove pending events and hide
    EnableInput(false);
    Hide();
    Application::RemoveMouseAndKeyEvents(this);
    Application::Reschedule(true);

    // 4: Update state machine
    m_eInitState.store(InitState::Disposed);
    m_InitCV.notify_all();

    // 5: Unregister from TaskPaneList BEFORE widget disposal
    if (GetParent() && GetParent()->GetSystemWindow())
    {
        TaskPaneList* pTaskPaneList = GetParent()->GetSystemWindow()->GetTaskPaneList();
        if (pTaskPaneList)
            pTaskPaneList->RemoveWindow(this);
    }

    // 6: Comprehensive cleanup of all resources
    // ... widget disposal, thread cancellation, etc. ...
}

- -------------------------------------------------------------------------------

*The Four Investigation Phases: A Systematic Approach*

*I. Investigation Phase 1: Initial Disposal Fixes*

*The Problem:* We began with a classic use-after-free crash in the macOS event system:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x18be51388 __pthread_kill + 8
1 libvclplug_osxlo.dylib 0x1208e44d4 *-[SalFrameView mouseDown:] + 76*

*Root Cause Analysis:* The crash occurred because mouse events were being delivered to a disposed window object. macOS Cocoa keeps references to view objects even after VCL has logically disposed of them, creating a race condition between VCL's disposal and Cocoa's event delivery.

*Initial Fix Attempts:*
- Added an atomic disposal flag (m_bDisposed) to prevent re-entry
- Added EnableInput(false) and Hide() calls
- Added Application::RemoveMouseAndKeyEvents(this)
- Added event handler disconnection

*Result:* The crash persisted with the same signature, indicating deeper issues.

*Diagram: Initial Disposal Problem*

VCL Disposal                 Cocoa Event System
-------------                ----------------
| dispose() |    ---->       | Event Queue  |
-------------                ----------------
      |                             |
      v                             v
| Object freed |             | Events still |
| (C++)        |             | referencing  |
                             | disposed obj |
                             ----------------

- -------------------------------------------------------------------------------

*II. Investigation Phase 2: Deep Analysis and Pattern Recognition*

*The Breakthrough:* Research into LibreOffice's VCL architecture revealed three critical patterns:
1. Frame Tracking System: *AquaSalFrame* objects are registered when created and deregistered when destroyed.
2. Frame Validity Checking: The system uses AquaSalFrame::isAlive() checks throughout the codebase.
3. Standard Disposal Pattern: Other VCL components follow a specific disposal sequence.

*Key Discovery:* Parent-Child Window Relationship Issue

The real problem wasn't just in the ObjectBrowser, but in the entire window hierarchy:
1. Basic Macros dialog (parent) opens
2. Basic Macros dialog opens IDE (child)
3. IDE creates ObjectBrowser (grandchild)
4. User closes IDE - ObjectBrowser disposed
5. Critical Issue: ObjectBrowser not properly removed from VCL frame tracking
6. User closes Basic Macros dialog (parent)
7. CRASH: Basic Macros dialog tries to access dangling frame references

*Diagram: Window Hierarchy Problem*

[Basic Macros Dialog] (Parent)
        |
        v
[IDE Window] (Child)
        |
        v
[ObjectBrowser] (Grandchild)  <-- Disposed but not deregistered
        |
        v
[VCL Frame Tracking]  <-- Still has reference to disposed ObjectBrowser

*Diagram: The Deadlock Fixed by Patch 29*

+---------------------------+       +---------------------------+
| Main UI Thread            |       | Background Thread         |
|---------------------------|       |---------------------------|
| 1. Acquires SolarMutex    |       | 3. Needs to get macros    |
|    (for UI/doc access)    |       |                           |
|                           |       | 4. Tries to acquire       |
| 2. Waits for Background   | <---- |    SolarMutex (BLOCKED!)  |
|    Thread to finish.      | ----> |                           |
|    (BLOCKED!)             |       | 5. Waits for Main Thread  |
|                           |       |    to release lock.       |
+---------------------------+       +---------------------------+

*The Fix (Patch 29):* We re-architected the data loading. The background thread was simplified to *only* perform the thread-safe task of building the UNO API cache. The non-thread-safe work of querying Basic macros was moved to the `OnDataProviderInitialized` handler, which executes safely on the main thread. This resolved the deadlock.

// -- PATCH 29: DEADLOCK FIX --
IMPL_LINK(ObjectBrowser, OnDataProviderInitialized, void*, /* p */, void)
{
    // This now runs safely on the Main Thread
    // 1. Create a list of top-level nodes...
    // 2. Add UNO APIs node (data is ready from background)...
    // 3. Add Application/Document Macros (safe to query now)...
    // 4. Set complete data and build indexes ONCE...
    // 5. Refresh UI...
}

- -------------------------------------------------------------------------------

*III. Investigation Phase 3: Mouse Drag Timer Crash*

*New Crash Pattern:* The problem evolved from a simple use-after-free into memory corruption:

Thread 0 Crashed:: ...
6 libvclplug_osxlo.dylib *-[SalFrameView mouseDraggedWithTimer:] + 100*
7 Foundation __NSFireTimer + 104
...

*Exception Type:* EXC_BAD_ACCESS (SIGSEGV)
*Exception Codes:* KERN_INVALID_ADDRESS at 0x00001cb5987c0bde

*Root Cause:* Incomplete cleanup of macOS-specific resources, particularly timer-based mouse drag operations.

*Evidence from Code:* SalFrameView creates an NSTimer for mouse drag events:

-(void)mouseDragged: (NSEvent*)pEvent
{
    [self clearPendingMouseDraggedEvent];
    mpPendingMouseDraggedEvent = [pEvent retain];
    if ( !mpMouseDraggedTimer )
    {
        mpMouseDraggedTimer = [NSTimer scheduledTimerWithTimeInterval:0.025f
                                                               target:self
                                                             selector:@selector(mouseDraggedWithTimer:)
                                                             userInfo:nil
                                                              repeats:YES];
    }
}

These timers can fire after disposal and access freed memory. Because a repeating NSTimer keeps a strong reference to its target until it is invalidated, the SalFrameView itself can stay alive and keep receiving mouseDraggedWithTimer: calls even after the VCL frame behind it has been freed.

*Diagram: Timer-Based Crash (My Understanding)*

Mouse Drag Event
      |
      v
[NSTimer Created] --> [SalFrameView]
      |                     |
      v                     v
[0.025s Interval]   [ObjectBrowser Disposed]
      |                     |
      v                     v
[Timer Fires] -------> [CRASH: Accessing freed memory]

- -------------------------------------------------------------------------------

*IV. Investigation Phase 4: Delayed Disposal Experiment*

*The Hypothesis:* Perhaps the issue was timing-related. If we delayed disposal, the Cocoa event system might have time to clean up.

*Implementation:* We implemented a delayed disposal pattern:

void ObjectBrowser::dispose()
{
    // Immediate cleanup
    Hide();
    Application::PostUserEvent(LINK(this, ObjectBrowser, DelayedDisposeHandler));
}

IMPL_LINK(ObjectBrowser, DelayedDisposeHandler, void*, /*p*/, void)
{
    // Actual disposal happens later
    performActualDisposal();
}

*Expected Outcome:*
- Prevent mouse event crashes by allowing proper cleanup
- Maintain VCL frame tracking
- Allow clean IDE closure

*Actual Result: New Problems Introduced*
1. Mouse Event Crash Fixed: the original crashes no longer occurred
2. New Problem: UI Artifacts and Freezing
   - ObjectBrowser disappeared but left a visual artifact
   - IDE became unresponsive, showing <NO_MODULE>
   - Document Recovery UI appeared
   - IDE reloaded instead of closing cleanly

*Root Cause:* VCL Synchronous vs. Asynchronous Conflict

VCL expects immediate disposal:
Parent Layout -> Child Disposal -> Layout Update

Delayed disposal broke this contract:
Parent Layout -> Child Scheduled for Disposal -> Layout Update (but child still exists)

*Diagram: Delayed Disposal Conflict*

VCL Expectation:
[Layout] -> [Dispose Child] -> [Update Layout]

Delayed Disposal Reality:
[Layout] -> [Schedule Dispose] -> [Update Layout]
                  |
                  v
          [Child still exists]
                  |
                  v
        [<NO_MODULE> displayed]

- -------------------------------------------------------------------------------

*New Discovery: Ghost Parent UI Crash*

*The Scenario:* We identified two distinct workflows with different outcomes:

*Scenario A (Works Correctly):*
- Context: A document (Calc) is open
- Action: Tools > Macros > Edit
- Result: BASIC Macros dialog closes, IDE opens cleanly
- When closing the IDE, everything shuts down properly

*Scenario B (Crashes):*
- Context: No document is open (global soffice context)
- Action: Tools > Macros > Edit
- Result: IDE opens, but the BASIC Macros dialog remains open in the background
- Clicking on this "ghost parent" window causes the mouseDown crash

*ASCII Diagram: Ghost Parent Problem*

With Document Context:        Without Document Context:

[Calc Document]               [No Document]
      |                             |
      v                             v
[BASIC Macros]                [BASIC Macros]  <-- Ghost Parent
      |                             |
      v                             v
[IDE Opens]                   [IDE Opens]
      |                             |
      v                             v
[Parent Closes]               [Parent Stays]
                                    |
                                    v
                              [Click on Parent]
                                    |
                                    v
                                 [CRASH]

*Root Cause Analysis:* The BASIC Macros dialog is put into a "zombie" state when no document is open. It is not closed when the IDE opens, so it stays in the window hierarchy in an inconsistent state.
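The ghost parent crash, like the Phase 1 crash, surfaces in -[SalFrameView mouseDown:], which fits the Phase 2 theory of a mouse handler reaching a dangling frame. Phase 2 also noted that VCL registers AquaSalFrame objects on creation, deregisters them on destruction, and checks them with AquaSalFrame::isAlive(). Below is a minimal, hypothetical C++ model of that register/deregister/check pattern, useful when thinking about frame validity checks in mouse handlers. It is not VCL code: FrameRegistry, Frame and DispatchMouseDown are invented names, and the real isAlive() implementation may differ.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_set>

class Frame;

// Hypothetical registry of live frames (NOT the real VCL API): frames add
// themselves on construction and remove themselves on destruction.
class FrameRegistry
{
public:
    static FrameRegistry& instance()
    {
        static FrameRegistry aRegistry;
        return aRegistry;
    }

    void add(const Frame* pFrame)
    {
        std::lock_guard<std::mutex> aGuard(m_aMutex);
        m_aFrames.insert(pFrame);
    }

    void remove(const Frame* pFrame)
    {
        std::lock_guard<std::mutex> aGuard(m_aMutex);
        [[maybe_unused]] const auto nErased = m_aFrames.erase(pFrame);
        assert(nErased == 1 && "frame was not registered");
    }

    // Compares pointer values only; never dereferences pFrame.
    bool isAlive(const Frame* pFrame)
    {
        std::lock_guard<std::mutex> aGuard(m_aMutex);
        return m_aFrames.count(pFrame) != 0;
    }

private:
    std::mutex m_aMutex;
    std::unordered_set<const Frame*> m_aFrames;
};

// Hypothetical stand-in for a native frame such as AquaSalFrame.
class Frame
{
public:
    Frame() { FrameRegistry::instance().add(this); }
    ~Frame() { FrameRegistry::instance().remove(this); }
    Frame(const Frame&) = delete;
    Frame& operator=(const Frame&) = delete;

    void handleMouseDown() { ++m_nMouseDowns; }
    int mouseDownCount() const { return m_nMouseDowns; }

private:
    int m_nMouseDowns = 0;
};

// Hypothetical model of a view's mouseDown: handler. The native view holds
// only a raw pointer, so it may outlive the frame it forwards events to.
// Returns true if the event was delivered, false if it was dropped.
bool DispatchMouseDown(Frame* pFrame)
{
    if (pFrame == nullptr || !FrameRegistry::instance().isAlive(pFrame))
        return false; // stale or missing frame: drop the event
    pFrame->handleMouseDown();
    return true;
}
```

The sketch assumes events and destruction happen on the same (main) thread, as VCL UI code does under the SolarMutex; otherwise a frame could be destroyed between the isAlive() check and the call. Any address-based check also has a blind spot: if a new frame is later allocated at the same address, a stale pointer looks alive again. So a guard like this complements, rather than replaces, correct deregistration and timer invalidation during dispose().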
// In sfx2/source/appl/appserv.cxx
#if HAVE_FEATURE_SCRIPTING
case SID_BASICIDE_APPEAR:
{
    // ...
}

and:

case SID_MACROORGANIZER:
{
    SAL_INFO("sfx.appl", "handling SID_MACROORGANIZER");
    const SfxItemSet* pArgs = rReq.GetArgs();
    sal_Int16 nTabId = 0;
    Reference <XFrame> xFrame;
    if (pArgs)
    {
        if (const SfxUInt16Item* pItem = pArgs->GetItemIfSet(SID_MACROORGANIZER, false))
            nTabId = pItem->GetValue();
        if (const SfxBoolItem* pItem = rReq.GetArg<SfxBoolItem>(FN_PARAM_2))
        {
            // if set then default to showing the macros of the document associated
            // with this frame
            if (pItem->GetValue())
                xFrame = GetRequestFrame(rReq);
        }
    }
    SfxApplication::MacroOrganizer(rReq.GetFrameWeld(), xFrame, nTabId);
    rReq.Done();
}
break;

// In vcl/osx/salframeview.mm
-(void)mouseDown: (NSEvent*)pEvent
{
    if ( mpMouseEventListener != nil &&
         [mpMouseEventListener respondsToSelector: @selector(mouseDown:)])
    {
        [mpMouseEventListener mouseDown: pEvent];
    }

    s_nLastButton = MOUSE_LEFT;
    [self sendMouseEventToFrame:pEvent button:MOUSE_LEFT eventtype:SalEvent::MouseButtonDown];
}

*Current Status & Next Steps*

Successfully Resolved:
1. IDE Shutdown Crashes - Eliminated through proper disposal order (Patch 30)
2. Multiple Initialization Threads - Solved, with performance gains (Patch 28)
3. Deadlock in Data Provider - Fixed with proper callback handling (Patch 29)

*Critical Issues Remaining:*
1. Ghost Parent UI Crash (NEW CRITICAL PRIORITY)
   - Occurs when opening the IDE without a document context
   - Clicking on the lingering BASIC Macros dialog causes a crash
   - Requires immediate investigation and fix
2. Mouse Event Crashes Post-Disposal (HIGH PRIORITY)
3. History Navigation Failures (MEDIUM PRIORITY)
   - Back/forward buttons become disabled after first use
   - History system doesn't preserve full UI state
4. Breaking this large single patch down into multiple chronological patches.
5. In the right pane, double-clicking an element should open the API Page window.
6. Adding a small delay to search, improving search results, and including macro results too.

- -------------------------------------------------------------------------------

*Next Steps for Week 12:*

Priority 1: Ghost Parent UI Investigation
- Determine why the BASIC Macros dialog doesn't close in the no-document context
- Implement proper dialog closure when the IDE opens from the global context
- Test both document and no-document scenarios

Priority 2: Enhanced Event Handler Cleanup
- Review all event connections in ObjectBrowser
- Ensure complete disconnection in the disposal method
- Add frame validity checks to SalFrameView mouse handlers

Priority 3: Patch Breakdown Strategy
- Break large patches into smaller, focused changes
- Enable incremental review and testing by the community

- -------------------------------------------------------------------------------

*Technical Evolution: Lessons Learned*

1. Disposal Order is Critical
The sequence of operations in dispose() matters immensely. TaskPaneList removal must happen early.

2. Thread Safety Requires Multiple Layers
Single boolean flags are insufficient. We need atomic operations, mutexes, and condition variables.

3. macOS Event System is Complex
Timer-based events can outlive object disposal. All native resources need comprehensive cleanup.

4. Context Matters
The same action can have different results depending on application state (document vs. no-document context).

- -------------------------------------------------------------------------------

*Conclusion*

Our four-phase investigation has provided deep insights into the complex interactions between VCL and macOS. While we've made significant progress in stabilizing the Object Browser, the discovery of the ghost parent UI crash shows that there are still fundamental issues to resolve.
The architectural improvements in thread safety and disposal management provide a strong foundation, but we must now address the window lifecycle management issues that cause the ghost parent problem.

Thanks to mentors for their invaluable guidance throughout this complex investigation.

To be clear: since patch ~26, the crash we are seeing is no longer in the BASIC IDE itself. It happens in the IDE's parent, the BASIC Macros window, when I open it from the main soffice (LibreOfficeDev) UI.

Here is a video showing exactly what is going on: https://www.youtube.com/watch?v=gTwWkYQKLxk

I had been thinking that this lingering ghost parent window wasn't there when I started working on this, after opening the IDE. But I am now confident that the IDE with the new OB closes cleanly, so we can break this patch up chronologically and others in the community can test it :) The remaining UI/UX polish, and the major remaining part, code suggestions, can then be done quickly.

*Previous Updates:*
Week 1: https://lists.freedesktop.org/archives/libreoffice/2025-May/093264.html
Weeks 2-3: https://lists.freedesktop.org/archives/libreoffice/2025-June/093362.html
Week 4: https://lists.freedesktop.org/archives/libreoffice/2025-June/093392.html
Week 5: https://lists.freedesktop.org/archives/libreoffice/2025-June/093443.html
Week 6: https://lists.freedesktop.org/archives/libreoffice/2025-July/093493.html
Week 7: https://lists.freedesktop.org/archives/libreoffice/2025-July/093527.html
Week 8: https://lists.freedesktop.org/archives/libreoffice/2025-July/093572.html
Week 9-10: https://lists.freedesktop.org/archives/libreoffice/2025-August/093662.html

If there is any mistake or something I missed in my understanding, do let me know :)

--
*Regards,*
*Devansh*