https://bugs.documentfoundation.org/show_bug.cgi?id=103690

--- Comment #10 from nico...@hillegeer.com ---
More and more, I'm getting the feeling that this is some weird synchronization
issue.

GOOD: /Applications/LibreOffice.app/Contents/MacOS/soffice
GOOD: lldb /Applications/LibreOffice.app/Contents/MacOS/soffice
BAD:  Open from dock
BAD:  Open from spotlight

So I couldn't debug it while it was in a bad state, I thought. Then I tried the
next best thing: debug it as fast as possible.

  while : ; do if pgrep soffice ; then lldb -p $(pgrep soffice) ; fi ; done

It's crude, but it works. Basically, attach to libreoffice as fast as possible
after startup.

I did:

  lldb> b CreateSalInstance

To see whether it was actually being called (that's the thing that creates the
object that gets assigned to mpDefInst). See core/vcl/source/app/svmain.cxx:

    // Initialize Sal
    pSVData->mpDefInst = CreateSalInstance();
    if ( !pSVData->mpDefInst )
        return false;

It's also one of only 3 lines that modify it:

  $ ag 'mpDefInst ='
  core/vcl/inc/svdata.hxx:312:    SalInstance*            mpDefInst = nullptr;            // Default SalInstance
  core/vcl/source/app/svmain.cxx:290:    pSVData->mpDefInst = CreateSalInstance();
  core/vcl/source/app/svmain.cxx:587:        pSVData->mpDefInst = nullptr;

Anyway, I ran it and it stopped at CreateSalInstance, indicating it was being
called. Continuing the process afterwards made it keep going without a crash.
Rats, I seem to have turned a "bad" invocation into a good one by debugging it.

Next time, I tried putting a breakpoint on DeInitVCL, which is where mpDefInst
gets reset to NULL. However, after setting the breakpoint, the process crashed
with the well-known bad access. So not every bad invocation turns good... or
was it because I set the breakpoint on a different function?

  Process 34503 stopped
  * thread #1: tid = 0x60fec, 0x00007fff9e4e041a libsystem_kernel.dylib`mach_msg_trap + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
      frame #0: 0x00007fff9e4e041a libsystem_kernel.dylib`mach_msg_trap + 10
  libsystem_kernel.dylib`mach_msg_trap:
  ->  0x7fff9e4e041a <+10>: retq
      0x7fff9e4e041b <+11>: nop

  libsystem_kernel.dylib`mach_msg_overwrite_trap:
      0x7fff9e4e041c <+0>:  movq   %rcx, %r10
      0x7fff9e4e041f <+3>:  movl   $0x1000020, %eax          ; imm = 0x1000020

  Executable module set to "/Applications/LibreOffice.app/Contents/MacOS/soffice".
  Architecture set to: x86_64-apple-macosx.
  (lldb) b DeInitVCL
  Breakpoint 1: where = libvcllo.dylib`DeInitVCL(), address = 0x00000001118e8720
  (lldb) run
  There is a running process, detach from it and restart?: [Y/n] n
  (lldb) cont
  Process 34503 resuming
  (lldb) Traceback (most recent call last):
  Process 34503 stopped
  * thread #1: tid = 0x60fec, 0x00000001118e43cd libvcllo.dylib`Application::GetSolarMutex() + 13, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
      frame #0: 0x00000001118e43cd libvcllo.dylib`Application::GetSolarMutex() + 13
  libvcllo.dylib`Application::GetSolarMutex:
  ->  0x1118e43cd <+13>: movq   (%rdi), %rax
      0x1118e43d0 <+16>: popq   %rbp
      0x1118e43d1 <+17>: jmpq   *0xa8(%rax)
      0x1118e43d7 <+23>: nopw   (%rax,%rax)
  (lldb)

Bingo. It's not exactly what I wanted, but at least I can now see what's
actually filled in in the ImplSVData struct returned by ImplGetSVData(). Remember
that %rax contains a pointer to the ImplSVData.

  (lldb) memory read --size 8 --format x --count 8 $rax
  0x111b5c9d0: 0x00007fb4d7800000 0x0000000000000000
  0x111b5c9e0: 0x00007fff50d67bd0 0x0000000000000000
  0x111b5c9f0: 0x0000000000000000 0x0000000000000000
  0x111b5ca00: 0x0000000000000000 0x0000000000000000

Taking its data layout by hand again:

  struct ImplSVData
  {
      SalData*                mpSalData = nullptr;
      SalInstance*            mpDefInst = nullptr;            // Default SalInstance
      Application*            mpApp = nullptr;                // pApp
      VclPtr<WorkWindow>      mpDefaultWin;                   // Default-Window
      bool                    mbDeInit = false;               // Is VCL deinitializing
      // Tons of stuff more.
  }

Which basically tells me everything is NULL/false/... except for mpSalData and
mpApp. Some digging teaches us that mpSalData and mpApp are always set
together, so that makes sense. That's done here:
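
The by-hand mapping of the dump onto the struct can be double-checked with a
small sketch. This is not LibreOffice code: SVDataHead, kDump, wordAt and
isZero are hypothetical names, every member is modelled as a plain pointer, and
it assumes a 64-bit target where VclPtr<WorkWindow> occupies exactly one
pointer-sized slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the first members of ImplSVData. Assumption:
// each member, including the VclPtr, is one 8-byte slot.
struct SVDataHead
{
    void* mpSalData;    // word 0 of the dump
    void* mpDefInst;    // word 1
    void* mpApp;        // word 2
    void* mpDefaultWin; // word 3
    bool  mbDeInit;     // first byte of word 4
};

static_assert(sizeof(void*) == 8, "sketch assumes a 64-bit target");
static_assert(offsetof(SVDataHead, mpDefInst) == 8, "word 1");
static_assert(offsetof(SVDataHead, mpApp) == 16, "word 2");
static_assert(offsetof(SVDataHead, mpDefaultWin) == 24, "word 3");
static_assert(offsetof(SVDataHead, mbDeInit) == 32, "word 4");

// The eight 8-byte words printed by "memory read" at $rax, copied verbatim.
const std::array<std::uint64_t, 8> kDump = {{
    0x00007fb4d7800000ULL, 0x0000000000000000ULL,
    0x00007fff50d67bd0ULL, 0x0000000000000000ULL,
    0x0000000000000000ULL, 0x0000000000000000ULL,
    0x0000000000000000ULL, 0x0000000000000000ULL,
}};

// Return the dump word that starts at the given byte offset.
inline std::uint64_t wordAt(std::size_t byteOffset)
{
    assert(byteOffset % 8 == 0 && byteOffset / 8 < kDump.size());
    return kDump[byteOffset / 8];
}

// True when the word at that offset is all zero bits (NULL or false).
inline bool isZero(std::size_t byteOffset)
{
    return wordAt(byteOffset) == 0;
}
```

Calling isZero(offsetof(SVDataHead, mpDefInst)) reports the slot that
GetSolarMutex() dereferences as zero, while the mpSalData and mpApp slots are
non-zero, which matches the reading above.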

  Application::Application()
  {
      // useful for themes at least, perhaps extensions too
      OUString aVar("LIBO_VERSION"), aValue(LIBO_VERSION_DOTTED);
      osl_setEnvironment(aVar.pData, aValue.pData);

      ImplGetSVData()->mpApp = this;
      InitSalData();
  }

Apparently, a global variable is set with the object that's being constructed
(->mpApp = this). Neat. InitSalData() does something similar, but for
->mpSalData.
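
To make that pattern concrete, here is a stripped-down sketch of a constructor
that publishes itself into process-wide data. All names (MiniApp, MiniSVData,
miniGetSVData, miniInitSalData, looksLikeCrashState) are made up for
illustration and the int pointers are mere stand-ins for SalData* and
SalInstance*; the real classes do far more.

```cpp
#include <cassert>

struct MiniApp;

// Hypothetical miniature of the globals discussed above.
struct MiniSVData
{
    int*     mpSalData = nullptr; // stand-in for SalData*
    int*     mpDefInst = nullptr; // stand-in for SalInstance*
    MiniApp* mpApp     = nullptr;
};

// One process-wide instance, handed out by reference.
inline MiniSVData& miniGetSVData()
{
    static MiniSVData aData;
    return aData;
}

// Fills in mpSalData, loosely mirroring what InitSalData() is described to do.
inline void miniInitSalData()
{
    static int aSalData = 0;
    miniGetSVData().mpSalData = &aSalData;
}

struct MiniApp
{
    MiniApp()
    {
        miniGetSVData().mpApp = this; // the constructor registers itself
        miniInitSalData();            // and mpSalData is set in the same step
    }
};

// True when the globals look like the dump from the crash:
// mpApp and mpSalData set, mpDefInst still null.
inline bool looksLikeCrashState()
{
    const MiniSVData& rData = miniGetSVData();
    return rData.mpApp && rData.mpSalData && !rData.mpDefInst;
}
```

In this toy model, looksLikeCrashState() is true from the moment a MiniApp has
been constructed until something assigns mpDefInst. That is the same shape as
the state seen in the debugger, though the sketch says nothing about why the
real process ends up there.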

At this point I'm pretty sure that the InitVCL() code ran: 

  bool InitVCL()
  {
      if( pExceptionHandler != nullptr )
          return false;

      EmbeddedFontsHelper::clearTemporaryFontFiles();

      if( !ImplGetSVData()->mpApp )
      {
          pOwnSvApp = new Application();
      }
      InitSalMain();

      ImplSVData* pSVData = ImplGetSVData();

      // remember Main-Thread-Id
      pSVData->mnMainThreadId = ::osl::Thread::getCurrentIdentifier();

      // Initialize Sal
      pSVData->mpDefInst = CreateSalInstance();
      if ( !pSVData->mpDefInst )
          return false;
      // ...

But why is mpDefInst NULL then, in our bad runs? Are two threads racing? One
trying to initialize everything (InitVCL) and the other somewhere calling
GetSolarMutex()? If that's the case, it should be visible in the stack traces
of the other threads. It was a good theory, but using "thread backtrace all"
tells me that's not the case. Argh.
