https://bugs.documentfoundation.org/show_bug.cgi?id=162526

--- Comment #3 from Mike Kaganski <[email protected]> ---
(In reply to Geoff Kuenning from comment #2)
> Well, I guess I should have started with gdb...although the session took
> about 6 hours before I found the underlying cause, but at least it was more
> focused than looking through strace output with over 100K lines or randomly
> updating shared libraries.
> 
> The immediate cause of the crash is a segfault.  That's because in
> ScDocShell::InitNew, in sc/source/ui/docshell/docsh2.cxx, there is this
> innocent-looking bit of code:
> 
>         ScOrcusFilters* pOrcus = ScFormatFilter::Get().GetOrcusFilters();
> 
> The problem is that ScFormatFilter::Get() returns a reference to a
> dynamically loaded object, and in my case that reference is a null pointer. 
> So C++ merrily follows that null pointer, trying to call the virtual
> function GetOrcusFilters, and crashes.

Thank you! That is very useful.

> So: bug #1 is that ScFormatFilter::Get has no protection against a failed
> dynamic load.  I'd guess that the same thing happens in other Libreoffice
> code, although I'm certainly not the one to go hunting.
> 
> Bug #2 is a derivative: when there's a failed dynamic load, dlopen generates
> a nice error message (which I'll get to in a moment), but that message is
> unavailable to the user.  Instead there's just a confusing and very generic
> "I crashed" message caused by the segfault.  The code for Get() in
> sc/source/ui/docshell/impex.cxx contains an assert(false), but that's hardly
> better (and I guess in my distro assertions must have been disabled, boo). 
> I'll agree that most users wouldn't be able to understand a failed-dlopen
> message, but at least they'd have something they could put into a bug report
> or read off to an expert.
> 
> Now, as to the error message: decoding the error structure from dlopen, the
> offending file is /usr/lib64/libreoffice/program/libscfiltlo.so and the
> problem is an undefined symbol,
> _ZN5orcus13create_filterENS_8format_tEPNS_11spreadsheet5iface14import_factory
> E, which demangles to orcus::create_filter(orcus::format_t,
> orcus::spreadsheet::iface::import_factory*).  That symbol doesn't exist in
> any version of liborcus that I have installed on the failing machines (16,
> 17, and 18--don't ask me why I have three variants because I don't know). 
> But sure enough, the scratch VM that works has it defined in
> /usr/lib64/liborcus-0.18.so.0.0.0.
> 
> And armed with that knowledge, I quickly figured out that a simple update
> was needed and oocalc works again!
> 
> But: it's really not OK for C++ functions that return references to return
> null pointers.  That's an absolute no-no.  So please get rid of this code in
> ScFormatFilter::Get():
> 
>         return static_cast<ScFormatFilterPlugin*>(nullptr);
> 
> and replace it with proper error handling.

Please avoid teaching here. We assert the validity of the pointer; the
assert(false) there in the code is for a reason. We consider that situation a
hard failure and a *programming* error, to the extent that a crash is an
appropriate outcome. We need to find out why the pointer is null, not to handle
that.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Reply via email to