On 19/10/11 10:17, Dag Wieers wrote:
Hi,

During the course of the LibreOffice conference in Paris, we (the
unoconv and cloudooo projects) found that some of the issues our users
were having while doing document conversions using PyUNO and OpenOffice
and LibreOffice were not related to our own projects, but have their
root cause in either PyUNO or LibreOffice/OpenOffice.

The results of these issues are varied and differ from case to case:

- segfaults
- various error codes
- PyUNO crashes
- memory leaks
- xslt problems

And while some of them are reproducible (and consistent), others are
not, which makes me believe they are related to internal state or timing
issues of LibreOffice/OpenOffice or related to import/export filters.

Since these issues are very common and can be triggered very quickly, we
would like to have developers look at them to see what is the cause and
how we can fix them.

it is well known that the threading implementation in the OOo applications is rather unreliable.

currently for thread safety the implementers of UNO APIs are required to explicitly use low-level synchronization primitives such as mutexes.

not doing it correctly (such as locking a mutex while it should not be locked, or forgetting to lock a mutex while it should be locked) leads to very subtle problems that do not show up during ordinary office use, and are extremely difficult to reproduce.
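
to make the two failure modes concrete, here is a minimal Python sketch. it is only an analogy: the Account class and both helper functions are made up for illustration and have nothing to do with the actual UNO implementation classes.

```python
import threading


class Account:
    """Toy shared object standing in for an API implementation (hypothetical)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.balance = 0

    def deposit_unlocked(self, amount):
        # Missing lock: this read-modify-write can interleave with another
        # thread, so updates may occasionally be lost.
        current = self.balance
        self.balance = current + amount

    def deposit_locked(self, amount):
        # Correct: the whole read-modify-write happens under the mutex.
        with self._mutex:
            current = self.balance
            self.balance = current + amount


def hammer(method, threads=4, calls=1000):
    """Call method(1) from several threads at once and wait for all of them."""
    def work():
        for _ in range(calls):
            method(1)

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


def relock_would_deadlock(lock, timeout=0.1):
    """Locking while already locked: try to take the lock a second time.

    A timeout is used so the sketch reports the problem instead of hanging.
    """
    with lock:
        acquired = lock.acquire(timeout=timeout)
        if acquired:
            lock.release()
        return not acquired
```

the unlocked variant usually gives the right answer too, which is exactly why this class of bug goes unnoticed in ordinary use. the second helper reports a would-be deadlock for a plain threading.Lock, but not for a re-entrant threading.RLock.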

basically the only way for developers to find these issues is via the subsequenttests, which currently are mostly implemented in Java and connect to the OOo instance via a UNO remote bridge.

and the only issues that are half-way easy to debug are deadlocks; in case of missing locks you may get a memory corruption _somewhere_ which causes some later test to crash, but it is very difficult to track down the root cause.

also, most of the developers who work on the applications are not experts in multi-threading issues (those who are tend to work on the lower-level layers like the URE). for example i discovered once that in Writer almost all destructors of UNO objects do not lock a mutex but then call into the Writer core (have partially fixed this for OOo 3.3).

so as a result of all of this, driving OOo/LO via remote bridges is rather unreliable.

some have suggested the best way out of this is to find a way so that implementers of UNO APIs do not have to care about thread safety themselves, but instead there should be a framework that does it automatically. such a framework has actually existed for many years now (Kay Ramme's "UNO threading framework"), but most of OOo/LO does not make use of it (iirc it is used for only some database drivers).
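
the general idea of "a framework that does it automatically" can be sketched in Python as follows. this is not how the UNO threading framework is designed or implemented; the synchronized decorator and the WordCount class are hypothetical and only show an implementer writing plain code while the locking is added from the outside.

```python
import functools
import inspect
import threading


def _locked(method):
    """Wrap one method so that it runs under the instance's guard lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._guard:
            return method(self, *args, **kwargs)
    return wrapper


def synchronized(cls):
    """Class decorator (hypothetical): serialize all public methods per instance."""
    original_init = cls.__init__

    def guarded_init(self, *args, **kwargs):
        # Re-entrant, so one wrapped method may call another on the same object.
        self._guard = threading.RLock()
        original_init(self, *args, **kwargs)

    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr) and not name.startswith("_"):
            setattr(cls, name, _locked(attr))
    cls.__init__ = guarded_init
    return cls


@synchronized
class WordCount:
    """The implementer writes no locking code at all."""

    def __init__(self):
        self._total = 0

    def add(self, n):
        current = self._total
        self._total = current + n

    def total(self):
        return self._total
```

the point of the design is that forgetting a lock becomes impossible for code written against the decorator, at the price of coarse locking (one lock per object).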

of course there may also be problems in PyUNO on top of that; back at Sun we had nothing that depended on PyUNO so i guess nobody spent much time debugging it...

The cloudooo project has tested about 100,000 conversions and
implemented some techniques to overcome the issues by monitoring the
libreoffice process for memory leaks and 'endless loops', and retrying
on failure. In the end this brought the failure rate down from about 10%
to 1.1%.
(http://git.erp5.org/gitweb/cloudooo.git)
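
As a rough illustration of the "kill on hang, then retry" part of that approach, here is a minimal Python sketch. It is not taken from cloudooo; the function name, the timeout, and the attempt count are assumptions, it watches the command it starts rather than a long-running office process, and it does not cover memory monitoring.

```python
import subprocess
import sys


def run_with_retry(cmd, timeout_s=60.0, attempts=3):
    """Run cmd, treating a hang or a non-zero exit status as a failure.

    Returns the number of the attempt that succeeded, or None if all
    attempts failed. Name and defaults are illustrative assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising, so a process
            # stuck in an endless loop does not linger.
            continue
        if result.returncode == 0:
            return attempt
    return None


# Stand-in for a real conversion command: a trivial Python child process.
ok_attempt = run_with_retry([sys.executable, "-c", "pass"], timeout_s=30.0)
```

A real deployment would put the conversion command line in place of the stand-in and would probably also restart the office instance between attempts.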

yes, there are various ways to minimize the risk of failure, no doubt you are already doing most of these:
- monitor the OOo instance and restart it
- only connect to an OOo instance from a single thread (should result in fewer problems, but e.g. with a JVM you still effectively get multiple connections, don't know about PyUNO)
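
the second point can be sketched in Python by funnelling every request through one dedicated worker thread. the convert function below is a placeholder and contains no real PyUNO calls; whether PyUNO itself opens further connections underneath is not something this sketch addresses.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One worker only: every call that talks to the office instance runs here.
office_thread = ThreadPoolExecutor(max_workers=1)


def convert(path):
    """Placeholder for the code that would drive the office instance."""
    return (path, threading.get_ident())


# Callers on any thread submit work instead of connecting themselves.
futures = [office_thread.submit(convert, p) for p in ["a.odt", "b.odt", "c.odt"]]
results = [f.result() for f in futures]
office_thread.shutdown()
```

all three jobs run one after another on the same thread, so the office instance only ever sees requests coming from a single client thread.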

Both the cloudooo and unoconv presentations will become available and
contain some information on both projects and the PyUNO/LO unreliabilities.

Below is some example failure output from a single run. LibreOffice does
seem a bit more stable than OpenOffice, though.

there are a lot of XSLT errors; LO (at least in 3.4) ships a different XSLT implementation, perhaps that has helped...

regards,
 michael

_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice
