https://bugs.documentfoundation.org/show_bug.cgi?id=151207
Bug ID: 151207
Summary: [SAMPLE] Areas where multithreading would be needed to
improve slow performance of common tasks in huge
(million rows) spreadsheets
Product: LibreOffice
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: enhancement
Priority: medium
Component: Calc
Assignee: [email protected]
Reporter: [email protected]
I've seen a handful of bug reports about the need for multithreading, but they
typically end up closed because they are considered "not actionable" / too
vague. My hope is that by providing a "torture test" sample file here, along
with my measurements of the slowness of various common tasks, you will have
reference points for the various things that would benefit greatly from being
multithreaded, as they apparently are not right now. I'm filing this as a
single bug report because it would feel a bit ridiculous & overwhelming to open
two dozen reports, one for each point where the issue manifests, and I hope
this summary will prove sufficient.
To reproduce the issues on your end, download this "torture test" sample file
of mine, as it will be a very useful benchmark to discover areas where
LibreOffice would benefit most from multithreading in commonly used tasks:
https://fortintam.com/public/libreoffice-augustin-benchmark--million-rows-spreadsheet.ods
Here are some "obvious" areas where I've identified slow, single-core work
happening (tested with my 8-core Intel Xeon W3520 CPU):
* Opening the file takes 3 minutes and 35 seconds, using a single core of my
CPU. If it used all 8 cores, we could presume it would take only about 27
seconds. Possibly related: bug #128396, which was closed as a duplicate of
bug #65046, which was in turn closed as fixed; it is not fixed for big
spreadsheets like the one I'm testing today.
* After selecting columns A to M and choosing "Data > Standard Filter", then
choosing column D ("Page Title") in the "Field name" combobox, you need to wait
7 seconds before the UI unlocks and you can click the "Condition" combobox (to
set the "Contains" condition, for example). The filtering itself (if you type
"example" in the "Value" field and press OK) is very fast, however. It is only
the column selection in the GUI that triggers something slow, apparently while
populating the "Condition" combobox.
* Selecting columns A to M and doing a standard sort operation on column E
("referring domains") takes 15 seconds and uses only one CPU core. If it used
my 8 cores, it would probably take less than 2 seconds.
* Search & replace (across the whole sheet, or "Current selection only" after
selecting columns A to M), replacing "Example" with "Banana", takes 1 minute
and 8 seconds on my machine because it currently uses only 1 of the cores. If
it used all 8 cores, it could accomplish this in roughly 8.5 seconds.
* Saving the file (as a new file) takes 47 seconds and seems very CPU-bound
(rather than I/O-bound?), as I once again see only one of the CPU cores used at
100%. We can presume that making this multithreaded would allow saving this
big file in 6 to 12 seconds (I'm being conservative with my estimate here).
* Filtering with auto filters is also very slow here, as the GUI for
controlling it is interactive and thus much slower to react than the "fire
once" Standard Filter GUI. In particular, if you type a string in the
filtering entry, it immediately tries to search and filter through all the
possible valid values, which is extremely expensive. I suspect it does this
upon typing each and every character, which is bad from a performance
standpoint. You would get massive performance gains by using a timeout-based
search trigger like I am suggesting in bug #151206.
* When selecting columns A to M and clicking the "Pivot Table" button, after
selecting "Current sheet" (or something like that), it takes 40 seconds for the
main "Pivot Table Layout" dialog/wizard to appear. This is again
single-threaded and could benefit significantly from some optimizations.
Afterwards, however, generating the pivot table from the dragged values is very
fast (roughly 5 seconds), so congrats on that!
* Creating a bar chart out of columns A and E (with A as the label column):
after answering the wizard's questions and dragging into the sheet to
insert/draw the chart, the app once again uses only a single core and takes
approximately 3.5 minutes to show the chart. If it were multithreaded, it could
possibly take as little as 30 seconds on my computer.
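
The multi-core estimates in the list above are simple divisions of the measured
single-core time by the number of cores. The sketch below redoes that
arithmetic for the tasks where a single-core time was measured. It assumes
perfect linear scaling over 8 workers, which is a best case: any part of a task
that cannot be parallelized, plus coordination overhead, would make real
results slower. The task labels and the `ideal_time` helper are illustrative
names, not anything from LibreOffice.

```python
# Measured single-core times from the list above, in seconds.
measured = {
    "open file": 3 * 60 + 35,
    "sort on column E": 15,
    "search & replace": 60 + 8,
    "save as new file": 47,
    "pivot table layout dialog": 40,
    "bar chart": 3.5 * 60,
}


def ideal_time(seconds, workers):
    """Best case: the work divides evenly across workers with no overhead."""
    return seconds / workers


# Assumption: 8 workers, matching the CPU used for the measurements.
estimates = {task: ideal_time(t, 8) for task, t in measured.items()}

for task, t in measured.items():
    print(f"{task:28s} {t:6.1f} s -> {estimates[task]:5.1f} s on 8 workers")
```

These figures are upper bounds on the gain, which is why the save estimate
above is quoted as a range (6 to 12 seconds) rather than a single number.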
Under all of these conditions, you can observe that LibreOffice Calc 7.4 uses
only one of your CPU cores, at 100%. These problems could be vastly reduced if
it were to split the problem space across cores/threads. On most computers that
could make these tasks potentially 4 to 16 times faster (since most CPUs have
4 to 16 cores/threads nowadays), which would make LibreOffice Calc very
compelling compared to the competition on that front.
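
To illustrate the timeout-based search trigger mentioned in the auto filter
item above, here is a minimal sketch of the general "debounce" idea: each
keystroke restarts a short timer, and the expensive search runs only once
typing has paused. This is not LibreOffice code. The `Debouncer` class, its
`trigger` and `flush` methods, and the 0.2 second delay are all hypothetical
choices made for this example.

```python
import threading


class Debouncer:
    """Run `callback` only after `delay` seconds pass with no new trigger."""

    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        """Call on every keystroke: cancel the pending run and start over."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.callback, args)
            self._timer.daemon = True
            self._timer.start()

    def flush(self):
        """Block until the most recently scheduled run has finished."""
        with self._lock:
            timer = self._timer
        if timer is not None:
            timer.join()


# Stand-in for the expensive filter: just record what would be searched.
searches = []
debouncer = Debouncer(0.2, searches.append)

# Simulate typing four characters in quick succession.
for typed in ("e", "ex", "exa", "exam"):
    debouncer.trigger(typed)

debouncer.flush()
print(searches)
```

With this approach the costly pass over all values happens once per pause in
typing instead of once per character. The delay is a trade-off: too short and
it still searches on nearly every keystroke, too long and the filter feels
unresponsive.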