[Bug 160623] Issue while converting DOCX or PPTX files to PDF files

bugzilla-daemon Thu, 11 Apr 2024 02:42:51 -0700

https://bugs.documentfoundation.org/show_bug.cgi?id=160623


--- Comment #2 from [email protected] ---
For me, LibreOffice could be 10 times faster :)

I spent a few days last year researching this subject. Here are my conclusions:

Most of the time, especially with long documents containing very large tables,
the main bottleneck is here:

- File: dev/core/svl/source/items/stylepool.cxx
- Function: Node* Node::findChildNode

From my understanding, this function is called each time a new "style" is
parsed from a part of the document (a paragraph, a table cell, etc.).
The goal of this function is to find an existing style among already parsed
styles. If not found, it creates a new style node. 
The problem is that this search operation is extremely costly because:

- Comparing styles is slow, often requiring the comparison of complex
heterogeneous structures.
- It performs an O(n) search using a simple for-loop, where n is the number of
styles already parsed. This loop is executed for each parsed style in the
document, making the total cost effectively O(n^2). The longer the document,
the greater the search cost.
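
To make the quadratic cost concrete, here is a minimal sketch of the pattern
described above. It is not the LibreOffice code: Style, LinearPool and
comparisonsForDistinct are hypothetical names, and a style is reduced to a
plain string. It only counts how many equality comparisons a linear scan
performs when every parsed style is new.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a style; real styles are far more complex.
using Style = std::string;

// Interns styles with a linear scan and counts equality comparisons.
struct LinearPool {
    std::vector<Style> styles;
    std::size_t comparisons = 0;

    // Returns the index of an equal style, appending it if none exists.
    std::size_t intern(const Style& s) {
        for (std::size_t i = 0; i < styles.size(); ++i) {
            ++comparisons;
            if (styles[i] == s) return i;
        }
        styles.push_back(s);
        return styles.size() - 1;
    }
};

// Interns n distinct styles and returns the total number of comparisons.
std::size_t comparisonsForDistinct(std::size_t n) {
    LinearPool pool;
    for (std::size_t i = 0; i < n; ++i)
        pool.intern("style-" + std::to_string(i));
    assert(pool.styles.size() == n);
    return pool.comparisons;
}
```

With n distinct styles the scan performs n(n-1)/2 comparisons in total, so
doubling the number of styles roughly quadruples the work, and each comparison
of a real style is itself expensive.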

I don't know the LibreOffice code well enough to optimize this myself.
Ideally, I would like to compute a short hash of each style node and keep a
hash index, so that an existing style can be found without scanning all of
them.
I am also wondering about the impact of always creating a new style Node
instead of searching for an existing one to reuse.
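
The hash index idea could look roughly like the sketch below. This is only an
illustration under strong assumptions: Style, StyleHash and HashedPool are
hypothetical, the style has three simple fields, and the real node structure
in stylepool.cxx is different. The one requirement that carries over is that
the hash must cover every field that takes part in the equality comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified style; a real one holds many heterogeneous items.
struct Style {
    std::string font;
    int sizePt;
    bool bold;
    bool operator==(const Style& o) const {
        return font == o.font && sizePt == o.sizePt && bold == o.bold;
    }
};

// Short hash combining every field that takes part in equality.
struct StyleHash {
    std::size_t operator()(const Style& s) const {
        std::size_t h = std::hash<std::string>{}(s.font);
        h = h * 31 + std::hash<int>{}(s.sizePt);
        h = h * 31 + std::hash<bool>{}(s.bold);
        return h;
    }
};

class HashedPool {
public:
    // Returns the index of an equal style, inserting it if not found.
    // Average cost is O(1) lookups instead of a scan over all styles.
    std::size_t intern(const Style& s) {
        std::pair<Index::iterator, bool> res = index_.emplace(s, styles_.size());
        if (res.second) styles_.push_back(s);  // first time this style is seen
        assert(res.first->second < styles_.size());
        return res.first->second;
    }
    std::size_t size() const { return styles_.size(); }

private:
    using Index = std::unordered_map<Style, std::size_t, StyleHash>;
    std::vector<Style> styles_;
    Index index_;
};
```

Interning the same style twice returns the same index, and the full equality
comparison only runs against styles whose hash lands in the same bucket. As for
skipping the search entirely, that would presumably trade lookup time for
duplicate nodes and more memory; I have not measured it.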

We are willing to pay a few thousand euros to a company that can solve this
problem. 
Please contact me privately for this.

I would also like to help solve this problem 🙏

Thank you.

