https://bugs.documentfoundation.org/show_bug.cgi?id=170967
Bug ID: 170967
Summary: Severe performance degradation when opening ODT files
over high-latency SMB/VPN connections due to
non-buffered file access strategy
Product: LibreOffice
Version: 25.2.3.2 release
Hardware: All
OS: Linux (All)
Status: UNCONFIRMED
Severity: normal
Priority: medium
Component: filters and storage
Assignee: [email protected]
Reporter: [email protected]
Environment: We use the LibreOffice packages provided in Debian 13
Trixie/Stable amd64.
In our organization, users access documents stored on SMB/Samba file servers
over a VPN connection. While the available bandwidth is sufficient and general
file operations (e.g., cp, md5sum) are fast and responsive, opening even small
ODT documents (e.g., ~250 kB) in LibreOffice is noticeably slow.
This significantly affects user acceptance of LibreOffice compared to a
"certain commercial office program suite", which does not exhibit similar
delays under the same network conditions.
The issue appears specifically when:
- Files are accessed via SMB/CIFS
- The connection involves moderate latency (e.g., VPN, ~10–30 ms RTT)
- Documents are relatively small (well below 1 MB)
Despite small file sizes, document loading time becomes disproportionately
long.
Observed Behavior
- Copying the file locally via cp is fast.
- Running md5sum on the file over SMB is fast.
- Opening the same file in LibreOffice is slow.
- If the file is copied locally first and then opened, loading is immediate.
This strongly suggests that the problem is not bandwidth-related, but
latency-related and tied to the file access pattern used by LibreOffice.
Architectural Analysis (Based on LibreOffice Source Code)
The root cause appears to be the file I/O architecture used when loading ODT
documents.
1. ODT is a ZIP Container
ODT files are ZIP archives containing multiple internal files, including:
- content.xml
- styles.xml
- meta.xml
- settings.xml
- META-INF/manifest.xml
Opening an ODT file requires:
- Reading the ZIP central directory
- Seeking to multiple offsets
- Reading individual compressed members
- Decompressing and parsing XML streams
This inherently causes multiple seek and read operations.
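To illustrate why the very first step already forces a non-sequential access, here is a minimal sketch of locating the ZIP "end of central directory" record, which sits at the end of the archive and points back to the central directory. The field offsets follow the ZIP format; the struct and function names are hypothetical and this is not LibreOffice code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fields of the end-of-central-directory (EOCD) record that a reader needs
// before it can locate any archive member. Names are illustrative only.
struct EndOfCentralDirectory {
    std::uint16_t totalEntries;
    std::uint32_t directorySize;
    std::uint32_t directoryOffset;
};

static std::uint16_t readLE16(const std::vector<std::uint8_t>& d, std::size_t p) {
    assert(p + 2 <= d.size());
    return static_cast<std::uint16_t>(d[p] | (d[p + 1] << 8));
}

static std::uint32_t readLE32(const std::vector<std::uint8_t>& d, std::size_t p) {
    assert(p + 4 <= d.size());
    return static_cast<std::uint32_t>(d[p]) |
           (static_cast<std::uint32_t>(d[p + 1]) << 8) |
           (static_cast<std::uint32_t>(d[p + 2]) << 16) |
           (static_cast<std::uint32_t>(d[p + 3]) << 24);
}

// Scans backwards from the end of the archive for the EOCD signature
// "PK\x05\x06". Returns false if no record is found.
bool findEndOfCentralDirectory(const std::vector<std::uint8_t>& data,
                               EndOfCentralDirectory& out) {
    const std::size_t kMinSize = 22; // EOCD size without a trailing comment
    if (data.size() < kMinSize) return false;
    for (std::size_t pos = data.size() - kMinSize + 1; pos-- > 0;) {
        if (data[pos] == 0x50 && data[pos + 1] == 0x4b &&
            data[pos + 2] == 0x05 && data[pos + 3] == 0x06) {
            out.totalEntries = readLE16(data, pos + 10);
            out.directorySize = readLE32(data, pos + 12);
            out.directoryOffset = readLE32(data, pos + 16);
            return true;
        }
    }
    return false;
}
```

A reader therefore has to touch the tail of the file first, then jump to directoryOffset, and only then to the individual members. On a remote file each of these jumps can become a separate request.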
2. Small Buffered Reads in SvStream
In:
tools/source/stream/stream.cxx
SvStream uses a fixed internal buffer:
#define STREAM_BUFFER_SIZE 4096
So reads are typically done in 4 kB chunks.
When the buffer is exhausted:
sal_Size SvStream::FillBuffer()
{
    // Simplified: refill the internal buffer with one fixed-size read.
    sal_Size nRead = Read(pBuf, STREAM_BUFFER_SIZE);
    return nRead;
}
This leads to repeated 4 kB read() system calls.
On high-latency storage, each of these calls may incur a round-trip delay.
3. ZipPackage Performs Multiple Seek + Read Operations
In:
package/source/zippackage/
ZIP entries are accessed via XSeekable streams:
m_xSeekable->seek(nOffset);
m_xInputStream->readBytes(...);
Each internal file requires:
- Seek to local header
- Read header
- Read compressed data
- Repeat for next entry
Thus, many small I/O operations are performed.
4. UCB File Layer Does Not Preload Entire File
In:
ucb/source/ucp/file/
The UCB layer exposes XInputStream and XSeekable backed by osl::File::read().
There is no read-ahead or full-file buffering strategy implemented at this
layer.
Therefore, LibreOffice performs many small I/O operations directly against
the network filesystem.
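As a rough illustration of what a read-ahead layer would change, the following self-contained sketch counts backend requests with and without a large read window. CountingBackend and ReadAheadReader are hypothetical names invented for this example; they do not correspond to existing UCB or osl classes, and each backend request is assumed to cost one network round trip.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for a remote file: every readAt() call represents
// one network round trip, so the counter approximates the latency cost.
class CountingBackend {
public:
    explicit CountingBackend(std::vector<std::uint8_t> data)
        : m_data(std::move(data)) {}

    std::size_t readAt(std::size_t offset, std::uint8_t* dst, std::size_t len) {
        ++m_requests;
        if (offset >= m_data.size()) return 0;
        const std::size_t n = std::min(len, m_data.size() - offset);
        std::memcpy(dst, m_data.data() + offset, n);
        return n;
    }
    std::size_t requests() const { return m_requests; }

private:
    std::vector<std::uint8_t> m_data;
    std::size_t m_requests = 0;
};

// Serves small reads from a single cached window that is fetched from the
// backend in large blocks.
class ReadAheadReader {
public:
    ReadAheadReader(CountingBackend& backend, std::size_t windowSize)
        : m_backend(backend), m_window(windowSize) {
        assert(windowSize > 0);
    }

    std::size_t readAt(std::size_t offset, std::uint8_t* dst, std::size_t len) {
        std::size_t done = 0;
        while (done < len) {
            const std::size_t pos = offset + done;
            if (pos < m_start || pos >= m_start + m_valid) {
                // Cache miss: fetch a whole window starting at this position.
                m_start = pos;
                m_valid = m_backend.readAt(pos, m_window.data(), m_window.size());
                if (m_valid == 0) break; // end of file
            }
            const std::size_t avail = m_start + m_valid - pos;
            const std::size_t n = std::min(len - done, avail);
            std::memcpy(dst + done, m_window.data() + (pos - m_start), n);
            done += n;
        }
        return done;
    }

private:
    CountingBackend& m_backend;
    std::vector<std::uint8_t> m_window;
    std::size_t m_start = 0; // file offset of the first cached byte
    std::size_t m_valid = 0; // number of valid bytes in the window
};
```

Reading a 256,000-byte file sequentially in 4096-byte pieces costs 63 backend requests when done directly, but only 4 when the same reads go through a ReadAheadReader with a 65,536-byte window. A single window helps sequential access most; a seek-heavy ZIP reader would benefit more from loading the whole file.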
Why cp and md5sum Are Fast
Tools like cp and md5sum typically read files in large blocks (e.g., 64 kB – 1
MB).
Thus:
- Few read() calls
- Minimal RTT amplification
- No excessive seek behavior
LibreOffice, by contrast, performs:
- Many 4 kB reads
- Multiple seeks per internal ZIP member
- Incremental XML parsing reads
On a connection with 20 ms RTT:
- 100 small reads × 20 ms = 2 seconds of accumulated latency, assuming each
read waits for one full round trip
- Even small ODT files may trigger this behavior
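The estimate above can be written as a small model. This is only a back-of-the-envelope sketch: it assumes every read is serialized and costs exactly one round trip, and it ignores kernel read-ahead, client-side caching and transfer time. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Number of read requests needed to fetch fileSize bytes in chunks of
// chunkSize bytes (rounded up).
std::uint64_t readCount(std::uint64_t fileSize, std::uint64_t chunkSize) {
    assert(chunkSize > 0);
    return (fileSize + chunkSize - 1) / chunkSize;
}

// Accumulated latency in milliseconds, assuming one serialized round trip
// of rttMs per read request.
std::uint64_t latencyMs(std::uint64_t fileSize, std::uint64_t chunkSize,
                        std::uint64_t rttMs) {
    return readCount(fileSize, chunkSize) * rttMs;
}
```

For a 256,000-byte document at 20 ms RTT, latencyMs(256000, 4096, 20) gives 1260 ms for purely sequential 4 kB reads, while latencyMs(256000, 1048576, 20) gives 20 ms for a single large read. Additional seeks per ZIP member would add to the first figure.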
Root Cause Summary
LibreOffice processes ODT files directly from the storage backend using many
small, seek-heavy read operations.
When storage latency is elevated (SMB over VPN), each read/seek incurs network
round-trip latency.
The latencies accumulate, causing severe performance degradation.
The architecture assumes low-latency storage (local disk or LAN), which is not
always the case in modern VPN-based environments.
Proposed Improvement
Introduce an optional feature to preload small documents entirely into memory
before processing.
Proposed Setting
Add a configuration option in Preferences:
"Preload documents up to a specified size into memory before processing: [ ]
MB"
With:
- Default value: 10 MB
- 0 MB = disabled
- Applied only to seekable file-based backends
Proposed Implementation Strategy
At file open time:
1. Determine file size.
2. If file size ≤ configured threshold:
- Read entire file into memory (e.g., using SvMemoryStream)
- Replace file-backed stream with memory-backed stream
3. Perform ZIP parsing and XML processing in-memory.
LibreOffice already provides:
- SvMemoryStream
- Stream abstraction layers
Therefore, this could likely be implemented without modifying ZIP or XML parser
logic.
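The steps above could look roughly like the following sketch. It deliberately uses only the C++ standard library rather than SvMemoryStream or the UCB interfaces, so it shows the control flow only; preloadIfSmall and its parameters are hypothetical names, not an existing LibreOffice API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Reads the whole file into `out` if it can be opened and its size does not
// exceed thresholdBytes. Returns false otherwise, in which case the caller
// would keep using the existing file-backed stream. A threshold of 0
// disables preloading, matching the proposed setting.
bool preloadIfSmall(const std::string& path, std::uint64_t thresholdBytes,
                    std::vector<char>& out) {
    if (thresholdBytes == 0) return false;

    // Open positioned at the end so that tellg() yields the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return false;

    const std::streamoff size = in.tellg();
    if (size < 0 || static_cast<std::uint64_t>(size) > thresholdBytes) {
        return false;
    }

    out.resize(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (size > 0) {
        // One large sequential read instead of many small ones.
        if (!in.read(out.data(), size)) {
            out.clear();
            return false;
        }
        assert(in.gcount() == size);
    }
    return true;
}
```

After a successful call, ZIP parsing would operate on the in-memory buffer, so seeks no longer reach the network filesystem. The operating system may still split the single large read into several protocol requests, but those are sequential and can typically be pipelined.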
Expected Benefits
- Dramatically improved performance on SMB/VPN setups
- Better user experience in remote work environments
- Improved enterprise acceptance
- Low regression risk, since the feature can be disabled (0 MB)
- Memory overhead limited and configurable
Impact: This issue affects organizations using:
- SMB/CIFS over VPN
- SSHFS
- Cloud-mounted network drives
- High-latency storage backends
It is especially problematic in enterprise environments where central file
servers are accessed remotely.
Conclusion
The current stream-based, seek-heavy access model amplifies storage latency.
A configurable whole-file preloading mechanism for small documents would
significantly improve performance in high-latency environments, while the
existing behavior would remain available by setting the threshold to 0 MB.
We would strongly appreciate consideration of this enhancement!
References
- #142188 "FILEOPEN DOC very slow over shared network over VPN"
- #153202 "FILESAVE: saving .odt to a smb network drive is slow (with excess
download traffic)"