[frameworks-baloo] [Bug 404057] Uses an insane amount of memory (RSS/PSS) and writes a ton of data while indexing

Kai Krakow Sat, 28 Sep 2019 10:24:26 -0700

https://bugs.kde.org/show_bug.cgi?id=404057


--- Comment #9 from Kai Krakow <k...@kaishome.de> ---
Here's more evidence of why LMDB may be a particularly bad choice for the
workload applied by baloo: It is B-tree organized, and writing and maintaining
B-trees results in a lot of random I/O. Once the DB has become big enough or
scrambled enough due to constant updates, this backfires badly and produces
very bad I/O patterns.

https://blog.dgraph.io/post/badger-lmdb-boltdb/
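
As a rough illustration of the random I/O argument, here is a small Python
sketch. It is a toy model and not LMDB's actual page layout: the function
names, the 100 keys per page, and the definition of a "seek" (a write that
lands neither on the same page nor on the directly following one) are all
assumptions made for this example. It compares a layout where the page is
determined by the key's sort position (as in a B-tree) with an append-only
layout when keys arrive in random order.

```python
import random


def count_seeks(page_sequence):
    """Count writes that land neither on the previous page nor the next one."""
    seeks = 0
    for prev, cur in zip(page_sequence, page_sequence[1:]):
        if cur != prev and cur != prev + 1:
            seeks += 1
    return seeks


def sorted_layout_writes(keys, keys_per_page):
    # Toy stand-in for a B-tree: the page is fixed by the key's sort position.
    return [key // keys_per_page for key in keys]


def append_only_writes(keys, keys_per_page):
    # Toy stand-in for a log: records go wherever the write cursor currently is.
    return [i // keys_per_page for i in range(len(keys))]


rng = random.Random(42)
keys = [rng.randrange(1_000_000) for _ in range(10_000)]

btree_seeks = count_seeks(sorted_layout_writes(keys, 100))
log_seeks = count_seeks(append_only_writes(keys, 100))

print("sorted (B-tree like) layout seeks:", btree_seeks)
print("append-only layout seeks:", log_seeks)
```

In this model the append-only layout never seeks, while nearly every write
in the sorted layout lands on a different, non-adjacent page. Real B-trees
cache and batch pages, so the numbers only show the direction of the effect.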

Baloo should migrate to a key/value store that is much better at writing data
and maintaining its internals. Read performance alone should probably not be
the primary concern; what matters is that the database maintains good read and
write performance under long-term writing and updating. According to the
article, LMDB doesn't (unless you give it the full 256 GB of RAM and lock it
into memory).

Researching a little further, we can find a quite different picture:
https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database

It says that LMDB has exceptional write performance and maintains good
performance altogether. This would need some benchmarks, but it probably holds
true only when the DB fully fits into memory all the time. And looking at the
design description in that article, we can easily see the downsides: The
database file can only ever grow in size, even more so when writing
concurrently (there's no locking, so at any time during concurrent access it
will append to the database). It also never reorganizes its internal
structure; it just reuses memory blocks allocated from a tree of free blocks
without taking HDD access patterns into account. And LMDB's design pays off
best only with big values. I don't think this is what baloo stores.
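
The growth behavior described above can be sketched with a toy
copy-on-write page file in Python. This is not LMDB code: the class
CowFile, its methods, and the "reader_active" flag are hypothetical names
invented for this example, and the model assumes that a page replaced while
a reader is still active cannot be reused until that reader is released.

```python
class CowFile:
    """Toy copy-on-write page file: pages get reused, the file never shrinks."""

    def __init__(self):
        self.file_pages = 0   # current file size in pages
        self.free = []        # page numbers that may be reused
        self.pending = []     # replaced pages still visible to a reader
        self.live = {}        # key -> page number holding the current value

    def _alloc(self):
        # Prefer a free page; only extend the file when none is available.
        if self.free:
            return self.free.pop()
        self.file_pages += 1
        return self.file_pages - 1

    def put(self, key, reader_active=False):
        # Copy-on-write: the new version always goes to a different page.
        old = self.live.get(key)
        self.live[key] = self._alloc()
        if old is not None:
            (self.pending if reader_active else self.free).append(old)

    def release_readers(self):
        # Once readers are gone, their pinned pages become reusable.
        self.free.extend(self.pending)
        self.pending.clear()


def run(updates, reader_active):
    db = CowFile()
    for key in range(100):
        db.put(key)
    for i in range(updates):
        db.put(i % 100, reader_active=reader_active)
    db.release_readers()
    return db.file_pages, len(db.free)


print("no reader:   file pages, free pages =", run(1000, False))
print("with reader: file pages, free pages =", run(1000, True))
```

Without a reader the file stays at 101 pages for 100 live values. With a
reader held during the updates it grows to 1100 pages, 1000 of which end up
on the free list, and the file does not shrink afterwards.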

The article further says that LMDB can (on hypothetical file systems) fail on
Linux when not using fsync(). Was fsync() added to LMDB for such a hypothetical
case? This would be fatal to system performance.

LMDB seems to be baked into a lot of KV databases due to its seemingly good
performance.

So actually, this would need a lot more insight to decide whether LMDB is
suitable for baloo (maybe it is but it isn't used optimally). Someone with more
real-world experience of KV databases and associated usage patterns may comment
on this.

Currently, limiting the mmap size helps a lot here. And as mentioned by Martin,
there's clearly a bug somewhere resulting in massive write workloads and
exceptional growth of the database. Maybe it's just a really bad access pattern
that by coincidence results in exceptionally bad behavior of LMDB. I was very
happy with baloo performance for a long time until it suddenly broke some day.
I'm not even sure that's baloo's fault: Judging from the commit subjects, the
code hasn't undergone any substantial changes for a long time, only small
fixes and tweaks. There's commit b0890aca71aa4f0fdabe65ee7b7fbd0bc844d8b8 after
KF 5.27.0 which bumped the maximum index size from 5 GB to 256 GB. @Martin:
Might this be around the time (end of 2016) when it broke for you? Your
"balooctl indexSize" example seems to suggest there's a big rollover of
copy-on-write operations leaving unused memory blocks behind (maybe too small
to be effectively reused) and thus blowing up the DB file size.
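
To make the "too small to be effectively reused" suspicion concrete, here is
a minimal Python sketch. It assumes that a large value needs a contiguous
run of pages, and that scattered single free pages cannot satisfy such a
request. The function allocate_run and its parameters are hypothetical and
only model the idea; they are not taken from LMDB or baloo.

```python
def allocate_run(free_pages, run_length, file_pages):
    """Return (chosen pages, remaining free pages, new file size in pages).

    Looks for run_length contiguous free pages. If none exists, the file is
    extended instead, even though free pages are available.
    """
    pages = sorted(free_pages)
    run_start = 0
    for i in range(len(pages)):
        if i > 0 and pages[i] != pages[i - 1] + 1:
            run_start = i  # contiguity broken, start a new run here
        if i - run_start + 1 == run_length:
            chosen = pages[run_start:i + 1]
            return chosen, set(free_pages) - set(chosen), file_pages
    # No suitable run found: grow the file and leave the free pages untouched.
    chosen = list(range(file_pages, file_pages + run_length))
    return chosen, set(free_pages), file_pages + run_length


# 50 free pages, but none of them adjacent to each other.
fragmented = set(range(0, 100, 2))
chosen, remaining, size = allocate_run(fragmented, 4, file_pages=100)
print("fragmented free list:", chosen, "file pages:", size)

# 10 free pages in one contiguous block.
compact = set(range(10, 20))
chosen2, remaining2, size2 = allocate_run(compact, 4, file_pages=100)
print("compact free list:", chosen2, "file pages:", size2)
```

In the fragmented case the file grows from 100 to 104 pages while all 50
free pages stay unused. In the compact case the request is served from the
free list and the file size stays the same.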

