https://bugs.kde.org/show_bug.cgi?id=404057
--- Comment #9 from Kai Krakow <k...@kaishome.de> ---

Here's more evidence of why LMDB may be a particularly bad choice for the workload applied by baloo: It is btree organized, and writing and maintaining btrees results in a lot of random I/O. Once the DB has become big enough, or scrambled enough due to constant updates, this backfires badly and produces very bad I/O patterns.

https://blog.dgraph.io/post/badger-lmdb-boltdb/

Baloo should migrate to a key/value store that is much better at writing data and maintaining its internals. Read performance alone should probably not be the primary concern. What matters is how the database behaves under long-term writing and updating: it should maintain good read and write performance over time. According to the article, LMDB doesn't (unless you give it the full 256 GB of RAM and lock it into memory).

Researching a little further, we can find a quite different picture:

https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database

It says that LMDB has exceptional write performance and maintains good performance altogether. This would need some benchmarks, but it probably holds true only when the DB fully fits into memory all the time.

Looking at the design description in that article, we can easily see the downsides:

The database can only ever increase in size, even more so when writing concurrently (there's no locking, so during concurrent access it will append to the database).

It also never re-organizes its internal structure. It just reuses memory blocks allocated from a free blocks tree, without taking HDD access patterns into account.

And LMDB's design pays off best only with big values. I don't think this is what baloo stores.

The article further says that LMDB can (on hypothetical file systems) fail on Linux when not using fsync(). Was fsync() added to LMDB for such a hypothetical case? This would be fatal to system performance.
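To make the growth argument above a bit more concrete, here is a toy model of a copy-on-write page file with a free list. It is only a sketch of the mechanism as I understand it from the design description, not LMDB's actual allocator: the function name, the fixed tree depth, and the "one reader pins every old page forever" rule are all simplifying assumptions of mine.

```python
def simulate(commits, tree_depth, reader_pinned):
    """Return the file size in pages after `commits` single-key updates.

    Toy model (assumption, not LMDB code): each update rewrites one
    root-to-leaf path of `tree_depth` pages, copy-on-write style.
    """
    file_pages = tree_depth  # the initial root-to-leaf path
    free_pages = 0           # pages on the free list, ready for reuse
    for _ in range(commits):
        # Take as many pages as possible from the free list ...
        reused = min(free_pages, tree_depth)
        free_pages -= reused
        # ... and append the remainder to the end of the file.
        file_pages += tree_depth - reused
        # The old copies of the path are garbage now. They only reach
        # the free list if no reader still sees the old snapshot.
        if not reader_pinned:
            free_pages += tree_depth
    return file_pages


print(simulate(1000, 3, reader_pinned=False))  # 6
print(simulate(1000, 3, reader_pinned=True))   # 3003
```

In this model the file stays at twice the path length as long as old pages can be recycled right away, and it grows with every commit as soon as something keeps the old pages alive. Whether baloo actually triggers that second case is exactly what would need to be checked.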
LMDB seems to be baked into a lot of KV databases due to its seemingly good performance. So it would take a lot more insight to decide whether LMDB is suitable for baloo (maybe it is, but it isn't used optimally). Someone with more real-world experience of KV databases and their usage patterns may want to comment on this.

Currently, limiting the mmap size helps a lot here. And as mentioned by Martin, there's clearly a bug somewhere resulting in massive write workloads and exceptional growth of the database. Maybe it's just a really bad access pattern that, by coincidence, results in exceptionally bad behavior of LMDB.

I was very happy with baloo performance for a long time until it suddenly broke some day. I'm not even sure that's baloo's fault: Judging from the commit subjects, the code hasn't undergone any substantial changes for a long time, only small fixes and tweaks. There's commit b0890aca71aa4f0fdabe65ee7b7fbd0bc844d8b8 after KF 5.27.0 which bumped the maximum index size from 5 GB to 256 GB.

@Martin Might this be around the time (end of 2016) when it broke for you? Your "balooctl indexSize" example seems to suggest there's a big rollover of copy-on-write operations leaving unused memory blocks behind (maybe too small to be effectively reused) and thus blowing up the DB file size.