https://bugs.kde.org/show_bug.cgi?id=404057

Kai Krakow <k...@kaishome.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |k...@kaishome.de

--- Comment #3 from Kai Krakow <k...@kaishome.de> ---
@Martin Thanks for pointing me here.

I can confirm the observations:

RSS can easily grow above 3-4 GB.

baloo_file_extractor generates a lot of IO with high throughput (sometimes 100
MB/s), mostly while scraping PDF files (i.e. my Calibre library), up to the
point that the whole desktop becomes unresponsive and laggy. It's mostly read
accesses, with writes coming in bursts once in a while. Btrfs especially has
its problems with these access patterns. The DB is already created nocow.

The index file seems to be growing and growing. Last time I purged it when it
reached 19 GB. This is about the point when the system becomes unusable due to
IO stalls.

"balooctl" cannot really do anything: Run "balooctl stop" and it won't stop
(or it restarts instantly). Run "balooctl disable" and it will be back on next
reboot. Run "balooctl start" and it says that another instance is already
running even when there isn't one. I'm not sure if baloo is currently even able
to monitor and know its own status.

VSS of at least two baloo processes is 256 GB. While I know that this is only
allocated, not used, it still seems to have an effect on kernel memory
allocation performance. The system feels snappier when I "killall baloo" even
when baloo was idle and only used minor amounts of memory. It should probably
just not do that. I'm not sure if this is caused by using mmap. But if it is,
it may explain a lot of the overwhelming IO patterns.

Baloo eventually finishes if I let it run long enough. But the whole process
repeats from scratch when rebooting the machine. The counter for indexed files
grows by a huge amount after each reboot, as if it neither properly detects
duplicates nor cleans up old entries. It looks like it detects all files as
new/modified (which is not true) and adds them to the index again.

CPU usage was moderate and nothing I care about too much because it runs at low
CPU priority.

System specs:

Linux 4.20.6-gentoo with CK patchset, i7-3770K, 16 GB RAM
BFQ-MQ IO scheduler
4-disk RAID-1 btrfs running through bcache on a 400G SSD caching partition
systemd with dbus-user-session

The Baloo database directory is made nocow (otherwise I get very rhythmic IO
noise from the harddisks as it seems to rewrite data over and over again,
resulting in a lot of fragmentation and cow relocations).

Wishlist entry:
It should be possible to easily move baloo into a cgroup (maybe it could create
one itself, or we could configure it to optionally join a systemd slice) so I
could limit its memory usage. Modern kernels will limit cache usage that way,
too. Currently when running baloo, it will dominate the disk cache for its own
purposes. OTOH, maybe it's just missing proper cache hinting via fadvise().
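
To illustrate what I mean by cache hinting, here is a rough sketch in Python
(not baloo code; the function name and chunk size are made up for the example).
It reads a file sequentially and then tells the kernel that the cached pages
are not needed any longer, using posix_fadvise(). Whether the kernel actually
drops the pages is up to the kernel, it's only a hint.

```python
import os

CHUNK_SIZE = 1 << 20  # 1 MiB per read; arbitrary value chosen for this sketch


def read_with_cache_hints(path):
    """Read a file once and hint that its page cache can be dropped afterwards.

    Hypothetical helper for illustration; returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Announce sequential access (offset 0, length 0 = whole file).
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = os.read(fd, CHUNK_SIZE)
            if not chunk:
                break
            total += len(chunk)  # stand-in for the real text extraction
        # The file has been processed; its cached pages are no longer needed.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

An indexer reads most files exactly once, so keeping them in the page cache
only pushes out data that the desktop actually wants to have cached.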

Limiting memory usage via cgroups is already pretty effective for browsers, see
here:
https://github.com/kakra/gentoo-cgw

I already considered doing something similar for baloo but I think it would be
preferable if it managed its own resource usage better by itself.
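
As a stopgap, the cgroup idea could look roughly like the following Python
sketch, which only builds a systemd-run command line. The slice name and the
memory limit are made-up example values, and I'm assuming that baloo_file is
the binary to wrap, a cgroup v2 setup, and the memory controller being
available to the user session. I haven't tested this against baloo.

```python
def systemd_run_argv(command, memory_max="1G", slice_name="background.slice"):
    """Build a systemd-run call that starts `command` in a memory-capped scope.

    `slice_name` and `memory_max` are hypothetical example values.
    """
    return [
        "systemd-run",
        "--user",                      # use the per-user systemd instance
        "--scope",                     # run the command in a transient scope
        "--slice=" + slice_name,       # place the scope below this slice
        "-p", "MemoryMax=" + memory_max,  # hard memory limit (cgroup v2)
        *command,
    ]


# Example: print the command line instead of running it.
print(" ".join(systemd_run_argv(["baloo_file"])))
```

Because page cache is accounted to the cgroup, such a limit would also cap
how much disk cache the indexer can occupy.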

Baloo could also monitor loadavg and reduce its impact on system performance
automatically. Here's an example which has been very successful:
https://github.com/Zygo/bees/commit/e66086516fdb9f9cc2d703fb8101f6116ce169e9

It inverts the loadavg function to calculate the current point-in-time load and
adjusts its resource impact based on this, targeting a user-defined loadavg.
This commit worked wonders for system responsiveness while the daemon is
running and working.
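
A minimal sketch of the idea in Python (this is not the bees code, all names
are mine). It assumes the idealised model of the 1-minute loadavg: an
exponential moving average updated every 5 seconds. The kernel uses
fixed-point math and slightly different timing, so treat the numbers as
approximate.

```python
import math

SAMPLE_INTERVAL = 5.0  # assumed seconds between loadavg updates
WINDOW = 60.0          # 1-minute loadavg
DECAY = math.exp(-SAMPLE_INTERVAL / WINDOW)


def next_avg(prev_avg, load, decay=DECAY):
    """Forward direction: how the average moves for a given current load."""
    return prev_avg * decay + load * (1.0 - decay)


def instantaneous_load(prev_avg, curr_avg, decay=DECAY):
    """Inverse direction: recover the current load from two loadavg samples."""
    return (curr_avg - prev_avg * decay) / (1.0 - decay)


def adjust_delay(delay, load, target, step=0.05, max_delay=2.0):
    """Sleep more between work items above the target load, less below it."""
    if load > target:
        return min(max_delay, delay + step)
    return max(0.0, delay - step)
```

The daemon would sample os.getloadavg()[0] every 5 seconds, feed two
consecutive samples into instantaneous_load(), and pass the result to
adjust_delay() together with the user-defined target. Reacting to the
point-in-time load instead of the smoothed average avoids lagging a minute
behind what the system is actually doing.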

