--- Comment #4 from Idonotexist <obila...@yahoo.com> ---
(In reply to Vishesh Handa from comment #2)
> Perhaps the correct approach would be to refactor `baloo_file_extractor` so
> as to not perform a commit so frequently. We currently do it after a fixed
> 40 files. Perhaps it would make sense to try and estimate the amount of
> changes, and then do a commit when we reach the threshold.
> I'm not sure if I should keep this bug open or what. Especially since this is
> probably only a problem during first run.
As I write this, Baloo is hammering my very modern system's HDD to a pulp. The
disk activity LED is furiously lit. KDE's UI periodically freezes because of
heavy disk I/O.
My typical workaround is to:
1) Pause indexing
2) Mount a 10GB ramdisk
3) Move ~/.local/share/baloo to said ramdisk
4) Symlink ~/.local/share/baloo to the ramdisk baloo
5) Resume indexing
6) When indexing is done, undo the above.
I definitely do not think this bug should be closed. It most certainly does not
occur only on first runs. The current Baloo hyperactivity was triggered by my
copying a large number of small files from another system.
Vishesh, Baloo is a worthy attempt at an indexing system, and I commend your
work. It uses a quality database backend in the form of LMDB. But any which way
you and I might spin it, Baloo has a serious problem with I/O: it simply causes
too much of it, too frequently. Numerous users have complained about this, and
several currently open and closed bugs are traceable directly to this
behaviour. Several users' impressions of Baloo, and KDE writ large, are tainted
by Baloo's abusive disk activity.
As for how to fix this problem: 40 files per transaction commit, as you said,
is not a good enough solution. At the very least, the criterion should be based
on LMDB's page size and the disk block size. I also propose that this criterion
not be based purely on the number of files; it should have a time component, and
should not commit transactions more often than once per second. A human user
couldn't care less that newly-appeared files were indexed this second or next,
and a file indexer is after all primarily, though not exclusively, for human
use.
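To make the criterion above concrete, here is a minimal sketch in Python rather than Baloo's own C++. Everything in it is an assumption for illustration: the class name CommitPolicy, the default page size of 4096 bytes, the 256-page threshold, and the injectable clock are all hypothetical and are not taken from Baloo or LMDB. It only shows the shape of the rule: commit when enough data is pending AND at least one second has passed since the last commit.

```python
import time


class CommitPolicy:
    """Hypothetical commit criterion: size threshold plus a minimum interval."""

    def __init__(self, page_size=4096, pages_per_commit=256,
                 min_interval=1.0, clock=time.monotonic):
        # Threshold expressed in whole database pages (assumed values).
        self.threshold_bytes = page_size * pages_per_commit
        self.min_interval = min_interval  # seconds between commits, at least
        self.clock = clock
        self.pending_bytes = 0
        self.last_commit = clock()

    def record(self, nbytes):
        """Account for data queued into the current, uncommitted transaction."""
        self.pending_bytes += nbytes

    def should_commit(self):
        """True only if enough data is pending and enough time has elapsed."""
        if self.pending_bytes == 0:
            return False
        elapsed = self.clock() - self.last_commit
        return (elapsed >= self.min_interval
                and self.pending_bytes >= self.threshold_bytes)

    def committed(self):
        """Reset the counters after the caller has committed the transaction."""
        self.pending_bytes = 0
        self.last_commit = self.clock()
```

A real implementation would presumably also flush small leftovers once the indexer goes idle; this sketch leaves that out.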
Here's a relatively simple proposal: The indexer operates on a configurable
*duty cycle* D of 1%-50% and a time period T of 1s-3600s. For (1-D)*T seconds
per period, Baloo sleeps. For D*T seconds per period, Baloo *exclusively*
performs data/metadata reads from the filesystem, keeping an eye on wall-clock
time. Once D*T seconds of work have elapsed, make a *single transaction*
containing all of the stuff that the indexer read in the previous duty cycle.
Then go back to sleep again. In this way, exactly one mdb_txn_commit() and
fdatasync()/msync() occur per time period, each commit is likely to have
accumulated far more than 40 files' worth of information, and 50-99% of I/O
bandwidth is available for other uses, such as satisfying the desktop UI's
needs.
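The duty-cycle proposal above could be sketched roughly as follows, again in Python for brevity. The function run_period and its read_next and commit callbacks are hypothetical names; commit stands in for the single mdb_txn_commit() per period, and the clock and sleep parameters exist only so that the loop can be exercised without real waiting.

```python
import time


def run_period(read_next, commit, duty=0.25, period=4.0,
               clock=time.monotonic, sleep=time.sleep):
    """Run one period: read for duty*period seconds, commit once, then sleep."""
    if not 0.01 <= duty <= 0.50:
        raise ValueError("duty cycle D must be between 1% and 50%")
    if not 1.0 <= period <= 3600.0:
        raise ValueError("period T must be between 1s and 3600s")

    start = clock()
    work_deadline = start + duty * period
    batch = []

    # Work phase: only read from the filesystem, watching wall-clock time.
    while clock() < work_deadline:
        item = read_next()
        if item is None:  # nothing left to index
            break
        batch.append(item)

    # Exactly one commit per period, covering everything read above.
    if batch:
        commit(batch)

    # Sleep phase: leave the rest of the period to other I/O users.
    remaining = start + period - clock()
    if remaining > 0:
        sleep(remaining)
    return len(batch)
```

With D = 0.25 and T = 4s, for example, the indexer would read for about 1s, commit once, and then sleep for about 3s.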