https://bugs.kde.org/show_bug.cgi?id=444520

tagwer...@innerjoin.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tagwer...@innerjoin.org

--- Comment #4 from tagwer...@innerjoin.org ---
(In reply to Adam Fontenot from comment #0)
> ... Just in time for Halloween... try to pause Baloo using system settings,
> but this (frequently? always?) doesn't work, as I describe in this bug:
> Bug 443693
I've also noticed this: baloo_file seems not to respond to events while it is
waiting for baloo_file_extractor to complete. This is complicated by the fact
that baloo_file_extractor indexes files in batches (40 files, then the next 40,
and so on).
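
To illustrate why a pause request can look ignored, here is a toy model in
Python. It is not Baloo's code; run_indexer, BATCH_SIZE and the event strings
are hypothetical. It assumes the coordinator only checks for control events
between batches, so a "pause" that arrives partway through a batch of 40 is
not acted on until that whole batch has been extracted.

```python
BATCH_SIZE = 40  # batch size mentioned above; treated as fixed here


def run_indexer(files, events_at):
    """Toy model of a coordinator that only looks at control events
    between batches (hypothetical, not Baloo's real code).

    files     -- files still waiting for content extraction
    events_at -- {files_processed_so_far: event}, e.g. {5: "pause"}
    Returns (processed, handled_at), where handled_at is how many files had
    been extracted when a "pause" was finally acted on (None if never).
    """
    pending_events = []
    processed = []
    remaining = list(files)
    while remaining:
        # The only place control events are examined.
        if "pause" in pending_events:
            return processed, len(processed)
        batch, remaining = remaining[:BATCH_SIZE], remaining[BATCH_SIZE:]
        for f in batch:
            # While the batch runs, arriving events are queued, not handled.
            if len(processed) in events_at:
                pending_events.append(events_at[len(processed)])
            processed.append(f)
    return processed, None
```

With 100 files and a pause arriving after file 5, run_indexer returns
handled_at == 40: in this model the pause waits for the whole batch.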

> Sooner or later, whenever Baloo kicks back in, it may also restart indexing
> file content, despite being disabled.
Baloo really should not restart on its own. If disabled, it should stay
disabled, although if the baloo_file process died or was killed, it would be
restarted at the next logon, if not sooner.

> To state the obvious, Baloo should *never* resurrect an already-killed file
> extraction if "index file content" is disabled.
Agreed. But that's tricky.

The indexer has to recognise whether it was interrupted by, say, a logout or
shutdown (in which case it should quietly continue from where it left off when
you log on again) or because you disabled indexing or forcibly killed the
process (in which case Baloo is not going to know why).

You should be able to get a list of files that failed to index with
"balooctl failed", but I've not had a lot of luck with that. There should
probably also be a manual way of flagging a file as "avoid/failed".
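
One conventional technique for telling those two cases apart is a "clean
exit" marker: record it on an orderly logout/shutdown, and treat its absence
at the next start as "killed or crashed". The toy model below is only a
sketch of that idea; the state dict, plan_startup and flag_manually are
hypothetical and not how Baloo is known to work. Leftover in-progress files
are resumed after a clean exit but flagged as failed (the kind of list
"balooctl failed" is meant to show) after an abrupt one.

```python
def plan_startup(state):
    """Decide what to do with files left over from the previous run.

    state is a dict with hypothetical keys:
      "in_progress": files handed to the extractor but not finished
      "clean_exit":  True if the previous run shut down cleanly (logout),
                     False if it was killed or crashed
      "failed":      set of files already flagged as failed/avoid
    Returns (to_retry, newly_failed).
    """
    leftovers = [f for f in state["in_progress"] if f not in state["failed"]]
    if state["clean_exit"]:
        # Interrupted by logout/shutdown: quietly resume where we left off.
        return leftovers, []
    # Abrupt stop: we cannot know why, so do not blindly retry the same
    # files; flag them as failed so they can be listed and skipped.
    state["failed"].update(leftovers)
    return [], leftovers


def flag_manually(state, path):
    """Hypothetical manual "avoid/failed" flag for a single file."""
    state["failed"].add(path)
```

Flagging the whole leftover batch after a kill is blunt; a gentler policy
could retry those files one at a time to isolate the one that hangs.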

> OBSERVED RESULT
> Baloo appears to resume the partially completed indexing process that the
> user previously killed, including indexing files - in particular the file or
> files that were causing problems for the indexer.
This needs someone who knows the code: does baloo_file flag the files as "to
be indexed" before it passes them to baloo_file_extractor? If so, it could be
that Baloo wants to complete "that" job...
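
The suspicion above can be made concrete with a toy model (hypothetical
names and data layout, not Baloo's code): if files are flagged "to be
indexed" before extraction, the flags outlive a kill, and whether the killed
job comes back depends only on whether the resume path checks the current
"index file content" setting.

```python
def hand_off_batch(db, batch):
    """Flag a batch as "to be indexed" *before* extraction starts.

    db stands in for Baloo's persistent index (hypothetical layout). If the
    extractor is killed after this call, the flags survive in db.
    """
    new = [f for f in batch if f not in db["pending"]]
    db["pending"].extend(new)


def finish_batch(db, batch):
    """Clear the flags once extraction of the batch has completed."""
    db["pending"] = [f for f in db["pending"] if f not in batch]


def resume(db, content_indexing_enabled, check_config=True):
    """Return the files that would be sent to the extractor after a restart.

    With check_config=False the leftover flags alone decide, which would
    reproduce the reported behaviour: the killed job comes back even though
    "index file content" is now disabled.
    """
    if check_config and not content_indexing_enabled:
        return []  # the current setting wins over leftover flags
    return list(db["pending"])
```

In this model, calling hand_off_batch and then "killing" the process (never
calling finish_batch) leaves the files pending, so resume(db, False,
check_config=False) hands them straight back to the extractor.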

> ... On the computer this happened on, I
> caught baloo_file_extractor hanging (again) with 100% CPU use and several GB
> of memory eaten on one particular PDF file, the same issue that triggered my
> comment here: https://bugs.kde.org/show_bug.cgi?id=380456#c14

(In reply to Adam Fontenot from comment #3)
> Note: my best guess for the cause of this issue is that if Baloo has a file
> content indexing operation in progress, this operation is terminated, and
> then file content indexing is disabled, in at least certain cases when
> normal file indexing is resumed thereafter, it will also resume trying to
> index the content of the file / files that were in progress previously. If
> those files were causing the file indexer to hang, Baloo will also hang once
> again. For that reason, disabling file content indexing *and* deleting
> Baloo's cache (which had grown to an enormous size on this small SSD)
> prevented the problem from appearing again, because any leftovers from the
> in-progress indexing operation were deleted.
I think that's true.

I'd also suspect that the "very large" PDF is the reason for the large index
(Baloo writes a reverse-index entry for each of the "random words"); however,
other things can also cause the index to balloon in size.

It's not always clear whether the delay in content indexing comes from
extracting the index terms from the original files or from a large transaction
being prepared (once the index file has "got big", this can be very
memory-intensive). I've found iotop useful for following the reads and writes.
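
If iotop isn't to hand, a rough equivalent on Linux is to sample the kernel's
per-process counters in /proc/<pid>/io twice and compare read_bytes and
write_bytes (bytes the process caused to be read from or written to storage).
The helper below is a sketch, not part of Baloo, and the reading "mostly reads
suggests extraction, mostly writes suggests the index transaction" is only a
heuristic assumption.

```python
import time


def parse_proc_io(text):
    """Parse the "key: value" lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value.strip())
    return counters


def io_rates(pid, interval=5.0):
    """Sample /proc/<pid>/io twice and return storage read/write bytes per
    second for that process (Linux only; needs permission to read the file)."""
    path = f"/proc/{pid}/io"
    with open(path) as f:
        before = parse_proc_io(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_io(f.read())
    return {
        key: (after[key] - before[key]) / interval
        for key in ("read_bytes", "write_bytes")
    }
```

For example, pass io_rates() the PID of the Baloo process you are watching
(such as the one printed by `pidof baloo_file_extractor`) while it appears
stuck.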
