https://bugs.kde.org/show_bug.cgi?id=420939

--- Comment #47 from [email protected] ---
(In reply to Scott from comment #46)

No problem, we'll carry on troubleshooting.

> I think the problem is more than just misidentifying mime types.
Finding out about the mimetypes and that baloo would never attempt to index
some files was one step along the way. Good to find out but there's more to do.

> 3/ Further it reports files waiting to be indexed and files failed to index
> both being zero when in fact approximately 1,000 of the 6,000 files in the
> dataset have not been indexed. I have restarted baloo repeatedly and they
> never get indexed, it re-indexes what it had before.
It's possible that we've got another mimetype issue with these files, or they
are your 1000 biggest files, or something else. I'd suggest copying one of them
to your home directory and checking it with

    xdg-mime query filetype ...newstrangefile...

Check that the mimetype is sensible, then see what

    balooshow -x ...newstrangefile...

says.

> 1/ baloo terminates during indexing for unknown reasons (not
> hanging/freezing as I erroneously stated previously) without providing a
> reason code.
I'll ask a bit more about this. Your "balooctl status" output says 

> Baloo File Indexer is running
> Indexer state: Idle
That's what baloo says when it's alive and thinks it has nothing more to do.
The content indexing is done by a separate process, "baloo_file_extractor". It
is started when there is indexing to do, does its job, saves the results and
stops, and it is started again when there is more to do. This would/should
happen in the background and you wouldn't see exit codes.

> 2/ On restarting the indexing baloo re-indexes the same files with an
> erroneous message that the files have changed (see my last email) or added
> with baloo being turned off. Baloo is not checking that these index entries
> already exist or there is some problem with the index file itself and so
> just duplicates them which is why baloo reports over 21,000 files indexed
> from a dataset only containing 6,000 entries.
The error message is:

> ... id seems to have changed. Perhaps baloo was not running, and this file 
> was deleted + re-created
We need to check the ID and see whether it is really changing. Check with
"stat"; you'll get something like:

    $ stat 1.ts
      File: 1.ts
      Size: 41416704        Blocks: 80896      IO Block: 4096   regular file
    Device: fc01h/64513d    Inode: 794964      Links: 1
    Access: (0664/-rw-rw-r--)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
    Access: 2021-07-24 22:50:57.838161084 +0200
    Modify: 2021-07-24 22:50:57.838161084 +0200
    Change: 2021-07-24 22:51:42.686181710 +0200
    Birth: -

It's the "Device" and "Inode" numbers that you need to keep your eye on:

    Device: fc01h/64513d    Inode: 794964

If you reboot and these change, baloo will think it's got a new file and try to
index it again. Keep a note of the numbers, check again after a reboot and
compare.
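
If it helps, here is a rough sketch of keeping that note for several files at
once. It assumes GNU coreutils "stat" (for the -c format option). The function
name, the snapshot file names and the /path/to/files directory are placeholders
of mine, not anything baloo provides.

```shell
# Record device and inode numbers for a set of files so they can be
# compared after a reboot. Assumes GNU coreutils "stat".
# record_ids and the snapshot file names are placeholders.
record_ids() {
    # %d = device number (decimal), %i = inode number, %n = file name
    stat -c '%d %i %n' -- "$@"
}

# Before the reboot:
#   record_ids /path/to/files/* > ids-before.txt
# After the reboot:
#   record_ids /path/to/files/* > ids-after.txt
#   diff ids-before.txt ids-after.txt && echo "device/inode unchanged"
```

Any line that "diff" reports is a file whose device or inode number changed
between the two snapshots.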

You could also try a baloosearch for one of the files that always seems to be
reindexed:

    $ baloosearch -i ...oneofyoursavedfiles...

If all is well, baloosearch will give a single result. If the ID has been
changing, "baloosearch -i" will show several lines with different ID numbers
and the same file/pathname, something like:

    $ baloosearch -i testfile
    9ca00000028 /home/test/testfile
    9ca0000002a /home/test/testfile
    9ca0000002c /home/test/testfile

That would be a red flag...
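
With many files, spotting repeats by eye gets tedious. A minimal sketch of a
filter that does it, assuming the output really has the "ID, space, path"
shape shown above; the function name is a placeholder of mine.

```shell
# Print every path that appears under more than one ID in
# "baloosearch -i"-style output (ID, space, path) read from stdin.
# find_duplicate_paths is a placeholder name.
find_duplicate_paths() {
    awk '{
        id = $1
        path = $0
        sub(/^[^ ]+ /, "", path)   # drop the leading ID, keep the full path
        # count each (path, id) pair only once
        if (!seen[path SUBSEP id]++) count[path]++
    }
    END {
        # report paths seen with two or more distinct IDs
        for (p in count) if (count[p] > 1) print count[p], p
    }'
}

# Usage (search term elided as above):
#   baloosearch -i ...oneofyoursavedfiles... | find_duplicate_paths
```

Each line printed is the number of distinct IDs followed by the path. No
output means no path was listed under more than one ID.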

> I had to disable baloo because it somehow seriously interferes with my
> ability to move files from the admin PC to the server. With baloo running
> on the server any attempt to transfer files to it results in very slow
> transfer speeds and on occasion failure to complete the move and this is
> occuring while the indexer is reporting idle.
I can only guess where the problem lies, but you are indexing *really* large
files, and there were a couple of fixes two months ago to stop a MIME lookup
from reading the whole file into memory. That was Bug 398908, fixed according to
   https://bugs.kde.org/show_bug.cgi?id=398908#c97
with 5.83. If you don't have this version, maybe the best thing to do is wait
until it gets to you with an update.
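
Once you know which version your distribution ships (your package manager will
tell you), a small sketch like this compares it against 5.83. It assumes GNU
"sort" with the -V option; the function name is a placeholder of mine, and you
have to supply the installed version string yourself.

```shell
# Succeed if version $1 is at least version $2.
# Assumes GNU sort (-V does a version-aware comparison).
# version_at_least is a placeholder name.
version_at_least() {
    # the smaller of the two versions sorts first; $1 >= $2 if that is $2
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage, with the installed version filled in by hand:
#   version_at_least 5.82.0 5.83 || echo "older than 5.83, fix not included"
```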
