https://bugs.kde.org/show_bug.cgi?id=358098

--- Comment #4 from John Andersen <jsam...@gmail.com> ---
(In reply to Pinak Ahuja from comment #3)
> This is the intended behavior, for files having text/plain mimetype. This
> was done to avoid the mess caused by applications which have log files in
> directories that are indexed by baloo.
> 
> Though text files with a valid extension like .md .markdown should still be
> indexed because they have the mimetype: text/markdown but right now they are
> also not being indexed because baloo is somehow misinterpreting the
> mimetype. I'm looking into it.

But this is fundamentally the wrong approach: extensions have never been a
significant part of Linux, and are (by your own admission) an unreliable
indicator of file content.

This isn't a case of Baloo "misinterpreting" anything.  The link I posted
shows that files with the plaintext mimetype are arbitrarily rejected for
indexing unless the extension is "txt" (and the size is less than 50K).
When this was put in place (2 years ago) it was described as a temporary hack,
yet it still exists.  There is no indication that this was the intended
behavior; the comments in the code clearly label it as some sort of
short-term hack.
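
To make the rule described above concrete, here is a minimal sketch in
standard C++. It is not the Baloo source: the names passesPlainTextRule and
extensionOf are hypothetical, and treating "50K" as 50 * 1024 bytes is an
assumption. It only restates the behavior as described in this comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: "50K" is read as 50 * 1024 bytes.
constexpr std::uintmax_t kPlainTextSizeLimit = 50 * 1024;

// Hypothetical helper: the text after the last '.' in the final path
// component, or an empty string when that component contains no '.'.
std::string extensionOf(const std::string& path) {
    const std::size_t slash = path.find_last_of('/');
    const std::size_t start = (slash == std::string::npos) ? 0 : slash + 1;
    const std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos || dot < start) {
        return "";
    }
    return path.substr(dot + 1);
}

// Hypothetical restatement of the rule: text/plain files pass only when
// the extension is exactly "txt" and the file is under the size limit.
// Every other mimetype is unaffected by this particular rule.
bool passesPlainTextRule(const std::string& mimetype,
                         const std::string& path,
                         std::uintmax_t sizeBytes) {
    if (mimetype != "text/plain") {
        return true;
    }
    return extensionOf(path) == "txt" && sizeBytes < kPlainTextSizeLimit;
}

// Usage example: an extensionless plaintext file is rejected no matter
// how small it is, which is the complaint in this report.
void demoCurrentRule() {
    assert(!passesPlainTextRule("text/plain", "/home/user/notes", 100));
    assert(passesPlainTextRule("text/plain", "/home/user/notes.txt", 100));
}
```

Note that the file name, not the content, decides the outcome for plaintext
here, and no user-configured filter is consulted at all.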

Someone chose to keep all plaintext out of Baloo (a questionable decision at
best).  Rather than doing this with blacklist/whitelist (exclude filters) to
address problematic file types, all plaintext was summarily rejected unless
the extension was txt.

If all plaintext is to be rejected then the rational thing to do is to honor a
whitelist (include filters) to override this rejection.  (I believe that USED
TO EXIST, but was removed in the rush to simplify the control set).

If, on the other hand, only SOME plaintext files are problematic, those should
be handled by the exclude filters.
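
The two alternatives above can be sketched as a single filter-driven
decision. This is an illustration only, written in standard C++: the Filters
struct, wildcardMatch, shouldIndex, and the rejectPlainTextByDefault flag are
all hypothetical names and do not correspond to any existing Baloo API. The
pattern syntax ('*' and '?') is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal wildcard match: '*' matches any run of characters (including
// none) and '?' matches exactly one character.
bool wildcardMatch(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0, mark = 0;
    std::size_t star = std::string::npos;
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;   // remember the star and try matching zero chars
            mark = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1; // backtrack: let the last star absorb one more char
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}

// Hypothetical user configuration.
struct Filters {
    std::vector<std::string> include;  // whitelist patterns
    std::vector<std::string> exclude;  // blacklist patterns
};

bool matchesAny(const std::vector<std::string>& patterns,
                const std::string& fileName) {
    for (const std::string& pattern : patterns) {
        if (wildcardMatch(pattern, fileName)) {
            return true;
        }
    }
    return false;
}

// Include filters override everything; exclude filters apply to every
// file, plaintext included; the blanket plaintext rejection is only a
// default that the user can override.
bool shouldIndex(const Filters& filters, const std::string& mimetype,
                 const std::string& fileName, bool rejectPlainTextByDefault) {
    if (matchesAny(filters.include, fileName)) {
        return true;
    }
    if (matchesAny(filters.exclude, fileName)) {
        return false;
    }
    return !(rejectPlainTextByDefault && mimetype == "text/plain");
}

// Usage example: logs are excluded by a filter, while an extensionless
// README is still indexed because the user whitelisted it.
void demoFilterPolicy() {
    Filters filters;
    filters.include.push_back("README*");
    filters.exclude.push_back("*.log");
    assert(shouldIndex(filters, "text/plain", "README", true));
    assert(!shouldIndex(filters, "text/plain", "app.log", false));
}
```

With rejectPlainTextByDefault set to false this is the second alternative
(only the excluded files are skipped); with it set to true it is the first
(plaintext is rejected unless whitelisted).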

Right now, logs could be handled by exclude filters.
There is no longer a whitelist capability.
But even the exclude filters are totally ignored for plaintext documents.

So significant functionality has been lost ostensibly just to avoid logs (which
could have been avoided by the exclude filters).  

Look in app.cpp:
https://code.woboq.org/qt5/kf5/baloo/src/file/extractor/app.cpp.html
Look for the word HACK.
