Beagle testing questions

David Rowntree Wed, 25 Apr 2007 01:31:19 -0700

All,
  I have been testing beagle for many months now (both releases and svn
versions) and I'm still having problems.  It would seem that if I run a
re-index overnight (on 60 GB of compressed PDF/MS docs) the indexer will hit
the VmRSS limit five or six times, getting killed and restarted each time.


Eventually it will all settle down, and the indexer will finish.  If I then
take a random PDF (say a technical journal article), run
beagle-extract-content on it, and look for general technical phrases such as
"low-voltage" or "phase error", I can then run a search (via kate) and get
more hits than I can read in a week.
If I repeat this, but search for a more obscure (but still English) phrase
such as "minimum illumination", I know there is a file with that phrase, and
I can see it in the plain text - but the search returns no hits!
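
The spot check described above could be automated roughly as follows.  This
is only a sketch under stated assumptions: the plain text is assumed to have
been obtained already (e.g. from beagle-extract-content), and `search` is a
hypothetical callable supplied by the reader that takes a phrase and returns
the paths of the files the index reports for it.  The names `sample_phrases`
and `spot_check` are made up for illustration and are not part of beagle.

```python
import random
import re


def sample_phrases(text, words_per_phrase=2, count=5, seed=0):
    """Pick up to `count` distinct word n-grams from extracted plain text."""
    # Keep alphabetic words of two or more characters; hyphens are allowed
    # inside a word so that "low-voltage" stays in one piece.
    words = re.findall(r"[A-Za-z][A-Za-z-]+", text.lower())
    ngrams = sorted({
        " ".join(words[i:i + words_per_phrase])
        for i in range(len(words) - words_per_phrase + 1)
    })
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return rng.sample(ngrams, min(count, len(ngrams)))


def spot_check(path, text, search, **kwargs):
    """Return the sampled phrases for which `search` did not report `path`.

    `search` is a hypothetical callable: phrase -> iterable of file paths.
    """
    missing = []
    for phrase in sample_phrases(text, **kwargs):
        if path not in search(phrase):
            missing.append(phrase)
    return missing
```

A non-empty result for a file that is supposed to be indexed would point at
the kind of gap described above, although it cannot say why the phrase is
missing.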

Should beagle be able to find /every/ sensible English word?  Is it possible
I have a partially complete index?  How do I determine what files are
excluded from the index?

I have seen from the logs that when processing archives (and all my media
files are .gz compressed) the actual indexing of the child (i.e. the
.pdf file contained in the archive) is 'deferred' until later.  What happens
if the index helper gets killed?  Does this deferral get ignored and the
archive re-examined later?

Is there a way to get a report of all the files that haven't been indexed,
either because of missing filters (postscript docs don't work yet) or
exceptions?

How can I gain confidence in the validity and completeness of the index?

Thanks in advance!

Regards,
Dave.

_______________________________________________
Dashboard-hackers mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
