Hi,

> I have been testing beagle for many months now (both releases and svn
> versions) and I'm still having problems. It would seem that if I run a
> re-index overnight (on 60Gb of compressed pdf/ms docs) the indexer will hit
> the vmrss limit five or six times, get killed and restarted.

Yeah, those are text-heavy files. The memory situation gets better with every release but is not quite there yet. The svn checkout would be slightly better, but as I see it, it needs more work. I'll investigate, thanks.

> If I repeat this, but search for more obscure (but english) phrases like
> "minimum illumination" or something, I know there is a file with that
> phrase, and I can see the plain text - but the search returns no hits!

Hmm... that would be a bug. I never tested phrase queries that extensively. I will check this. It is unlikely, but possible, that beagle-search is skipping some hits. Can you please check with beagle-query? Also, if you search for single words (e.g. illumination), do you get results?

> Should beagle be able to find /every/ sensible english word? Is it possible

I would like to claim yes, it should. Beagle svn contains a simple tool, beagle-dump-index, to list all terms in the beagle index. Something like
"beagle-dump-index --terms --indexdir=/path/to/index-directory"
would list all the terms in the index, which you can then grep to see whether a given word is in the index. It's not quite that simple: you have to grep for the stemmed word, and it would take a long time for a large index, but it's a useful debugging tool.

> I have a partially complete index? How do I determine what files are
> excluded from the index?

I don't think you have a partially complete index. Check index-info --status; if the scheduler queue is empty, then indexing is definitely finished. You are using the files backend, right? If you are testing using static indexes, then index-info does not apply. There is no direct way to determine from the beagle index which files are not indexed.

> I have seen from the logs that when processing archives (and all my media
> files are .gz compressed) that the actual indexing of the child ( i.e. the
> .pdf file contained in the archive) is 'deferred' until later. What happens
> if the index helper get killed?
> Does this deferral get ignored and the
> archive reexamined later?

Yup. Indexing of an archive is not complete till all the included (and sub-included, and so on) files are indexed. If killed midway, they would be re-indexed.

> Is there a way to get a report of all the files that haven't been indexed,
> either because of missing filters (postscript docs don't work yet) or
> exceptions?

Unfortunately nothing better than grepping the logfiles :(

> How can I gain confidence in the validity and completeness of the index?

*sigh* Till I read your email, I assumed beagle at least does not have _this_ problem. I will definitely look into this.

- dBera

--
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user
_______________________________________________
Dashboard-hackers mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
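
For the term check with beagle-dump-index mentioned above, a rough sketch of the filtering step could look like the following. It is only an illustration: the script name find_terms.py and the function find_terms are hypothetical, and it assumes the tool prints one term per line of text, which I have not verified. It also assumes the index uses a Porter-style stemmer, under which "illumination" would typically be stored as something like "illumin"; a substring match on the stem sidesteps having to know the exact stemmed form.

```python
import sys


def find_terms(lines, stem):
    """Return the lines containing `stem`, compared case-insensitively.

    `lines` is any iterable of strings, e.g. sys.stdin fed by the term dump.
    The one-term-per-line layout is an assumption about the dump output.
    """
    needle = stem.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Hypothetical usage, piping the dump through the script:
#   beagle-dump-index --terms --indexdir=/path/to/index-directory \
#       | python find_terms.py illumin
if __name__ == "__main__" and len(sys.argv) == 2:
    for hit in find_terms(sys.stdin, sys.argv[1]):
        print(hit)
```

If the stem shows up in the term list but the phrase search still returns nothing, that would point at the query side rather than at a missing index entry.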
