Re: [fw-general] Zend_Search_Lucene Best N Results

Pete Spicer Thu, 19 Jun 2008 11:19:18 -0700

Hello,

I've been having similar problems with ZSL as well, but whilst I haven't found a quick solution, I've found that going back to the index itself and understanding what is going on will prove useful - depending on what kinds of documents you're indexing and what kinds of searches you're running, you may find that is part of the problem.

Without knowing the fields in your index, including the types of the fields, and the types of words in your documents (or even what kinds of documents you have), it's very hard to give any specific advice.

The memory usage seems to come from the size of the word list itself, and bear in mind that the word list will easily run into the thousands of unique words. The way the default analyser handles it, things like "I'm" and "There's" will be split at the apostrophe, so the index will be holding 's' as a word. Things like this will easily expand the size of the index, and massively increase the memory overhead. Additionally, if you are able to keep the input query small, with fewer but more distinctive search terms, this will keep memory usage low.
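To make the apostrophe point concrete, here is a small sketch in Python rather than ZSL code. It assumes, as described above, that the analyser treats every non-letter character as a separator; both function names are made up for illustration and are not part of ZSL.

```python
import re

def letters_only_tokens(text):
    """Lowercase the text and treat every non-letter as a separator."""
    return re.findall(r"[a-z]+", text.lower())

def apostrophe_aware_tokens(text):
    """Same, but keep an apostrophe that sits between letters inside the word."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

sample = "I'm sure there's a pub that's open"

# Splitting at the apostrophe leaves stray fragments such as 'm' and 's'.
print(letters_only_tokens(sample))

# Keeping the apostrophe leaves whole words and no stray fragments.
print(apostrophe_aware_tokens(sample))
```

Whether keeping the apostrophe actually shrinks a given index depends on the documents, so it is worth counting unique terms both ways on a sample before committing to it.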

To really reduce memory, depending on whether you have the resources and time to do so, it may be worth investigating writing a custom analyser, geared towards the kinds of words you have in your database. In mine, for example, I have multiple variant spellings of words on input - my index holds written works, where the original author uses unusual constructions (e.g. making s into sh to demonstrate that the character is drunk), so by doing some analysis on that, I've been able to trim the index down and keep its memory usage down.
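As a rough illustration of the idea (in Python, not an actual ZSL analyser), the sketch below folds known variant spellings back onto one canonical term before indexing. The lookup table and all names are hypothetical; a real analyser would need a table or rules built from the documents themselves, and a blanket "sh becomes s" rule would wrongly mangle words like "she".

```python
import re

# Hypothetical variant -> canonical table; a real one comes from your own data.
VARIANTS = {"yesh": "yes", "shorry": "sorry", "mishter": "mister"}

def raw_tokens(text):
    """Plain lowercase, letters-only tokenisation."""
    return re.findall(r"[a-z]+", text.lower())

def normalised_tokens(text, variants=VARIANTS):
    """Tokenise, then replace each known variant with its canonical form."""
    return [variants.get(token, token) for token in raw_tokens(text)]

def vocabulary(texts, tokenise):
    """Set of unique terms that would end up in the word list."""
    return {token for text in texts for token in tokenise(text)}

lines = ["Yesh, shorry mishter", "Yes, sorry mister"]
print(len(vocabulary(lines, raw_tokens)))         # unique terms without folding
print(len(vocabulary(lines, normalised_tokens)))  # unique terms with folding
```

The same normalisation has to be applied to the query terms at search time, otherwise a search for the variant spelling will no longer match anything.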

Another way is to avoid Keyword fields where possible and switch them to some tokenised form, assuming the data can be suitably indexed that way and doesn't need to be kept as Keyword.

Stripping out really common words might also help, but that's really only an option if you're not relying on the indexed text to display the results.
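A minimal sketch of that kind of filtering, again in Python rather than ZSL code. The stop list here is a tiny illustrative one and the function name is made up; a real list should come from the most frequent terms in your own index.

```python
import re

# Illustrative stop list only; build a real one from your own term frequencies.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it"})

def index_terms(text, stop_words=STOP_WORDS):
    """Tokenise on letters and drop any token found in the stop list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]

print(index_terms("The cat is in the hat"))
```

The dropped words are gone from whatever was indexed, which is why this only works if the original text for display is kept somewhere else.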

If you are able to provide a few more details about your index, I might be able to give you a few better pointers.


Regards
Pete



Alex wrote:

Hi,
I'm having serious memory problems with ZSL. My current index holds around 400 thousand documents.
If I run a search for a term with about 500 results, ZSL returns the best results first but uses a tremendous amount of memory.
If I use $index->setResultSetLimit(), I decrease memory usage significantly, but I get some very poor first results.
Thanks for your help!
- Alex
