Hello,
I've been having similar problems with ZSL. I haven't found a quick
fix, but going back to the index itself and working out what it
actually contains is useful. Depending on what kinds of documents
you're indexing and what kinds of searches you're running, you may find
that the contents of the index are part of the problem.
Without knowing the fields in your index, including the types of the
fields, and the types of words in your documents (or even what kinds of
documents you have), it's very hard to give any specific advice.
The memory usage seems to come from the size of the word list itself,
and bear in mind that the word list will easily run into the thousands
of unique words. The way the default analyser handles it, things like
"I'm" and "There's" will be split at the apostrophe, so the index will
be holding the word 's' as a word. Things like this will easily expand
the size of the index, and massively increase the memory overhead.
Additionally, if you are able to keep the input query small, with as few
search terms as possible and each term as distinctive as possible, this
will keep memory usage low.
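
To show what I mean about the apostrophes, here is a rough sketch in
plain PHP. It is not the ZSL code, only my approximation of a
letters-only, lower-casing tokeniser, and the function names are made
up for the example. The second function drops the one-letter fragments
again.

```php
<?php
// Approximation of a letters-only tokeniser: every run of ASCII letters
// becomes one lower-cased token, so an apostrophe ends the token.
function tokeniseLettersOnly($text)
{
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

// Hypothetical clean-up step: discard tokens shorter than $minLength.
function dropShortTokens(array $tokens, $minLength = 2)
{
    $kept = array();
    foreach ($tokens as $token) {
        if (strlen($token) >= $minLength) {
            $kept[] = $token;
        }
    }
    return $kept;
}

$tokens = tokeniseLettersOnly("I'm sure there's more");
print_r($tokens);                  // i, m, sure, there, s, more
print_r(dropShortTokens($tokens)); // sure, there, more
```

As far as I remember, ZSL ships a ShortWords token filter
(Zend_Search_Lucene_Analysis_TokenFilter_ShortWords) that can be added
to an analyser for the same effect, but do check that against the
version you are running.
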
To really reduce memory, depending on whether you have the resources and
time to do so, it may be worth investigating writing a custom analyser,
geared towards the kinds of words you have in your database. In mine,
for example, I have multiple variant spellings of words on input - my
index holds written works, where the original author uses unusual
constructions (e.g. making s into sh to demonstrate that the character
is drunk), so by doing some analysis on that, I've been able to trim the
index down and keep its memory usage down.
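
As an illustration only, the idea looks something like this in plain
PHP. The variant map and the function names are invented for the
example; my real list is a lot longer and specific to my texts. In ZSL
the same logic would sit inside a custom analyser or token filter.

```php
<?php
// Hypothetical map from variant spellings to the canonical word.
$variants = array(
    'shorry' => 'sorry',
    'shir'   => 'sir',
    'yesh'   => 'yes',
);

// Lower-case a token and replace it with its canonical form if known.
function normaliseToken($token, array $variants)
{
    $token = strtolower($token);
    return isset($variants[$token]) ? $variants[$token] : $token;
}

// Return the distinct terms that would end up in the word list.
function uniqueTerms(array $tokens, array $variants)
{
    $terms = array();
    foreach ($tokens as $token) {
        $terms[normaliseToken($token, $variants)] = true;
    }
    return array_keys($terms);
}

$tokens = array('Sorry', 'shorry', 'sir', 'Shir', 'yes', 'yesh');
echo count(uniqueTerms($tokens, array())), "\n";   // 6 terms without the map
echo count(uniqueTerms($tokens, $variants)), "\n"; // 3 terms with the map
```
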
Another way is to avoid using Keyword fields where possible and to
switch them to some tokenised form, assuming the data can be suitably
indexed and doesn't need to be kept as Keyword.
Stripping out really common words might also help, but that only really
works if you don't rely on the indexed text to display the results.
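
A minimal sketch of that stripping step, again in plain PHP with a
made-up stop list and function name:

```php
<?php
// Remove every token that appears in the stop list.
function removeStopWords(array $tokens, array $stopWords)
{
    $lookup = array_flip($stopWords); // word => position, for fast isset()
    $kept = array();
    foreach ($tokens as $token) {
        if (!isset($lookup[$token])) {
            $kept[] = $token;
        }
    }
    return $kept;
}

$stopWords = array('the', 'a', 'of', 'and'); // hypothetical list
$tokens = array('the', 'cat', 'and', 'the', 'hat');
print_r(removeStopWords($tokens, $stopWords)); // cat, hat
```

I believe ZSL has a StopWords token filter
(Zend_Search_Lucene_Analysis_TokenFilter_StopWords) for this, so you
should not need to write your own, but check it against your version.
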
If you are able to provide a few more details about your index, I might
be able to give you some better pointers.
Regards
Pete
Alex wrote:
Hi,
I'm having serious memory problems with ZSL. My current index holds
around 400 thousand documents.
If I run a search for a term with about 500 results, ZSL returns the
best results first but uses a tremendous amount of memory.
If I use $index->setResultSetLimit(), I decrease memory usage
significantly, but I get some very poor first results.
Thanks for your help!
- Alex