Hi Peter,

Well, I opted instead for trying Sphinx full-text search.
What took an hour to index with Zend_Search_Lucene takes around 5 seconds with Sphinx. Search is ridiculously fast with no optimization whatsoever. I'm getting document IDs from Sphinx (which gives really, really good results) and running a MySQL IN() query to fetch fresh records. Very, very nice.

If you need any help setting this up, let me know! You can test it at http://www.todascifras.com.br

- Alex

On Thu, Jun 19, 2008 at 3:19 PM, Pete Spicer <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I've been having similar problems with ZSL as well, but whilst I haven't
> found a quick solution, I've found that going back to the index itself and
> understanding what is going on will prove useful - depending on what kinds
> of documents you're indexing and what kinds of searches you're running, you
> may find that is part of the problem.
>
> Without knowing the fields in your index, including the types of the
> fields, and the types of words in your documents (or even what kinds of
> documents you have), it's very hard to give any specific advice.
>
> The memory usage seems to come from the size of the word list itself, and
> bear in mind that the word list will easily run into thousands of unique
> words. The way the default analyser handles it, things like "I'm" and
> "There's" will be split at the apostrophe, so the index will be holding
> 's' as a word. Things like this will easily expand the size of the
> index and massively increase the memory overhead. Additionally, if you are
> able to keep the input query small, with as few, but more unique, search
> terms as possible, this will keep memory usage low.
>
> To really reduce memory, depending on whether you have the resources and
> time to do so, it may be worth investigating writing a custom analyser
> geared towards the kinds of words you have in your database. In mine, for
> example, I have multiple variant spellings of words on input - my index
> holds written works, where the original author uses unusual constructions
> (e.g. making "s" into "sh" to demonstrate that a character is drunk), so by
> doing some analysis on that, I've been able to trim the index down and keep
> its memory usage in check.
>
> Another way is to avoid using Keyword fields where possible, switching
> them to some tokenised form, assuming the data can be suitably indexed and
> doesn't need to be kept as Keyword.
>
> Stripping out really common words might also help, but that's only really
> viable if you're not using the indexed text to display the results.
>
> If you are able to provide a few more details about your index, I might be
> able to give you some better pointers.
>
> Regards
> Pete
>
>
> Alex wrote:
>
>> Hi,
>>
>> I'm having serious memory problems with ZSL. My current index holds
>> around 400 thousand documents.
>>
>> If I run a search for a term with about 500 results, ZSL returns the
>> best results first but uses a tremendous amount of memory.
>>
>> If I use $index->setResultSetLimit(), I decrease memory usage
>> significantly, but I get some very poor first results.
>>
>> Thanks for your help!
>>
>> - Alex
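
P.S. In case it helps anyone following the thread, the Sphinx-then-MySQL
pattern I described at the top can be sketched roughly like this. This is
not the actual code behind todascifras.com.br - the index name
`songs_index`, table `songs`, search term, and connection details are all
illustrative - and it assumes the classic `sphinxapi.php` client that
ships with Sphinx:

```php
<?php
// Sketch: ask Sphinx for ranked document IDs, then fetch the fresh rows
// from MySQL with an IN() query. All names/credentials below are
// illustrative assumptions, not taken from a real deployment.
require_once 'sphinxapi.php';   // the PHP client bundled with Sphinx

// Build "... WHERE id IN (?,...) ORDER BY FIELD(id, ?,...)" so that MySQL
// returns rows in the same order Sphinx ranked them.
function buildFetchQuery(array $ids, $table = 'songs')
{
    $in = implode(',', array_fill(0, count($ids), '?'));
    return "SELECT * FROM $table WHERE id IN ($in) ORDER BY FIELD(id, $in)";
}

$sphinx = new SphinxClient();
$sphinx->SetServer('localhost', 9312);  // default searchd port
$sphinx->SetLimits(0, 20);              // first page of results only

$result = $sphinx->Query('some search terms', 'songs_index');
if ($result === false) {
    die('Sphinx error: ' . $sphinx->GetLastError());
}

$ids = array_keys($result['matches']);  // document IDs, best matches first
if (!empty($ids)) {
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $stmt = $pdo->prepare(buildFetchQuery($ids));
    $stmt->execute(array_merge($ids, $ids)); // bind IN() and FIELD() lists
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

The ORDER BY FIELD() clause is the one non-obvious bit: a plain IN() query
would return rows in whatever order MySQL likes, discarding Sphinx's
relevance ranking.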
