Hi Peter,

Well, I opted instead for trying Sphinx full-text search.
What took an hour to index with Zend_Search_Lucene takes around 5 seconds with Sphinx. Search is ridiculously fast with no optimization whatsoever. I'm getting document IDs from Sphinx (which gives really, really good results) and running a MySQL IN() query to fetch fresh records. Very, very nice.

If you need any help setting this up, let me know! You can test it at http://www.todascifras.com.br

- Alex

On Thu, Jun 19, 2008 at 3:19 PM, Pete Spicer <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I've been having similar problems with ZSL as well, but whilst I haven't
> found a quick solution, I've found that going back to the index itself and
> understanding what is going on will prove useful - depending on what kinds
> of documents you're indexing and what kinds of searches you're running, you
> may find that is part of the problem.
>
> Without knowing the fields in your index, including the types of the
> fields, and the types of words in your documents (or even what kinds of
> documents you have), it's very hard to give any specific advice.
>
> The memory usage seems to come from the size of the word list itself, and
> bear in mind that the word list will easily run into thousands of unique
> words. The way the default analyser handles it, things like "I'm" and
> "There's" will be split at the apostrophe, so the index will be holding
> 's' as a word. Things like this will easily expand the size of the
> index and massively increase the memory overhead. Additionally, if you are
> able to keep the input query small, with as few, but more unique, search
> terms as possible, this will keep memory usage low.
>
> To really reduce memory, depending on whether you have the resources and
> time to do so, it may be worth investigating writing a custom analyser
> geared towards the kinds of words you have in your database. In mine, for
> example, I have multiple variant spellings of words on input - my index
> holds written works, where the original author uses unusual constructions
> (e.g. making "s" into "sh" to demonstrate that a character is drunk), so by
> doing some analysis on that, I've been able to trim the index down and keep
> its memory usage in check.
>
> Another way is to avoid using Keyword fields where possible, switching
> them to some tokenised form, assuming the data can be suitably indexed and
> doesn't need to be kept as Keyword.
>
> Stripping out really common words might also help, but that's only really
> viable if you're not using the indexed text to display the results.
>
> If you are able to provide a few more details about your index, I might be
> able to give you some better pointers.
>
> Regards
> Pete
>
>
> Alex wrote:
>
>> Hi,
>>
>> I'm having serious memory problems with ZSL. My current index holds
>> around 400 thousand documents.
>>
>> If I run a search for a term with about 500 results, ZSL returns the
>> best results first but uses a tremendous amount of memory.
>>
>> If I use $index->setResultSetLimit(), I decrease memory usage
>> significantly, but I get some very poor first results.
>>
>> Thanks for your help!
>>
>> - Alex
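
P.S. In case it helps anyone following the thread, the Sphinx-then-MySQL
pattern I described at the top can be sketched roughly like this. This is
not the actual code behind todascifras.com.br - the index name
`songs_index`, table `songs`, search term, and connection details are all
illustrative - and it assumes the classic `sphinxapi.php` client that
ships with Sphinx:

```php
<?php
// Sketch: ask Sphinx for ranked document IDs, then fetch the fresh rows
// from MySQL with an IN() query. All names/credentials below are
// illustrative assumptions, not taken from a real deployment.
require_once 'sphinxapi.php';   // the PHP client bundled with Sphinx

// Build "... WHERE id IN (?,...) ORDER BY FIELD(id, ?,...)" so that MySQL
// returns rows in the same order Sphinx ranked them.
function buildFetchQuery(array $ids, $table = 'songs')
{
    $in = implode(',', array_fill(0, count($ids), '?'));
    return "SELECT * FROM $table WHERE id IN ($in) ORDER BY FIELD(id, $in)";
}

$sphinx = new SphinxClient();
$sphinx->SetServer('localhost', 9312);  // default searchd port
$sphinx->SetLimits(0, 20);              // first page of results only

$result = $sphinx->Query('some search terms', 'songs_index');
if ($result === false) {
    die('Sphinx error: ' . $sphinx->GetLastError());
}

$ids = array_keys($result['matches']);  // document IDs, best matches first
if (!empty($ids)) {
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $stmt = $pdo->prepare(buildFetchQuery($ids));
    $stmt->execute(array_merge($ids, $ids)); // bind IN() and FIELD() lists
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

The ORDER BY FIELD() clause is the one non-obvious bit: a plain IN() query
would return rows in whatever order MySQL likes, discarding Sphinx's
relevance ranking.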
