On Mar 31, 2007, at 5:36 PM, Jeff Mallatt wrote:

> I'm getting some results that I don't understand from a search.
>
> index << {:uid => 'one', :title => 'Some Title', :content => 'my first text'}
> index << {:uid => 'two', :title => 'Some Title', :content => 'some second content'}
> index << {:uid => 'three', :title => 'Other Title', :content => 'my third text'}
>
> query(index, 'title:"Some"')
> query(index, 'title:"Title"')
> query(index, 'uid:"two"')

Nice one.

When people don't understand search results, it usually has to do with
stop words. The StandardAnalyzer, which parses both documents and(!)
queries, uses a list of stop words that are ignored. See
Ferret::Analysis::FULL_ENGLISH_STOP_WORDS for the complete list of
(English) stop words.

In the case of "title:Some", the analyzer removes "Some", leaving
only "title:", i.e. an empty query, which (surprisingly) matches all
documents.
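
To make the mechanism concrete, here is a toy model in plain Ruby. It is
not Ferret's implementation: STOP_WORDS is a tiny hypothetical subset
picked for illustration, and analyze and parse_query are made-up helpers.
It only shows how a query term can vanish before any matching happens.

```ruby
# Toy model of stop-word filtering; NOT Ferret's implementation.
# STOP_WORDS is a tiny hypothetical subset chosen for illustration.
STOP_WORDS = %w[a an and some the].freeze

# Lowercase the text, split it into runs of letters, drop stop words.
def analyze(text)
  text.downcase.scan(/[a-z]+/).reject { |term| STOP_WORDS.include?(term) }
end

# Split a "field:text" query and analyze only the text part.
def parse_query(query)
  field, text = query.split(':', 2)
  { field: field, terms: analyze(text.to_s.delete('"')) }
end

some_query  = parse_query('title:"Some"')   # :terms is empty, nothing left to match on
title_query = parse_query('title:"Title"')  # :terms is ["title"]
```

What an engine does with the empty term list (match nothing, match
everything, raise an error) is a separate decision; the sketch above
says nothing about how Ferret makes it.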

However, the same should happen with "content:some", but that query
returns only one document, which leaves me completely puzzled. This
just isn't consistent.

So I'm afraid I can't be of much help here, but I'm sure somebody
else will enlighten us. This might well be a bug, but even if it's
not, it's definitely not what anyone would reasonably expect.

--

@David: You should probably consider changing StandardAnalyzer not to  
use stop words by default. It confuses people because no one would  
suspect such a feature to be enabled by default. It just doesn't  
follow the principle of least astonishment.

Even if people want to use stop words, they might not be happy with
the ones built into Ferret. It very much depends on the nature of the
content that is indexed, and instead of using a one-size-fits-all stop
word list, one is usually better off compiling a custom one for
any particular application.
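
One possible way to start compiling such a list, sketched in plain Ruby:
count how many documents each term occurs in and treat the most
widespread terms as candidates. The helper candidate_stop_words and its
min_ratio threshold are hypothetical, not part of Ferret, and document
frequency is just one heuristic among several.

```ruby
# Hypothetical helper, not part of Ferret: suggest stop words for a corpus
# by picking terms that occur in at least min_ratio of the documents.
def candidate_stop_words(texts, min_ratio: 0.6)
  doc_freq = Hash.new(0)
  texts.each do |text|
    # Count each term at most once per document.
    text.downcase.scan(/[a-z]+/).uniq.each { |term| doc_freq[term] += 1 }
  end
  doc_freq.select { |_, count| count.to_f / texts.size >= min_ratio }.keys.sort
end

# The :content values from the example documents in this thread.
contents = ['my first text', 'some second content', 'my third text']
candidates = candidate_stop_words(contents)  # => ["my", "text"]
```

The result is only a list of candidates to review by hand. As far as I
recall, StandardAnalyzer.new takes a stop word array as its first
argument, but please check that against the Ferret API docs before
relying on it.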

Cheers,
Andy
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
