Thanks David,
I instantiated a StandardAnalyzer with an empty array of stop
words, and it did the trick.
If anyone wants to comment on what I'm losing by doing this, it would
be really nice.
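
For anyone reading this in the archives, here is a rough sketch of the
failure mode David describes in the quoted message, and of why an empty
stop word list avoids it. It is a toy model in plain Ruby, not Ferret's
actual parser: STOP_WORDS, TITLES, analyze, parse and search are all
hypothetical names made up for illustration, and only the title field
is modelled.

```ruby
# Toy model only. None of these names come from Ferret.
STOP_WORDS = %w[a an and the me my of to].freeze
TITLES = ['example title', 'a new title', 'random title', 'random title'].freeze

# Lowercase the text, split it into word tokens and drop stop words.
def analyze(text, stop_words)
  text.downcase.scan(/\w+/).reject { |token| stop_words.include?(token) }
end

# Parse a "field:term" query. If the analyzer removes every term, the
# query is effectively "field:", which is treated as invalid here. In that
# case, fall back to a plain term query built from the whole query string
# with special characters stripped (assumed to mirror the described fallback).
def parse(query, stop_words)
  _field, raw_term = query.split(':', 2)
  terms = analyze(raw_term.to_s, stop_words)
  return terms unless terms.empty?

  analyze(query.gsub(/[^\w\s]/, ' '), stop_words)
end

# Return every title that shares at least one term with the parsed query.
def search(query, stop_words)
  terms = parse(query, stop_words)
  TITLES.select { |title| (analyze(title, stop_words) & terms).any? }
end

puts search('title:me', STOP_WORDS).length # every title matches "title"
puts search('title:mo', STOP_WORDS).length # "mo" is a real term with no match
puts search('title:me', []).length         # no stop words, so "me" is kept
```

With the stop list in place, "title:me" degrades to a search for "title"
and matches everything. With an empty list, "me" survives as an ordinary
term and simply finds nothing.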
On 3/13/07, David Balmain <[EMAIL PROTECTED]> wrote:
> On 3/13/07, Julio Cesar Ody <[EMAIL PROTECTED]> wrote:
> > Hey all,
> >
> > I'm getting some really weird results when searching documents. It
> > *seems* to be somehow related to the document format I'm using.
> >
> > I wrote a small script to replicate it:
> >
> > ################
> > #!/usr/bin/ruby
> >
> > require 'rubygems'
> > require 'ferret'
> > include Ferret
> > index = Index::Index.new(:path => '/tmp/fooindex', :key => :id)
> >
> > # dummy data
> > index << {:visibility=>"private", :type=>"media",
> >   :title=>"example title", :owner=>"user/3003", :author=>"user/3003",
> >   :description=>"description example", :id=>"user/3003/media/1"}
> > index << {:visibility=>"private", :type=>"media",
> >   :title=>"a new title", :owner=>"user/3003", :author=>"user/3003",
> >   :description=>"more foo desc", :id=>"user/3003/media/2"}
> > index << {:visibility=>"private", :type=>"media",
> >   :title=>"random title", :owner=>"user/3003", :author=>"user/3003",
> >   :description=>"random description", :id=>"user/3003/media/4"}
> > index << {:visibility=>"private", :type=>"media",
> >   :title=>"random title", :owner=>"user/3003", :author=>"user/3003",
> >   :description=>"random description", :id=>"user/3003/media/5"}
> >
> > index.search_each(ARGV.shift) { |doc, score|
> >   puts index[doc].load.inspect
> > }
> > ################
>
> Thanks for including the script. It makes my job much easier. :)
>
> > The following queries are returning *all* the results currently in the
> > index:
> >
> > $ ruby script.rb "title:me"
> > {:author=>"user/3003", :description=>"description example",
> > :visibility=>"private", :id=>"user/3003/media/1", :title=>"example
> > title", :type=>"media", :owner=>"user/3003"}
> > ... (remaining results)
> > $ ruby script.rb "title:my"
> > (same as above)
> >
> > And weirdly enough, the following
> >
> > $ ruby script.rb "title:mo"
> >
> > won't return anything. There are more variants of this, but I think you
> > get my meaning.
>
> The problem is that 'me' and 'my' are stop words. When they are
> removed, the query becomes 'title:', which is invalid. By default, Ferret
> catches query parse exceptions and retries the query as a simple
> boolean term query with all special characters removed, so this
> query then becomes 'title'. Since 'title' appears in the title
> field of every document, all documents are returned. So I don't think
> this is a bug, but it is definitely undesired behaviour. I'll try to
> think of a better way to parse this.
>
> In the meantime, you may want to think about changing the stop word
> list or removing stop words altogether to prevent this problem from
> occurring.
>
> --
> Dave Balmain
> http://www.davebalmain.com/
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk
>
--
Julio C. Ody
http://rootshell.be/~julioody