On 10/21/06, ahFeel <[EMAIL PROTECTED]> wrote:
> Hi :)
>
> Here's a little code reproducing something that i consider as a bug, if
> it's not please explain :]
>
> http://pastie.caboo.se/18693

Hi Jérémie,

You can get rid of this behaviour by building your own analyzer and not including the HyphenFilter.

This is a tricky issue which I haven't quite worked out yet. For example, when you search for "set-up", do you want that to match "set up" and "setup"? What if you search for "setup" or "set up"? Should they match all three versions too? With the current HyphenFilter, all three versions in a query will match all three versions in the index. However, this comes at the cost of precision.

The problems occur during phrase queries. To make "set-up" match both "setup" and "set up", "set-up" is analyzed as both "set up" and "setup", so in the first position there are two words in the token stream: "set" and "setup". When I parse the phrase "set-up files" I get these two phrases:

    "set____up__files"
    "setup______files"

As you can see, the second phrase has only two terms, so there is a gap in between them. To get the phrase "setup files" to match this, I need to give it a slop value.

I realize this solution is not ideal. I've had to forsake some precision for a gain in recall, but I can't think of a better way. If you can come up with a fool-proof way to handle hyphenated terms, I'd love to hear it. I will probably remove the HyphenFilter from the StandardFilter in a future version if I can't think of a better way to do this.

By the way, for the people reading this who think that "setup" is not a word: I agree, so consider "e-mail" and "email" instead.

Cheers,
Dave

PS: I've pasted the code below for reference. I'm not sure how long the pasties stick around for.

require 'rubygems'
require 'ferret'

# Start from a clean index directory on every run.
path = "/tmp/index"
system("rm -rf #{path}; mkdir -p #{path}")

index = Ferret::Index::Index.new(:path => path)

# Two documents whose names contain hyphens.
index << {:type => :bug, :name => 'foo-bar'}
index << {:type => :bug, :name => 'foo-bar-core'}

# 'foo-core' never appears as a name in the index.
queries = ['foo-bar', 'foo-core']

queries.each do |name|
  query = "type:bug AND name:#{name}"
  puts "\nquery : #{query}"
  res = index.search(query)
  puts "total hits = #{res.total_hits}"
  res.hits.each { |x| p index[x.doc].load.inspect }
end
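
The position gap described in the reply can be reproduced in a few lines of plain Ruby. This is only a rough sketch and makes no Ferret calls: Token, analyze, phrase_variants and phrase_match? are hypothetical names invented for the illustration, and the slop rule (each term may sit up to `slop` positions away from where the phrase expects it) is a simplification, not necessarily how Ferret scores sloppy phrases.

```ruby
# Illustration only: plain Ruby, no Ferret API is used here.
Token = Struct.new(:text, :pos)

# Index-side analysis as described in the email: a hyphenated word yields
# its parts at consecutive positions plus the joined form at the first one.
def analyze(text)
  pos = 0
  text.downcase.split.flat_map do |word|
    parts = word.split('-').reject(&:empty?)
    tokens = parts.each_with_index.map { |part, i| Token.new(part, pos + i) }
    tokens << Token.new(parts.join, pos) if parts.size > 1
    pos += parts.size
    tokens
  end
end

# Query-side: build the "split" phrase and the "joined" phrase. The joined
# phrase keeps the original positions, which is where the gap comes from.
def phrase_variants(text)
  split_form = []
  joined_form = []
  pos = 0
  text.downcase.split.each do |word|
    parts = word.split('-').reject(&:empty?)
    next if parts.empty?
    parts.each_with_index { |part, i| split_form << Token.new(part, pos + i) }
    joined_form << Token.new(parts.join, pos)
    pos += parts.size
  end
  [split_form, joined_form].uniq
end

# Simplified phrase match: every later term must appear within `slop`
# positions of its expected offset from the first term.
def phrase_match?(doc_tokens, phrase, slop = 0)
  return false if phrase.empty?
  positions = Hash.new { |hash, key| hash[key] = [] }
  doc_tokens.each { |t| positions[t.text] << t.pos }
  first, *rest = phrase
  positions[first.text].any? do |start|
    rest.all? do |term|
      expected = start + (term.pos - first.pos)
      positions[term.text].any? { |p| (p - expected).abs <= slop }
    end
  end
end

_split, joined = phrase_variants('set-up files')
doc = analyze('setup files')
puts phrase_match?(doc, joined, 0)   # false: "files" is expected at position 2
puts phrase_match?(doc, joined, 1)   # true once one position of slop is allowed
```

Under the same simplified rule, a slop of 1 also lets the phrase "setup files" match a document reading "setup my files", which is the precision that gets traded away.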