Re: [Ferret-talk] [Repost] Problem with url searching..

Jens Kraemer Tue, 03 Apr 2007 03:41:11 -0700

On Tue, Apr 03, 2007 at 12:04:28PM +0200, ahFeel wrote:
> Hi all,
> 
> I posted this a few weeks ago but no one answered, and this feature is
> REALLY important for us.
> 
> I have many objects with a url field, containing standard URLs of
> course...
> I'm trying to match them, but I'm running into problems with that.


Ok, here we go:

First of all, use 

INDEX.process_query(query_string) 

to see how Ferret sees your queries after the QueryParser has parsed them.

You'll see that the results ferret gives perfectly match the queries the
parser generated from your query strings - but these are not the results
you want. 

So you'll have to work on the analysis part. Here it seems your problem
is that your analyzer is stripping away the wildcards you use, i.e.

a = TestAnalyzer.new
qp = Ferret::QueryParser.new :analyzer => a
qp.parse 'url:"http://ferret.davebalmain.com"' # url:ferret.davebalmain.com
qp.parse 'url:"http://ferret*"'                # url:ferret  -> bad, won't match

A custom URLAnalyzer that strips away the protocol://, but leaves
wildcards in queries intact, could help here. You should also think about
further tokenizing the domain part by splitting at '.' (as a
LetterTokenizer would do). Then url:ferret would match
the ferret.davebalmain.com url even without a wildcard.
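
To make the idea concrete, here is a rough sketch in plain Ruby of the
token stream such an analyzer might produce. It does not use Ferret's
analyzer API at all; url_tokens is a made-up helper name, and the choice
of which characters count as separators is only an assumption, so treat
it as an illustration of the tokenizing rule rather than a drop-in
analyzer.

```ruby
# Hypothetical helper, not part of Ferret: shows the tokens a custom
# URL analyzer could emit for an indexed URL or a URL query term.
def url_tokens(text)
  # Drop a leading "scheme://" prefix such as "http://".
  rest = text.sub(%r{\A[a-z][a-z0-9+.\-]*://}i, '')
  # Split on every run of characters that is not a letter, a digit,
  # or a wildcard (* or ?), so wildcards stay attached to their term.
  # Assumption: '?' is treated as a wildcard here, which means a URL
  # query-string separator is not split on.
  rest.split(/[^a-z0-9*?]+/i).reject(&:empty?).map(&:downcase)
end

url_tokens('http://ferret.davebalmain.com')  # => ["ferret", "davebalmain", "com"]
url_tokens('http://ferret*')                 # => ["ferret*"]
```

With tokens like these, a plain term "ferret" lines up with the first
token of the indexed URL, and "ferret*" keeps its wildcard instead of
being reduced to "ferret".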

Also keep in mind that you do not have to use Ferret's Query Parser if
it doesn't fit your needs - you can always build your own.


Jens

 

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[EMAIL PROTECTED] | www.webit.de
 
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
