[another part of a private thread]

Bill Moseley wrote:
At 11:32 AM 02/07/02 -0600, Randy Kobes wrote:

Hi,
  I'm not sure this would help in deciding how, or what,
to index, but I've attached a listing of the queries made
on our search of the guide over the last couple of weeks - the
number at the end of the line indicates how many times
the query on that line was searched for.


Thanks Randy,

I'm not sure I saw any perl code in my quick look.   Is that because you
filter or are people just not searching for perl code very often?

In our logs we see that multi-word searches are often phrases, even
though people don't use a phrase search. So to improve results, we are
going to try to bias rank based on how close together the words are -- so a
phrase, even if it's not a real phrase query (a query in quotes), will rank
high.


This should have two benefits:

1) people that search for phrases without using quotes will still get their
phrase hits first without the need for using quotes.

2) the default boolean operator can be OR instead of AND, but naturally,
the AND results will rank higher.  This is useful for swish, since swish is
often used on small sets of files where it's not unlikely that a multiword
AND search will return no results.

So if someone searches for *internal redirect*, the docs with the phrase
"internal redirect" will be ranked highest, next will be docs with both
internal and redirect, but not in a phrase (but still an AND), and lastly,
docs with just one of the words (OR search).
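
The three tiers above can be sketched roughly as follows. This is only an
illustration in Python, not swish code: the names tokenize, match_tier and
rank are made up for the example, and the word definition (lowercase, split
on non-word characters) is an assumption, not the one from the previous
email.

```python
import re

def tokenize(text):
    # Assumed word definition: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())

def match_tier(query, doc):
    """Return 3 for an adjacent in-order phrase hit, 2 if all query
    words are present (AND), 1 if only some are (OR), 0 if none."""
    q = tokenize(query)
    d = tokenize(doc)
    if not q:
        return 0
    n = len(q)
    # Phrase hit: the query words appear consecutively, in order.
    if any(d[i:i + n] == q for i in range(len(d) - n + 1)):
        return 3
    present = set(d)
    hits = sum(1 for w in q if w in present)
    if hits == n:
        return 2
    return 1 if hits else 0

def rank(query, docs):
    """docs maps a document name to its text; returns matching names,
    best tier first, ties broken by name."""
    scored = [(match_tier(query, text), name) for name, text in docs.items()]
    scored = [s for s in scored if s[0] > 0]
    return [name for tier, name in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

A real implementation would presumably grade proximity more finely (words
two apart ranking above words twenty apart) rather than using three fixed
tiers.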

To complicate things, swish also ranks by "structure", or where in the HTML
source the word is (title ranks very high, <H1*> and <em> also affect
the rank), and also ranks by metaname (which is a way to categorize
search words: e.g. search subject, author, body).

If you have any suggestions on how to roll all that into a ranking
equation, I'm all ears!  I've tried to get help from UC Berkeley, but no
takers yet.  My current plan is to make some basic assumptions, but then
make all the bias adjustments configurable.
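
One possible shape for such an equation, purely as a starting point: give
each matched word occurrence a weight equal to its structure bias times its
metaname bias, sum those, and multiply by a bias for the match tier (phrase,
AND, OR). Everything in this Python sketch is hypothetical: the table names,
the numeric values, and the multiplicative form are placeholder assumptions,
not anything swish does today.

```python
# Hypothetical, configurable bias tables; the numbers are placeholders.
STRUCTURE_BIAS = {"title": 5.0, "h1": 3.0, "em": 1.5, "body": 1.0}
METANAME_BIAS = {"subject": 2.0, "author": 1.5, "body": 1.0}
TIER_BIAS = {3: 4.0, 2: 2.0, 1: 1.0}  # 3 = phrase, 2 = AND, 1 = OR

def score(hits, tier,
          structure_bias=STRUCTURE_BIAS,
          metaname_bias=METANAME_BIAS,
          tier_bias=TIER_BIAS):
    """hits is a list of (structure, metaname) pairs, one per matched
    word occurrence. Unknown keys fall back to a neutral weight of 1.0;
    an unknown tier scores 0."""
    base = sum(structure_bias.get(s, 1.0) * metaname_bias.get(m, 1.0)
               for s, m in hits)
    return tier_bias.get(tier, 0.0) * base
```

Passing the tables in as arguments keeps every bias adjustable from
configuration, which matches the plan of making basic assumptions first and
tuning later.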

+1 from Randy and me.
Would Captain Bill the Search please submit the patches implementing the behavior documented above and the definition of words per the previous email? Thanks!


_____________________________________________________________________
Stas Bekman             JAm_pH      --   Just Another mod_perl Hacker
http://stason.org/      mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/




