Re: Per-token weighting / attribute data in index

Scott Davies Fri, 02 Jun 2006 14:54:48 -0700

A simple example would be indexing and scoring the hyperlink text from
other web pages that point to the page P that I'm indexing/scoring.  I
might have some metric saying how much I "trust" each of the pages or
sites with hyperlinks to P, and want to use that metric to increase or
decrease how much the text in those hyperlinks increases the score of
P for queries containing that anchortext.  Since each incoming
hyperlink is from a different site with a different trustworthiness,
I'd obviously want to be able to vary that boost independently for
every different hyperlink pointing at page P.
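
To make the idea concrete, here is a minimal sketch in Python of the per-link weighting I have in mind. It is a toy model only, not Lucene's scoring formula and not any Lucene API: the IncomingLink class, the trust values, and the anchor_score function are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class IncomingLink:
    """One hyperlink pointing at page P (hypothetical structure)."""
    anchor_text: str
    trust: float  # assumed per-source trust metric; higher means more trusted


def anchor_score(query_terms, links):
    """Sum, over all incoming links, of (link trust) x (query terms found in its anchortext).

    Each link carries its own weight, so two links with identical
    anchortext can contribute different amounts to P's score.
    """
    score = 0.0
    for link in links:
        tokens = set(link.anchor_text.lower().split())
        matches = sum(1 for term in query_terms if term.lower() in tokens)
        score += link.trust * matches
    return score


# Two links to the same page P, from sources with different trust.
links_to_p = [
    IncomingLink("lucene search library", trust=2.5),
    IncomingLink("search", trust=0.5),
]
score_for_p = anchor_score(["search"], links_to_p)
```

The point of the sketch is that the weight is attached to each link's text individually, rather than to a whole field or a whole document.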


On 6/2/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:


I may be misunderstanding your goal... it sounds like what you want to do
is say that for certain documents (which you trust) matching on the title
is "worth more" than matching on the title of other documents (which you
don't trust).

If that's the case, then at index time you can add a field boost on the
title just for the documents you trust, and add no boost for the documents
you don't trust.

If I've misunderstood your question, could you provide a use case
describing your goal, and where Lucene fails to meet it?
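
A rough sketch of the field-boost suggestion above, written as a toy model in Python rather than against the Lucene API. The field_score function and the boost values are hypothetical, and the arithmetic is deliberately simpler than Lucene's real scoring; the only thing it is meant to show is that a single boost applies to every token in one document's field.

```python
def field_score(query_terms, field_text, field_boost=1.0):
    """Toy score: count of query terms present in the field, times one boost.

    The boost is chosen per document and per field at index time, so every
    token in that field is scaled by the same factor.
    """
    tokens = set(field_text.lower().split())
    matches = sum(1 for term in query_terms if term.lower() in tokens)
    return field_boost * matches


# Same title text, indexed once for a trusted document (boosted)
# and once for an untrusted document (no boost).
trusted = field_score(["lucene"], "Lucene scoring guide", field_boost=2.0)
untrusted = field_score(["lucene"], "Lucene scoring guide", field_boost=1.0)
```

With these assumed values the trusted document's title match counts twice as much as the untrusted one's, but there is no way in this model to weight individual tokens within the same field differently.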



: Date: Fri, 2 Jun 2006 13:14:41 -0700
: From: Scott Davies <[EMAIL PROTECTED]>
: Reply-To: [email protected]
: To: [email protected]
: Subject: Per-token weighting / attribute data in index
:
: Hi...reasonably experienced web search programmer but total Lucene newbie
: here.
:
: After poking through Lucene for a while, I still haven't figured out a
: decent way to tweak the scoring based on per-token data.  For example,
: as far as I can tell so far, the only reasonable way to have words in
: the titles or headers of HTML documents be "worth more" for scoring
: purposes than ordinary body text is to make "title" and "header"
: fields and apply appropriate field boosts across all documents.  That
: works OK if you only have a few special fields you want to boost by
: some consistent amount each, but falls down if, say, you wanted to
: include some sort of "tags" or anchortext in the scoring of documents
: where there's a high degree of variability in how much any given tag
: or anchor should be "trusted" and thus influence the score.  (I could
: conceivably discretize the boosts and, say, put all the anchortext
: with boost 2.5 in a special "anchortext-boost2.5" field, but that
: would be extremely awkward and presumably cause major performance
: issues as the number of fields increases.)
:
: Have I just failed to notice the right way to do this, or is there
: really no decent way to do it in Lucene at this time?  If the latter,
: are there any plans to add this feature at some point semi-soon?  This
: seems to me like a major scoring limitation for applications not just
: indexing and searching over plain text documents...
:
: Thanks,
:
: -- Scott
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]
:



-Hoss


