Per-token weighting / attribute data in index

Scott Davies Fri, 02 Jun 2006 13:15:02 -0700

Hi... I'm a reasonably experienced web search programmer but a total Lucene newbie.

After poking through Lucene for a while, I still haven't figured out a
decent way to tweak the scoring based on per-token data.  For example,
as far as I can tell so far, the only reasonable way to have words in
the titles or headers of HTML documents be "worth more" for scoring
purposes than ordinary body text is to make "title" and "header"
fields and apply appropriate field boosts across all documents.  That
works OK if you only have a few special fields you want to boost by
some consistent amount each, but falls down if, say, you wanted to
include some sort of "tags" or anchortext in the scoring of documents
where there's a high degree of variability in how much any given tag
or anchor should be "trusted" and thus influence the score.  (I could
conceivably discretize the boosts and, say, put all the anchortext
with boost 2.5 in a special "anchortext-boost2.5" field, but that
would be extremely awkward and presumably cause major performance
issues as the number of fields increases.)
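To make the discretization workaround concrete, here is a rough plain-Java sketch of how per-anchor trust weights might be rounded into buckets and mapped to field names like "anchortext-boost2.5". It is a hypothetical illustration only, not Lucene API: the class `BoostBuckets`, the helpers `bucketFieldName` and `bucket`, and the bucket width `STEP = 0.5` are all assumptions. In an actual index, each resulting field name would presumably be added as its own field with the matching field boost (e.g. via `Field.setBoost`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BoostBuckets {
    // Hypothetical bucket width: each per-anchor weight is rounded to the nearest 0.5.
    static final double STEP = 0.5;

    // Round a weight to its bucket and build a field name such as
    // "anchortext-boost2.5" (the naming scheme suggested in the post).
    static String bucketFieldName(String base, double boost) {
        double rounded = Math.round(boost / STEP) * STEP;
        return base + "-boost" + rounded;
    }

    // Group anchor texts under the bucketed field name each one would be indexed into.
    static Map<String, List<String>> bucket(String base, Map<String, Double> anchors) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : anchors.entrySet()) {
            String name = bucketFieldName(base, e.getValue());
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(e.getKey());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical anchor texts with varying "trust" weights.
        Map<String, Double> anchors = new LinkedHashMap<>();
        anchors.put("lucene scoring", 2.4);
        anchors.put("search engine", 0.9);
        anchors.put("click here", 0.3);

        // Print each bucketed field name with the anchors it would hold.
        for (Map.Entry<String, List<String>> e : bucket("anchortext", anchors).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Shrinking `STEP` makes the rounding more faithful to the original weights but multiplies the number of distinct fields, which is exactly the awkwardness (and likely performance cost) described above.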

Have I just failed to notice the right way to do this, or is there
really no decent way to do it in Lucene at this time?  If the latter,
are there any plans to add this feature at some point semi-soon?  This
seems to me like a major scoring limitation for any application that
indexes and searches more than plain text documents...

Thanks,

-- Scott
