Hi, reasonably experienced web search programmer but total Lucene newbie here.
After poking through Lucene for a while, I still haven't figured out a decent way to tweak the scoring based on per-token data. For example, as far as I can tell, the only reasonable way to have words in the titles or headers of HTML documents be "worth more" for scoring purposes than ordinary body text is to make "title" and "header" fields and apply appropriate field boosts across all documents.

That works OK if you only have a few special fields that you want to boost by some consistent amount each. It falls down if, say, you want to include some sort of "tags" or anchortext in the scoring of documents, where there's a high degree of variability in how much any given tag or anchor should be "trusted" and thus influence the score. (I could conceivably discretize the boosts and, say, put all the anchortext with boost 2.5 in a special "anchortext-boost2.5" field, but that would be extremely awkward and would presumably cause major performance issues as the number of fields increases.)

Have I just failed to notice the right way to do this, or is there really no decent way to do it in Lucene at this time? If the latter, are there any plans to add this feature at some point semi-soon? This seems to me like a major scoring limitation for applications that aren't just indexing and searching over plain text documents...

Thanks,
-- Scott
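
P.S. In case the discretization workaround isn't clear, here is a rough sketch of what I mean. It is plain Java with no Lucene calls at all. `BoostBuckets`, `fieldFor`, `group`, and the 0.5 bucket step are all names and values made up for illustration, not anything from the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: groups anchor texts into
// one field name per discretized boost value.
final class BoostBuckets {
    // Assumed bucket width; a finer step means more distinct fields.
    static final double STEP = 0.5;

    // Round a trust score to the nearest bucket and build the field name.
    static String fieldFor(double boost) {
        double bucket = Math.round(boost / STEP) * STEP;
        return String.format(Locale.ROOT, "anchortext-boost%.1f", bucket);
    }

    // Map each field name to the anchor texts that fall in its bucket.
    static Map<String, List<String>> group(Map<String, Double> anchors) {
        Map<String, List<String>> fields = new TreeMap<>();
        for (Map.Entry<String, Double> e : anchors.entrySet()) {
            fields.computeIfAbsent(fieldFor(e.getValue()), k -> new ArrayList<>())
                  .add(e.getKey());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Example anchors with invented trust scores.
        Map<String, Double> anchors = new TreeMap<>();
        anchors.put("lucene home page", 2.5);
        anchors.put("click here", 0.4);
        anchors.put("search library", 2.6);
        System.out.println(group(anchors));
    }
}
```

Every distinct bucket would then need its own indexed field with its own field boost, and every query would have to be expanded across all of those fields, which is exactly the awkwardness and field-count growth I'm worried about.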