Hi, I am running a search for something akin to a news site, when each news document has a date, title, keywords/bylines, summary fields and then the actual content. Using Lucene for this database of documents, it seems that:
1. The relevancy score is skewed drastically by the actual number of news document per day. For example if I RangedQuery on this week, and there were 100 news on Monday and two news on Sunday, the two on Sunday gets ranked highly due to idf. How do I reduce this skewness due to the date-posted field? I saw a reference earlier to ConstantScoreRangeQuery on JIRA - is it the solution? 2. If I choose to sort the results by date, then recent documents with very very low relevancy (say the words searched appears only in content, and not in title/bylines/summary fields that are boosted higher) are still shown relatively high in the list, and I wish to omit them in general. What is the best way to implement some sort of a relevancy filter (include only documents with an normalized score of 0.2 or more....)? Or is there a better way around it? Thanks :) Best Regards, CW --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]