Robert Pohl wrote:
Sean, I was thinking about something like what you describe:

I index the articles in real time, basically, so the dates are pretty much "right now". But can I compare the dates to a start date such as 2000-01-01 and set the boost to the difference between the dates? This will make the boost number higher as time goes on, but will that be a problem?
A cunning idea. I don't see why it should be a problem, except that if you are applying other boosts at index time it would change the relative importance of the date component. I can't figure out whether this would also apply if you were applying other boosts at query time. I would certainly set the per-day boost value to a fairly small fraction. You would need to experiment to see how much difference you need between older and newer docs.
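
To make the "difference from a start date" idea concrete, here is a minimal sketch in plain Java of how such a boost could be computed. The class name, the method name and the PER_DAY value of 0.001 are made up for illustration, not anything prescribed by Lucene; the result would be passed to the document's setBoost call at index time.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class DateOffsetBoost {
    // Fixed reference date taken from the question above.
    static final LocalDate START = LocalDate.of(2000, 1, 1);
    // Hypothetical per-day increment; kept small, to be tuned by experiment.
    static final float PER_DAY = 0.001f;

    // Boost grows linearly with the number of days since START.
    // Dates before START are clamped to the neutral boost of 1.0.
    static float boostFor(LocalDate published) {
        long days = ChronoUnit.DAYS.between(START, published);
        return 1.0f + Math.max(0, days) * PER_DAY;
    }

    public static void main(String[] args) {
        // A newer article gets a slightly larger boost than an older one.
        System.out.println(boostFor(LocalDate.of(2009, 7, 21)));
        System.out.println(boostFor(LocalDate.of(2008, 7, 21)));
    }
}
```

One thing to check while experimenting: as far as I know, Lucene versions of this era store index-time boosts in a low-precision one-byte norm, so very small per-day differences may get rounded away.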
Moray: Will the range queries affect the performance?

Short answer is yes, but how much depends on how many documents you are indexing and, probably more importantly, the size of the result sets your basic search terms are retrieving. So if a typical search returns only 1000 documents out of 1 million, then the range queries should make little or no difference. If a typical search returns 900 000 out of 1 million, then they will make a substantial difference.

In our case the Lucene return time is so small compared to the overall time to display results that an increase makes a negligible difference to end users.

Let's say that I want to boost all articles that are one week old by 5 and all that are one month old by 2, and leave all the rest as they are.
So this example would leave you with two range queries over a relatively small range - you should experiment, but I don't think it's a big problem.
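
As a rough sketch of what those two range queries might look like, the plain-Java helper below builds a Lucene query-syntax string with two boosted, optional date ranges next to the required user query. It assumes the field is called "date" and is indexed as a yyyyMMdd string; both are guesses about the index, and the class and method names are invented for the example. The month range stops the day before the week range starts so the two boosts do not add up for the newest articles.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class RecencyQuery {
    // Assumed date format of the indexed "date" field.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Wraps the user's query as a required clause and appends two optional,
    // boosted range clauses that only influence scoring.
    static String withRecencyBoost(String userQuery, LocalDate today) {
        String now = today.format(FMT);
        String weekAgo = today.minusWeeks(1).format(FMT);
        String dayBeforeWeek = today.minusWeeks(1).minusDays(1).format(FMT);
        String monthAgo = today.minusMonths(1).format(FMT);
        return "+(" + userQuery + ")"
            + " date:[" + weekAgo + " TO " + now + "]^5"
            + " date:[" + monthAgo + " TO " + dayBeforeWeek + "]^2";
    }

    public static void main(String[] args) {
        // Prints: +(lucene boost) date:[20090714 TO 20090721]^5 date:[20090621 TO 20090713]^2
        System.out.println(withRecencyBoost("lucene boost", LocalDate.of(2009, 7, 21)));
    }
}
```

The resulting string would then go through the query parser as usual; the range clauses stay optional, so older articles still match and simply score lower.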
Thanks for your input!

You're welcome.
//Rob




Sean Carpenter wrote:
Rob,
We use the opposite approach and use a lower boost value during
indexing for older documents (which makes newer ones score higher).

When adding the document to the index, you can call Document.SetBoost
(http://lucene.apache.org/java/2_3_1/api/core/org/apache/lucene/document/Document.html#setBoost(float))
to set the overall document's boost factor.  We use a pre-defined
scale based on the age of the document, something like: less than 3
months = boost 1, 3 - 6 months = boost 0.8, 6 - 12 months = boost 0.4,
etc.
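
A scale like the one above could be sketched in plain Java as follows. The three bands come from the description; the value for documents older than 12 months is only a placeholder, since the scale beyond that point is not spelled out here, and the class and method names are invented for the example. Band edges are treated as belonging to the older band, which is also an assumption.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class AgeBoost {
    // Placeholder for documents older than 12 months (not given in the thread).
    static final float OLDER_BOOST = 0.2f;

    // Maps a document's age in whole months to a boost factor.
    static float boostFor(LocalDate published, LocalDate today) {
        long ageMonths = ChronoUnit.MONTHS.between(published, today);
        if (ageMonths < 3) return 1.0f;   // less than 3 months
        if (ageMonths < 6) return 0.8f;   // 3 - 6 months
        if (ageMonths < 12) return 0.4f;  // 6 - 12 months
        return OLDER_BOOST;               // older: assumed value
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2009, 7, 21);
        // The returned value would be handed to the document's setBoost
        // call before the document is added to the index.
        System.out.println(boostFor(today.minusMonths(4), today)); // prints 0.8
    }
}
```

Because the boost is fixed when the document is added, documents would need to be re-indexed as they age for the scale to stay accurate.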

Sean

On Tue, Jul 21, 2009 at 10:50 AM, Robert Pohl<[email protected]> wrote:
Hi,

I have a lot of articles indexed with title, body and date.
How can I boost the dates so that the most recent articles have higher score
than the older ones?

Thanks,
Rob





