Robert Pohl wrote:
Sean, I was thinking about something like what you describe:
I index the articles in realtime basically, so the dates are pretty
much "right now". But can I compare the dates to a start date such as
2000-01-01 and set the boost to the difference between the dates?
This will make the boost number higher as time goes but will that be a
problem?
A cunning idea. I don't see why it should be a problem, except that if
you are applying other boosts at index time it would change the relative
importance of the date component. I can't figure out whether this would
also apply if you were applying other boosts at query time. I would
certainly set the per-day boost value to a fairly small fraction. You
would need to experiment to see how much difference you need between
older and newer docs.
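
For what it's worth, here is a minimal sketch of that calculation in plain Java. The class name, the base boost of 1.0 and the per-day value of 0.001 are my own assumptions for illustration, not anything Lucene prescribes; only the 2000-01-01 start date comes from the question. The result is what you would hand to Document.setBoost(float) when indexing.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class LinearDateBoost {
    // Reference date from the question; anything older gets the base boost.
    static final LocalDate START = LocalDate.of(2000, 1, 1);

    // Assumed per-day increment (hypothetical value, tune by experiment).
    static final float PER_DAY = 0.001f;

    // Boost grows linearly with the number of days since START.
    static float boostFor(LocalDate published) {
        long days = ChronoUnit.DAYS.between(START, published);
        return 1.0f + Math.max(0, days) * PER_DAY;
    }

    public static void main(String[] args) {
        // The printed value would be passed to Document.setBoost(float).
        System.out.println(boostFor(LocalDate.of(2009, 7, 21)));
    }
}
```

With a per-day value this small, two articles a week apart differ by well under one percent, so you may have to raise it before the effect shows up in the ranking.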
Moray: Will the range queries affect the performance?
Short answer is yes, but how much depends on how many documents you are
indexing and, probably more importantly, on the size of the result sets
your basic search terms retrieve. So if a typical search returns only
1000 documents out of 1 million then the range queries should make
little or no difference. If a typical search returns 900 000 out of 1
million then they will make a substantial difference.
In our case the Lucene return time is so small compared to the overall
time to display results that an increase makes a negligible difference
to end users.
Let's say that I want to boost all articles that are one week old by
5 and all that are one month old by 2, and leave all the rest.
So this example would leave you with two range queries over a relatively
small range - you should experiment, but I don't think it's a big problem.
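
A rough sketch of how those two clauses could be built, in plain Java. I am assuming here that the date field is called "date" and is indexed as yyyyMMdd strings, and that "one month" means 30 days; none of that is from your setup, so adjust to taste. The two ranges are kept non-overlapping so that a week-old article picks up only the boost of 5.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class RecencyClauses {
    // Formats a LocalDate as yyyyMMdd, e.g. 20090721.
    static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    // Builds two boosted range clauses in Lucene query syntax.
    // The field name "date" is an assumption for this sketch.
    static String build(LocalDate today) {
        String now = today.format(FMT);
        String weekAgo = today.minusDays(7).format(FMT);
        String beforeWeek = today.minusDays(8).format(FMT);
        String monthAgo = today.minusDays(30).format(FMT);
        return "date:[" + weekAgo + " TO " + now + "]^5 "
             + "date:[" + monthAgo + " TO " + beforeWeek + "]^2";
    }

    public static void main(String[] args) {
        System.out.println(build(LocalDate.of(2009, 7, 21)));
    }
}
```

The clauses are meant to be appended as optional clauses to a query whose real search terms are required; otherwise every recent article would match on its date alone.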
Thanks for your input!
You're welcome.
//Rob
Sean Carpenter wrote:
Rob,
We use the opposite approach and use a lower boost value during
indexing for older documents (which makes newer ones score higher).
When adding the document to the index, you can call Document.SetBoost
(http://lucene.apache.org/java/2_3_1/api/core/org/apache/lucene/document/Document.html#setBoost(float))
to set the overall document's boost factor. We use a pre-defined
scale based on the age of the document, something like: less than 3
months = boost 1, 3 - 6 months = boost 0.8, 6 - 12 months = boost 0.4,
etc.
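
As a sketch only, the scale above could look like this in plain Java. The class and method names are made up for the example, and the boost for documents older than 12 months is left as a parameter because the "etc." above does not pin it down. The returned value is what would be passed to the document's boost setter before adding it to the index.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class AgeBoost {
    // Maps age in whole months to a boost, following the scale above.
    // olderBoost covers everything past 12 months (caller's choice).
    static float boostForAgeMonths(long months, float olderBoost) {
        if (months < 3) return 1.0f;
        if (months < 6) return 0.8f;
        if (months < 12) return 0.4f;
        return olderBoost;
    }

    // Computes the age from the publication date; future dates count as new.
    static float boostFor(LocalDate published, LocalDate today, float olderBoost) {
        long months = ChronoUnit.MONTHS.between(published, today);
        return boostForAgeMonths(Math.max(0, months), olderBoost);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2009, 7, 21);
        // 0.2f is a hypothetical value for documents older than a year.
        System.out.println(boostFor(LocalDate.of(2009, 3, 21), today, 0.2f));
    }
}
```

Because the boost is fixed at index time, documents keep the value they were indexed with until they are re-indexed.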
Sean
On Tue, Jul 21, 2009 at 10:50 AM, Robert Pohl wrote:
Hi,
I have a lot of articles indexed with title, body and date.
How can I boost the dates so that the most recent articles have a
higher score than the older ones?
Thanks,
Rob