Am Freitag, den 01.12.2006, 11:54 -0800 schrieb Chris Hostetter: .... > the short answer is you can't, not with the DefaultSimilarity, but you > might be able to write a custom Similarity that makes the scores > comparable by making the idf function a NOOP (of course, then your scores > won't be as "good" -- for some definition of good, but that may not matter > to you)
I thought of making the idf function a NOOP, since this is somehow one of the Ways Y. Yang wrote about in "A System For New Event Detection". The final idf estimate can be found by using a training set, which i can use. But this is not the best way, as you said, due to the score quality. The second way is: Doing an incremental idf, by updating the index everytime a new (day) archive has arrived. Then I use the whole Index for getting the idf. But for search results I WANT ONLY results of the current day to arrive. As I said, this all deals with Continous Queries and Online classification. Situation: I ve to find the best document in a period of 60 days. I m using optimal stopping theory, but this doesnt matter here. I want to query for documents for a special day, but have the idf vor all days. An Example: I ve got an Index of day 1. The idf can be calculated based on it. My Query System doesn't want to mark a document as relevant. So now all documents are marked unrelevant and can't be taken as relevant ones anymore. Now on the second day ive got a new version of my web archive. I will "merge" Index of day 1 with the new index/documents to get a global idf (called incremental idf). For my classification I'm only allowed to classify documents of day2. Now how can I query the index (the query is everyday the same) and get only documents of day 2, but using the score function (with idf and so number of document = all documents day 1 + 2). I hope you understand my problem. Is it possible to search a lucene index and get only documents of a certain day (or the last added documents) but using a global idf the global index. This would be also the solution of my problem. Thanks Nils > > this FAQ entry, and in particular the thread from teh second link in it > have a *lot* of more info on this... > http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03 > > > -Hoss > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]