Thanks for the context - much more useful.
The challenge here is similar to the one posed by offering end-user tagging of 
content (see 
http://www.mail-archive.com/java-user@lucene.apache.org/msg17580.html ). The 
main difference is that here words are added to docs implicitly by search 
click-throughs rather than by any explicit tagging action.

In both cases the challenge is that the user data around documents is likely to 
be updated very often while the documents remain relatively static. 
I suspect some additional things to think about are:
1) Cancelling out the "human laziness" bias that favours clicking results on 
page 1. Are clicks on page 2 worth more?
2) Spam clicks - detecting deliberate gaming of your re-ranking algorithm.
3) Lucene doc IDs are not stable (they can change when segments are merged 
after deletions) - how will you associate query terms/click data with 
documents and join them at speed?
4) Are individual words or phrases the unit of boost? "Paris" means different 
things in "Paris Hilton" and "Paris, France".
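
To make points 1 and 3 a bit more concrete, here is a rough sketch of a click 
log that discounts the page-1 bias and keys everything on a stable, 
application-level document key rather than a Lucene doc ID. Everything in it 
is an assumption for illustration: the ClickLog class, the docKey strings and 
especially the log2(1 + rank) weighting, which is just one plausible way of 
making clicks further down the list count for more. It also takes single 
words as the unit of boost, which is exactly the choice point 4 questions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical click log: position-weighted clicks per (query term, stable doc key). */
final class ClickLog {
    // term -> (stable document key -> accumulated click weight)
    private final Map<String, Map<String, Double>> weights = new HashMap<>();

    /** Assumed weighting: a click at 1-based rank r counts log2(1 + r), so rank 1 counts 1.0. */
    static double positionWeight(int rank) {
        if (rank < 1) {
            throw new IllegalArgumentException("rank must be >= 1");
        }
        return Math.log(1 + rank) / Math.log(2);
    }

    /** Records one click-through on docKey, shown at the given rank for the given query. */
    void recordClick(String query, String docKey, int rank) {
        double w = positionWeight(rank);
        for (String term : query.trim().toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank query
            }
            weights.computeIfAbsent(term, t -> new HashMap<>())
                   .merge(docKey, w, Double::sum);
        }
    }

    /** Accumulated weight for a term on a document, 0.0 if never clicked. */
    double weight(String term, String docKey) {
        return weights.getOrDefault(term.toLowerCase(), Collections.emptyMap())
                      .getOrDefault(docKey, 0.0);
    }
}
```

Spam detection (point 2) is deliberately left out; a per-user or per-IP cap 
on recorded clicks would be one place to start.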

A simple approach might be to re-index your content with all of the additional 
search terms from clicks added to the associated document in a "searchClicks" 
field - the more clicks, the more repetitions of the same search words in the 
document to help with tf (Term Frequency). This additional content would need 
to be capped to avoid huge documents. The disadvantage is that it requires a 
re-index whenever the click data changes. 
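
As a minimal sketch of that idea, the text for the extra field could be built 
like this. The class name, the method and both caps are made up for 
illustration; how you count clicks per term is up to you.

```java
import java.util.Map;

/** Hypothetical helper that builds the text of a capped "searchClicks" field. */
final class SearchClicksField {

    /**
     * Repeats each term once per click, at most maxRepeatsPerTerm times per term
     * and at most maxTokens tokens overall, so that documents cannot grow unbounded.
     */
    static String build(Map<String, Integer> clicksPerTerm,
                        int maxRepeatsPerTerm, int maxTokens) {
        StringBuilder sb = new StringBuilder();
        int tokens = 0;
        for (Map.Entry<String, Integer> e : clicksPerTerm.entrySet()) {
            int repeats = Math.min(e.getValue(), maxRepeatsPerTerm);
            for (int i = 0; i < repeats && tokens < maxTokens; i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getKey());
                tokens++;
            }
        }
        return sb.toString();
    }
}
```

The resulting string would then be added to the document as one more 
tokenized field when it is re-indexed, e.g. something along the lines of 
new Field("searchClicks", text, Field.Store.NO, Field.Index.TOKENIZED) in the 
current 2.x API.
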
Another option that avoids reindexing everything is to wrap IndexReader (see 
FilterIndexReader) and implement TermEnum/TermDocs (plus TermPositions, if 
phrase queries need to work) for a fake field called "searchClicks". The idea 
is that Lucene looks after the usual, static document content while your 
implementation goes off to your more volatile storage (e.g. a database, a 
parallel index or a custom file structure) to retrieve lists of doc ids, term 
frequencies etc. for this "searchClicks" field. All of the Lucene queries you 
might want to throw at this, e.g. PhraseQueries, can then test both the static 
Lucene fields and your new volatile "click" field without being aware of this 
low-level trickery. 
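
To give a feel for the volatile side of this, here is a plain-Java sketch of 
an in-memory postings store that such a wrapped reader could delegate to. It 
is not Lucene code: ClickPostings and its Cursor are hypothetical names, and 
the cursor only mirrors the next()/doc()/freq() style of TermDocs. It also 
assumes you have already translated your stable document keys into the 
reader's current doc ids.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical volatile store for the fake "searchClicks" field. */
final class ClickPostings {
    // term -> (doc id -> term frequency); TreeMap keeps doc ids in ascending order
    private final Map<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();

    /** Adds freq occurrences of term to the given doc, accumulating with earlier calls. */
    void add(String term, int docId, int freq) {
        postings.computeIfAbsent(term, t -> new TreeMap<>())
                .merge(docId, freq, Integer::sum);
    }

    /** Number of docs containing the term, i.e. what a term enumeration would report. */
    int docFreq(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    /** Returns a cursor over (doc id, frequency) pairs in ascending doc id order. */
    Cursor termDocs(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        Iterator<Map.Entry<Integer, Integer>> it = docs == null
                ? Collections.<Map.Entry<Integer, Integer>>emptyIterator()
                : docs.entrySet().iterator();
        return new Cursor(it);
    }

    /** Minimal cursor: call next() first, then read doc() and freq(). */
    static final class Cursor {
        private final Iterator<Map.Entry<Integer, Integer>> it;
        private Map.Entry<Integer, Integer> current;

        private Cursor(Iterator<Map.Entry<Integer, Integer>> it) {
            this.it = it;
        }

        boolean next() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }

        int doc() {
            return current.getKey();
        }

        int freq() {
            return current.getValue();
        }
    }
}
```

A real implementation would need to back this with whatever storage you 
choose and keep it consistent with the doc ids of the reader it is wrapped 
around.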

I'm sure there will be other ways of doing this too but this seems like a 
conceptually clean way of modelling it - just seeing search terms as extensions 
to the document content.

Cheers
Mark


----- Original Message ----
From: sumittyagi <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Sunday, 23 December, 2007 5:30:55 AM
Subject: Re: Which file in the lucene package is used to manipulate results..


Actually what i have to do is...
1.) for every query(keyword), among the results obtained, the keyword will
be mapped with the page clicked, along with the no. of clicks for that
keyword on that page
2.) next time for the same query(keyword), the mapped pages will be ranked
higher considering the no. of clicks too..
3.) for every new query these steps will be repeated...
this was a very high level view , i have made algorithms for these modules
and trying to incorporate with lucene but dont know , on which files i have
to do edition to make it work...
please help me regarding this, if you need some more explanation, please let
me know...
thanks 
Sumit Tyagi





Erick Erickson wrote:
> 
> You still haven't explained *why* you want to rerank results. What
> is the use-case you're trying to implement? Quite often it's turned
> out for me that when I let folks on the list know what the use
> case I'm trying to support is, they come up with much more elegant
> solutions than I was thinking about.
> 
> For instance, does the CustomScoreQuery class have any relevance
> to your problem?
> 
> If you're thinking of modifying the core Lucene code for your
> special purpose, I'd advise against it unless and until you'd exhausted
> all the other options. It's always a maintenance headache to do this.
> 
> Best
> Erick
> 
> On Dec 21, 2007 10:09 AM, sumittyagi <[EMAIL PROTECTED]> wrote:
> 
>>
>> actually i am writing a module to rerank the results, so i want to edit
>> the file which arrange the results and give them ranks,
>> or is there any other way i can use my module to rerank the results
>>
>>
>> markharw00d wrote:
>> >
>> > I think you need to describe your "factors" in more detail. Exactly what
>> > do you want to achieve for your users?
>> > We could be talking about any number of Lucene functions here.
>> >
>> > ----- Original Message ----
>> > From: sumittyagi <[EMAIL PROTECTED]>
>> > To: java-user@lucene.apache.org
>> > Sent: Friday, 21 December, 2007 4:51:09 AM
>> > Subject: Which file in the lucene package is used to manipulate results..
>> >
>> >
>> > hi, i am using lucene for the very first time and want to manipulate the
>> > results, by adding some more factors to it, which file should i edit to
>> > manipulate the search results....
>> >
>> > Thanks
>> > Sumit Tyagi
>> > --
>> > View this message in context:
>> > http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14450335.html
>> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> >
>> >       __________________________________________________________
>> > Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> > For additional commands, e-mail: [EMAIL PROTECTED]
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14456938.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>>
>>
> 
> 

-- 
View this message in context:
http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14476062.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

