Hi Chris,


I've really only had a chance to skim this thread so far, but if I
understand correctly, the goal is to get documents back in a "blended"
order based on:
  1) textual relevancy to the search input
  2) recentness
  3) a mapping of field values to arbitrary numeric weights which need to
     be specified at query time (i.e.: score collection:A better than
     collection:C better than collection:Q, etc.)


You have perfectly understood my question, thanks for trying to help!

In that case I think a "function query" is the way to go ... I haven't really had a chance to catch up on the way the Solr FunctionQuery class morphed when it was adopted into the Lucene core, but I believe all the relevant pieces are in the org.apache.lucene.search.function package, and
it seems to have some good package-level javadocs...

That's what I discovered. The question is: is ValueSourceQuery robust and fast enough to be used confidently in a production environment? I looked at the source code and it seems pretty straightforward, so I would say yes, as long as I use the caches correctly. Can you confirm?



http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/function/package-summary.html

You seemed to be on the right track asking about ValueSourceQuery ... but
that's only part of the puzzle: for the "recentness" aspect a
ValueSourceQuery built on a ReverseOrdFieldSource should take care of things ... but the arbitrary weighting by "collection" will really require you to provide your own ValueSource implementation -- most likely you'll want to leverage the FieldCache, but map your "collectionIds" (whatever
they are) to the numeric values you want to use.
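A note on the recency part, based on the ReverseOrdFieldSource javadoc: distinct field values are numbered 1..N in lexicographic order and the reverse ordinal flips that, so for a sortable date string the newest value gets rord = 1 and older values get larger numbers. On its own that favours older documents, which is why the usual Solr idiom wraps it in a decreasing function, e.g. recip(rord(date),1,1000,1000), i.e. a/(m*x+b). The sketch below models that arithmetic with plain arrays rather than the FieldCache. It is not Lucene code, it assumes every document has a non-null date in a sortable format such as yyyyMMdd, and RecencySketch, reverseOrdinals and recip are hypothetical names.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Standalone model of ReverseOrdFieldSource arithmetic (not Lucene code).
// Assumes every document has a non-null date string in a sortable format (e.g. yyyyMMdd).
final class RecencySketch {

    // Reverse ordinal per document: distinct values are sorted and numbered 1..N,
    // then flipped so the lexicographically largest (newest) value gets rord = 1.
    static int[] reverseOrdinals(String[] dateByDoc) {
        String[] distinct = new TreeSet<>(Arrays.asList(dateByDoc)).toArray(new String[0]);
        int numTerms = distinct.length;
        int[] rord = new int[dateByDoc.length];
        for (int doc = 0; doc < dateByDoc.length; doc++) {
            int ord = Arrays.binarySearch(distinct, dateByDoc[doc]) + 1; // 1-based ordinal
            rord[doc] = numTerms + 1 - ord;
        }
        return rord;
    }

    // Decreasing transform a / (m*x + b), modelled on Solr's recip(x,m,a,b):
    // a small rord (recent document) yields a larger value.
    static float recip(int x, float m, float a, float b) {
        return a / (m * x + b);
    }
}
```

With recip(rord, 1, 1000, 1000) the newest document gets 1000/1001 and the value decreases slowly as documents get older.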

Then you'll have all the pieces; the only thing left to do will be to
decide whether you want to combine them with a regular BooleanQuery or use a
CustomScoreQuery.
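To make the BooleanQuery vs. CustomScoreQuery choice concrete: a BooleanQuery adds up the scores of its matching clauses (scaled by coord and query normalization), while CustomScoreQuery's default customScore() multiplies the sub-query score by the ValueSourceQuery score. The two can rank the same documents differently. What follows is only a standalone illustration of the two combination rules, ignoring coord and normalization. BlendSketch and its methods are hypothetical.

```java
// Standalone illustration of two ways to blend a textual score with a
// value-source score (not Lucene code; coord and normalization are ignored).
final class BlendSketch {

    // Roughly a BooleanQuery of SHOULD clauses: clause scores are summed.
    static float additive(float textScore, float valueSourceScore) {
        return textScore + valueSourceScore;
    }

    // CustomScoreQuery's default customScore(): subQueryScore * valSrcScore.
    static float multiplicative(float textScore, float valueSourceScore) {
        return textScore * valueSourceScore;
    }
}
```

For a document A with text score 2.0 and value 0.1 and a document B with 1.0 and 1.0, the sum puts A first (2.1 vs. 2.0) while the product puts B first (about 0.2 vs. 1.0). With the product the value source scales relevancy; with the sum it shifts it.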

Yes, I will have to implement my own ValueSource, but judging by the existing
ValueSource implementations, it really doesn't seem complicated.
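For the collection weighting, the core of such a ValueSource is the per-document lookup its DocValues.floatVal(doc) would perform: read the document's collection id from a FieldCache array (for example FieldCache.DEFAULT.getStrings(reader, "collection")) and map it through the weights supplied with the query. The sketch below models only that lookup, with a plain String[] standing in for the FieldCache array. The "collection" field name, the default weight for unmapped ids, and CollectionWeightSketch itself are assumptions. Since the weights change per query, a real ValueSource should also take the weight map into account in its equals() and hashCode().

```java
import java.util.HashMap;
import java.util.Map;

// Standalone model of the lookup a custom DocValues.floatVal(doc) could do.
// collectionByDoc stands in for a FieldCache array (one value per document);
// the class name and default-weight policy are hypothetical.
final class CollectionWeightSketch {
    private final String[] collectionByDoc;   // collection id per document, null if missing
    private final Map<String, Float> weights; // query-time weights, e.g. A -> 3, C -> 2, Q -> 1
    private final float defaultWeight;        // used for missing or unmapped collections

    CollectionWeightSketch(String[] collectionByDoc, Map<String, Float> weights, float defaultWeight) {
        this.collectionByDoc = collectionByDoc;
        this.weights = new HashMap<>(weights); // defensive copy of the query-time mapping
        this.defaultWeight = defaultWeight;
    }

    // Same role as DocValues.floatVal(int doc).
    float floatVal(int doc) {
        String id = collectionByDoc[doc];
        if (id == null) {
            return defaultWeight;
        }
        Float w = weights.get(id);
        return w != null ? w : defaultWeight;
    }
}
```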


As for your comments about "random scoring" ... this is really, Really, REALLY hard to get "right" for a variety of reasons that I don't really
want to go into right now ... my advice: don't attempt to commit to
"random" ordering. Instead commit to promoting N randomly selected
documents to the front of the results ... this is easy to do by writing a
custom query (again ValueSourceQuery can probably help you) where you
pick N random numbers between 0 and maxDoc and score them really high ...
then let the rest of the docs score as they normally would.
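The "promote N random documents" approach can be sketched as: draw N distinct doc ids from [0, maxDoc), add a large constant bonus to those documents, and leave every other score alone. This is a standalone sketch, not a Lucene Query; RandomPromotionSketch, pickPromoted, score and the bonus value are hypothetical. Ids in [0, maxDoc) can belong to deleted documents or to documents that don't match the query, so fewer than N promoted hits may actually show up.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Standalone sketch of "promote N random docs" (not Lucene code; names are hypothetical).
final class RandomPromotionSketch {

    // Picks n distinct doc ids uniformly from [0, maxDoc). Requires n <= maxDoc.
    static Set<Integer> pickPromoted(int n, int maxDoc, Random rnd) {
        if (n > maxDoc) {
            throw new IllegalArgumentException("n must be <= maxDoc");
        }
        Set<Integer> picked = new HashSet<>();
        while (picked.size() < n) {
            picked.add(rnd.nextInt(maxDoc)); // duplicates are simply ignored by the set
        }
        return picked;
    }

    // Promoted docs get a large constant bonus; all others keep their normal score.
    static float score(int doc, float normalScore, Set<Integer> promoted, float bonus) {
        return promoted.contains(doc) ? normalScore + bonus : normalScore;
    }
}
```

Rejection sampling like this is fine when N is small compared to maxDoc, which is the case Hoss describes.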

What's wrong with this idea?
Each day I generate and shuffle a vector of maxDoc integers from 0 to maxDoc - 1.

Then I use a ValueSourceQuery with a ValueSource that uses this vector to randomly score the documents. Of course, I have to somehow normalize those random scores so that their "contribution factor" remains constant as maxDoc increases.
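The daily shuffle described above could look roughly like this: a seeded Fisher-Yates shuffle of 0..maxDoc-1 (so the order is stable for a whole day) and a score of perm[doc] / maxDoc, which always lies in [0, 1) however large the index gets. The seed choice (for example a day number), DailyShuffleSketch and its methods are hypothetical. One caveat worth checking: Lucene doc ids can change when segments are merged and deletions expunged, so the vector may need rebuilding after the index changes, and documents added after it was built fall outside it (the sketch scores them 0).

```java
import java.util.Random;

// Standalone sketch of a daily random permutation used as a score (not Lucene code).
final class DailyShuffleSketch {

    // Fisher-Yates shuffle of 0..maxDoc-1, seeded so the order is the same all day.
    static int[] dailyPermutation(int maxDoc, long daySeed) {
        int[] perm = new int[maxDoc];
        for (int i = 0; i < maxDoc; i++) {
            perm[i] = i;
        }
        Random rnd = new Random(daySeed);
        for (int i = maxDoc - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1); // uniform in [0, i]
            int tmp = perm[i];
            perm[i] = perm[j];
            perm[j] = tmp;
        }
        return perm;
    }

    // Normalized random score in [0, 1): what DocValues.floatVal(doc) could return.
    static float randomScore(int[] perm, int doc) {
        if (doc >= perm.length) {
            return 0f; // document added after the vector was built
        }
        return (float) perm[doc] / perm.length;
    }
}
```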


Thanks for your advice!


Antoine
