Ahhh, I was assuming you didn't need to look at all clusters. Oops. That said, the question is really whether this is "good enough" compared to re-indexing, and only some tests will determine that. I was surprised at how quickly a *large* number of ORs was processed by Lucene.
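
For what it's worth, here is a minimal sketch of the "simple approach": AND the user's query with one boosted OR clause per cluster, written as a string in Lucene's classic query-parser syntax. The field name cluster_id comes from this thread; the class name, method name, and boost values are hypothetical, and the cluster-to-boost map is assumed to be cached somewhere at startup. A few hundred clauses should stay under Lucene's default limit of 1024 clauses per BooleanQuery, but only a test on your own index will tell you whether it is fast enough.

```java
import java.util.Map;

// Sketch only: builds a query string in Lucene's classic query-parser syntax.
// "cluster_id" is the field name used in this thread; the class, the method,
// and the boost values passed in are hypothetical.
final class ClusterBoostQuery {

    // Returns: +(userQuery) +(cluster_id:0^b0 cluster_id:1^b1 ...)
    // Both groups are required (AND); inside the second group the clauses are
    // optional (OR), so each document picks up the boost of its own cluster.
    static String build(String userQuery, Map<Integer, Float> clusterBoosts) {
        if (clusterBoosts.isEmpty()) {
            return userQuery; // nothing to boost, leave the query untouched
        }
        StringBuilder sb = new StringBuilder();
        sb.append("+(").append(userQuery).append(") +(");
        boolean first = true;
        for (Map.Entry<Integer, Float> e : clusterBoosts.entrySet()) {
            if (!first) {
                sb.append(' ');
            }
            // term^boost is the query-parser syntax for a per-clause boost
            sb.append("cluster_id:").append(e.getKey())
              .append('^').append(e.getValue());
            first = false;
        }
        return sb.append(')').toString();
    }
}
```

The resulting string can be handed to QueryParser as usual. Because the boosts are applied at query time, updating a cluster score only means updating the cached map, not re-indexing any documents.
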
You could also think about implementing a HitCollector that boosted
the raw score of each document based upon the cluster ID, but be careful
not to read the full document in the HitCollector (you shouldn't have to,
though: either make a map early or get creative with filters).

You might find useful information looking through the mail archive for
"faceting", as this seems like a similar topic.

But I wouldn't go anywhere with anything custom until and unless I'd
satisfied myself that the simple approach of letting Lucene handle a
large set of OR clauses wasn't performant. Several very bright people
put significant effort into performance; I'd see if they've already done
the hard part <G>.....

Erick

On 8/21/07, Raghu Ram <[EMAIL PROTECTED]> wrote:
>
> Do you mean to say that we generate a compound query by ANDing the
> original query with a query like
>
> ( (cluster_id=0)^boost_cluster0 OR (cluster_id=1)^boost_cluster1 ... )
>
> But is this not inefficient, considering that the number of clusters is
> in the hundreds?
>
> On 8/21/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
> >
> > One solution is to keep meta-data in your index. Remember that
> > documents do not all have to have the same field. So you could
> > index a document with a single field
> > "metadatanotafieldinanyotherdoc" that contains, say, a list of
> > all of your clusters and their boosts. Read this document in at
> > startup time and cache it away in your server. Thereafter, you have
> > a set of boosts that can be applied at query time.
> >
> > Of course this is useless if you wanted to boost at index time.
> > But I know of no way to change the boost of a document
> > without deleting and re-adding it with the new boost.
> >
> > Best
> > Erick
> >
> > On 8/21/07, Raghu Ram <[EMAIL PROTECTED]> wrote:
> > >
> > > Is it possible to have multiple documents share a common boost?
> > >
> > > An example scenario is as follows. The set of documents is
> > > clustered into some set of clusters. Each cluster has a unique
> > > clusterId, so each document has a cluster ID field that associates
> > > it with its cluster. Each cluster has a property called cluster
> > > score. Each document has to be boosted by its cluster score. The
> > > number of clusters is very small in comparison to the number of
> > > documents (around 100 clusters). The cluster score is updated on a
> > > continual basis, so it can't be stored as the document boost for
> > > each individual document: we would end up updating every
> > > document's boost daily, which seems infeasible. We are trying to
> > > find a solution that is more efficient.
> > >
> > > Thank you.