Hello,

We have a large collection of documents split across multiple balanced 
shards. Each shard is now quickly approaching its capacity limit, so we would 
like to explore the possibility of adding unbalanced shards to the mix. 
However, that means IDF, and therefore relevance, would take a hit, because 
each shard computes IDF from its own local document frequencies. 

Several days ago, I asked about relevance across unbalanced shards in the 
#lucene IRC channel. Somebody pointed me to the Solr JIRA issue on 
distributed IDF (SOLR-1632).

After some thinking and research, I found that some new Lucene 4 features 
may help unify IDF across shards: we could calculate docFreq across all 
shards at index time, and then at query time supply or modify the 
TermStatistics in the IndexSearcher. I am currently experimenting with this 
approach. 
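
To make the idea concrete, here is a minimal sketch of the arithmetic only, 
not of the Lucene integration. It sums per-shard docFreq and document counts 
into global values and computes one IDF from them. The class and method names 
(GlobalIdfSketch, ShardStats, globalIdf) are hypothetical, and the formula 
1 + ln(numDocs / (docFreq + 1)) is assumed here as a classic TF-IDF style 
definition; the real value depends on the Similarity in use.

```java
import java.util.List;

public class GlobalIdfSketch {

    /** Per-shard statistics for one term. Hypothetical holder, not a Lucene class. */
    public static final class ShardStats {
        final long docFreq; // documents in this shard that contain the term
        final long numDocs; // total documents in this shard

        public ShardStats(long docFreq, long numDocs) {
            this.docFreq = docFreq;
            this.numDocs = numDocs;
        }
    }

    /** Assumed idf formula: 1 + ln(numDocs / (docFreq + 1)). */
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    /** Sums the statistics over all shards, then computes a single idf. */
    public static double globalIdf(List<ShardStats> shards) {
        long totalDocFreq = 0;
        long totalNumDocs = 0;
        for (ShardStats s : shards) {
            totalDocFreq += s.docFreq;
            totalNumDocs += s.numDocs;
        }
        return idf(totalDocFreq, totalNumDocs);
    }
}
```

With a small shard (docFreq 1 of 100 documents) and a large one (docFreq 900 
of 1,000,000 documents), the two local IDF values differ noticeably, while 
globalIdf gives every shard the same value. In the experiment, the summed 
numbers would be what gets supplied as TermStatistics at query time.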

Now the question is: is this really a good thing to try?

Best Regards,

Jerry Zhou
