I was looking into Solr's default clustering component for Carrot2 (I am in the process of writing my own). In the clustering component class there are two methods where the clustering algorithm is called:
First, in the overridden process method:

    SolrDocumentList solrDocList = SolrPluginUtils.docListToSolrDocumentList(
        results.docList, rb.req.getSearcher(), engine.getFieldsToLoad(rb.req), docIds);
    Object clusters = engine.cluster(rb.getQuery(), solrDocList, docIds, rb.req);
    rb.rsp.add("clusters", clusters);

And once again in the finishStage method:

    Map<SolrDocument,Integer> docIds = null;
    Object clusters = engine.cluster(rb.getQuery(), solrDocList, docIds, rb.req);
    rb.rsp.add("clusters", clusters);

Now my question: as I understand it, the process method works not on the complete query result but on each shard's results, while finishStage runs once, after all the results have been aggregated. Why, then, do we call the clustering algorithm twice and add its output to the response under "clusters" both times? Am I missing something? Won't this create too many labels if, in the worst case, none of the cluster labels match?

P.S. Please correct me if I am wrong.

--
View this message in context: http://lucene.472066.n3.nabble.com/Need-help-in-understanding-solr-clustering-component-tp4334400.html
Sent from the Solr - User mailing list archive at Nabble.com.
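
To make the worst case I am worried about concrete, here is a small toy model in plain Java. It does not use Solr or Carrot2 at all: the class ClusterTwiceSketch, the cluster method (which just groups titles by their first word) and the two counting methods are all made up for illustration. It only compares how many labels there would be if both the per-shard results and the merged result were kept, versus keeping only the merged result. Whether the per-shard output really reaches the final response is exactly what I am asking about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterTwiceSketch {

    // Hypothetical stand-in for engine.cluster(...): groups titles by their first word.
    static Map<String, List<String>> cluster(List<String> titles) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String title : titles) {
            String label = title.split("\\s+")[0].toLowerCase();
            clusters.computeIfAbsent(label, k -> new ArrayList<>()).add(title);
        }
        return clusters;
    }

    // Assumed worst case: every shard's clusters AND the merged clusters are all kept.
    static int labelsIfBothKept(List<List<String>> shards) {
        int count = 0;
        List<String> merged = new ArrayList<>();
        for (List<String> shard : shards) {
            count += cluster(shard).size(); // models the call in process() on one shard
            merged.addAll(shard);
        }
        count += cluster(merged).size();    // models the call in finishStage() on merged docs
        return count;
    }

    // Alternative: only the clusters computed over the merged documents are kept.
    static int labelsIfOnlyMerged(List<List<String>> shards) {
        List<String> merged = new ArrayList<>();
        for (List<String> shard : shards) {
            merged.addAll(shard);
        }
        return cluster(merged).size();
    }

    public static void main(String[] args) {
        // Two made-up shards whose labels do not overlap at all.
        List<List<String>> shards = Arrays.asList(
            Arrays.asList("solr clustering", "lucene index"),
            Arrays.asList("carrot2 labels", "tika parsing"));
        System.out.println("both kept:   " + labelsIfBothKept(shards));
        System.out.println("merged only: " + labelsIfOnlyMerged(shards));
    }
}
```

With shards whose labels never overlap, the first count is twice the second, which is the label blow-up I am asking about.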