Re: Changing Document Boosts without Reindexing

Christoph Goller Sat, 23 Oct 2004 06:41:52 -0700

Dan Climan wrote:

I want to test several strategies for document boosting. It seems the
only way to do this is to reindex every Document and call setBoost, which
takes a long time. I had an idea for doing this without reindexing, and I
am curious whether there is a better strategy or whether there are
additional points I should consider in this approach:

1) Optimize the index
2) Get the internal lucene doc id for each document
3) Update the boosts
        IndexReader ir = IndexReader.open(indexDir);
        IndexSearcher searcher = new IndexSearcher(ir);
        Similarity sim = searcher.getSimilarity();
        // norms arrays are indexed by internal doc id, 0 .. maxDoc()-1
        int maxDoc = ir.maxDoc();
        Collection indexedFields = ir.getFieldNames(true);
        Iterator it = indexedFields.iterator();
        while (it.hasNext()) {
                String f = (String) it.next();
                byte[] norms = ir.norms(f);
                for (int i = 0; i < maxDoc; i++) {
                        float oldNorm = sim.decodeNorm(norms[i]);
                        // newDocBoost / oldDocBoost: per-document boosts, keyed by the ids from step 2
                        float newNorm = oldNorm * (newDocBoost[i] / oldDocBoost[i]);
                        // encode the rescaled value, not the old byte
                        norms[i] = sim.encodeNorm(newNorm);
                }
        }
4) Write new norms files

Does this become prohibitively complicated when the index uses compound files?

Comments?

Thanks,
Dan


IndexReader has a setNorm method. It should also work for indexes with
compound files. After (re)setting the norm, a separate norms-file is
generated which will be reintegrated into the compound file after the
next optimize or merge.
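
As a rough illustration of how the rescaling from the question could be
routed through setNorm instead of hand-written norms files, here is a minimal
sketch. NormSink is a hypothetical stand-in so that the example runs without
Lucene on the classpath; with a real index its role would presumably be played
by IndexReader.setNorm(int doc, String field, float value), followed by
closing the reader so the change is committed. The class name, the field name
"body", and the sample norms and boosts are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class BoostRescale {

    /** Hypothetical stand-in for the one IndexReader method used here. */
    interface NormSink {
        void setNorm(int doc, String field, float value);
    }

    /** Rescale a decoded norm from its old document boost to a new one. */
    static float rescale(float oldNorm, float oldBoost, float newBoost) {
        return oldNorm * (newBoost / oldBoost);
    }

    /** Push the rescaled norm of every document in one field to the sink. */
    static void applyBoosts(String field, float[] oldNorms,
                            float[] oldBoost, float[] newBoost, NormSink sink) {
        for (int doc = 0; doc < oldNorms.length; doc++) {
            float newNorm = rescale(oldNorms[doc], oldBoost[doc], newBoost[doc]);
            sink.setNorm(doc, field, newNorm);
        }
    }

    public static void main(String[] args) {
        // Record what would be written, keyed by internal doc id.
        final Map<Integer, Float> written = new HashMap<Integer, Float>();
        NormSink sink = (doc, field, value) -> written.put(doc, value);

        applyBoosts("body",
                new float[] {0.5f, 0.25f},   // decoded old norms (made up)
                new float[] {1.0f, 2.0f},    // boosts used at index time
                new float[] {2.0f, 1.0f},    // boosts to try next
                sink);

        System.out.println(written.get(0) + " " + written.get(1)); // prints "1.0 0.125"
    }
}
```

Because only the ratio of new to old boost is applied, whatever else is
folded into the stored norm (such as length normalization) carries over
unchanged. The old boosts still have to be known for every document.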

Christoph
