Dear Michael,

I wrote a tool, OptimizeIndex.java. It is faster, and there is no question about what it does. After you optimize the index with IndexOptimizer, is the number of hits when searching for 'http' still the same?

Regards,
   Ferenc

Michael Nebel wrote:

Hi,

I fixed the problem with the following patch:

--- IndexOptimizer.java 2005-08-04 12:55:54.000000000 +0200
+++ IndexOptimizer.java.~1.6.~  2005-01-21 00:48:50.000000000 +0100
@@ -138,7 +138,7 @@

         if (score > minScore) {
           sdq.put(new ScoreDoc(doc, score));
-          if (sdq.size() >= count) {               // if sdq overfull
+          if (sdq.size() > count) {               // if sdq overfull
             sdq.pop();                            // remove lowest in sdq
             minScore = ((ScoreDoc)sdq.top()).score; // reset minScore
           }

My index shrank from 8.5 GB to 0.5 GB. I found no documentation on the background of this tool. Can anyone tell me what the idea behind it is?
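
For anyone following the patch: the loop it touches appears to keep only the highest-scoring documents for a term in a bounded priority queue. Below is a minimal sketch of that pattern, assuming java.util.PriorityQueue rather than Lucene's own PriorityQueue class. The class TopScoresSketch and the method topScores are hypothetical names for illustration and are not part of Nutch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; not the Nutch IndexOptimizer code.
public class TopScoresSketch {

    // Returns the `count` highest scores, in descending order.
    public static List<Float> topScores(float[] scores, int count) {
        List<Float> result = new ArrayList<>();
        if (count <= 0) {
            return result;
        }
        // Min-heap: the lowest retained score is always at the head.
        PriorityQueue<Float> queue = new PriorityQueue<>(count);
        for (float score : scores) {
            if (queue.size() < count) {
                queue.add(score);              // still room, keep it
            } else if (score > queue.peek()) {
                queue.poll();                  // remove lowest retained score
                queue.add(score);              // keep the better one
            }
        }
        result.addAll(queue);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        // prints [0.9, 0.7]
        System.out.println(topScores(new float[] {0.1f, 0.9f, 0.5f, 0.7f}, 2));
    }
}
```

This sketch checks the size before inserting, so the queue never holds more than count entries. With a fixed-capacity heap, inserting first and trimming afterwards can overflow the backing array, which is consistent with the ArrayIndexOutOfBoundsException reported in PriorityQueue.put.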

Regards

    Michael



Andy Liu wrote:

I believe this tool is unfinished and unsupported.

On 7/22/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

I found an IndexOptimizer in Nutch.
When I run it, it throws an exception:
....
Optimizing url:http from 226957 to 22696
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 22697
       at org.apache.lucene.util.PriorityQueue.put(PriorityQueue.java:46)
       at org.apache.nutch.indexer.IndexOptimizer$OptimizingTermPositions.seek(IndexOptimizer.java:153)
       at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:325)
       at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:296)
       at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:270)
       at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:234)
       at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
       at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:578)
       at org.apache.nutch.indexer.IndexOptimizer.optimize(IndexOptimizer.java:215)
       at org.apache.nutch.indexer.IndexOptimizer.main(IndexOptimizer.java:235)



