Dear Michael,
I wrote a tool, OptimizeIndex.java. It is faster, and there is no
question about what it does.
After you optimize the index with IndexOptimizer, is the number of
search results for 'http' still the same?
Regards,
Ferenc
Michael Nebel wrote:
Hi,
I fixed the problem with the following patch:
--- IndexOptimizer.java 2005-08-04 12:55:54.000000000 +0200
+++ IndexOptimizer.java.~1.6.~ 2005-01-21 00:48:50.000000000 +0100
@@ -138,7 +138,7 @@
       if (score > minScore) {
         sdq.put(new ScoreDoc(doc, score));
-        if (sdq.size() >= count) {   // if sdq overfull
+        if (sdq.size() > count) {    // if sdq overfull
           sdq.pop();                 // remove lowest in sdq
           minScore = ((ScoreDoc)sdq.top()).score; // reset minScore
         }
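
To illustrate why the comparison in that hunk matters, here is a minimal,
self-contained sketch. It is not the Nutch or Lucene code. It assumes the
queue is a fixed-size array heap that never grows, which is consistent with
the ArrayIndexOutOfBoundsException reported in this thread. The names
BoundedQueueDemo, FixedMinHeap and run, as well as the sample scores, are
made up for the example.

```java
final class BoundedQueueDemo {

    /** Hypothetical stand-in for a fixed-capacity min-heap of scores. */
    static final class FixedMinHeap {
        private final float[] heap; // slot 0 is unused, slots 1..maxSize hold elements
        private int size = 0;

        FixedMinHeap(int maxSize) { heap = new float[maxSize + 1]; }

        int size() { return size; }

        float top() { return heap[1]; } // lowest score (only meaningful if size > 0)

        void put(float score) {
            size++;
            heap[size] = score; // throws ArrayIndexOutOfBoundsException if already full
            for (int i = size; i > 1 && heap[i] < heap[i / 2]; i /= 2) {
                swap(i, i / 2); // sift the new element up
            }
        }

        float pop() {
            float lowest = heap[1];
            heap[1] = heap[size];
            size--;
            int i = 1;
            while (true) { // sift the moved element down
                int left = 2 * i, right = left + 1, smallest = i;
                if (left <= size && heap[left] < heap[smallest]) smallest = left;
                if (right <= size && heap[right] < heap[smallest]) smallest = right;
                if (smallest == i) break;
                swap(i, smallest);
                i = smallest;
            }
            return lowest;
        }

        private void swap(int a, int b) {
            float tmp = heap[a]; heap[a] = heap[b]; heap[b] = tmp;
        }
    }

    /**
     * Runs the put-then-check loop from the patch on plain float scores.
     * Returns true if the loop finished, false if the queue overflowed.
     */
    static boolean run(float[] scores, int count, boolean useGreaterOrEqual) {
        FixedMinHeap sdq = new FixedMinHeap(count);
        float minScore = 0.0f;
        try {
            for (float score : scores) {
                if (score > minScore) {
                    sdq.put(score);
                    boolean overfull = useGreaterOrEqual
                            ? sdq.size() >= count
                            : sdq.size() > count;
                    if (overfull) {
                        sdq.pop();            // remove lowest in sdq
                        minScore = sdq.top(); // reset minScore
                    }
                }
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 0.9f, 0.1f, 0.7f, 0.3f, 0.8f};
        System.out.println("check with >  completes: " + run(scores, 3, false));
        System.out.println("check with >= completes: " + run(scores, 3, true));
    }
}
```

With these sample scores and count = 3, the ">" check lets the queue fill up
completely, so the fourth put() writes past the end of the array. The ">="
check pops as soon as the queue is full, so put() always finds a free slot.
A side effect in this sketch is that the ">=" variant holds at most
count - 1 entries between calls.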
My index shrank from 8.5 GB to 0.5 GB. I found no documentation
about the background of this tool. Can anyone tell me what the idea
behind it is?
Regards
Michael
Andy Liu wrote:
I believe this tool is unfinished and unsupported.
On 7/22/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
I found an IndexOptimizer in Nutch.
When I run it, it throws an exception:
....
Optimizing url:http from 226957 to 22696
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 22697
        at org.apache.lucene.util.PriorityQueue.put(PriorityQueue.java:46)
        at org.apache.nutch.indexer.IndexOptimizer$OptimizingTermPositions.seek(IndexOptimizer.java:153)
        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:325)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:296)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:270)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:234)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
        at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:578)
        at org.apache.nutch.indexer.IndexOptimizer.optimize(IndexOptimizer.java:215)
        at org.apache.nutch.indexer.IndexOptimizer.main(IndexOptimizer.java:235)