An interesting thing has come up with Plucene. The code for TermInfosReader.get has an optimisation so that sequential access doesn't need to keep seeking:

    final synchronized TermInfo get(Term term) throws IOException {
      if (size == 0) return null;

      // optimize sequential access: first try scanning cached enum w/o seeking
      if (enum.term() != null                       // term is at or past current
          && ((enum.prev != null && term.compareTo(enum.prev) > 0)
              || term.compareTo(enum.term()) >= 0)) {
        int enumOffset = (enum.position/TermInfosWriter.INDEX_INTERVAL)+1;
        if (indexTerms.length == enumOffset         // but before end of block
            || term.compareTo(indexTerms[enumOffset]) < 0)
          return scanEnum(term);                    // no need to seek
      }

      // random-access: must seek
      seekEnum(getIndexOffset(term));
      return scanEnum(term);
    }

In the Perl version, this whole middle section (the sequential-access check) slows everything down considerably, by almost 50%. I'm not sure whether this is because the bottlenecks are in different places in Perl and Java, but I'm curious what impact this optimisation has in the Java version. I can't easily test it from here at the minute. Are there any benchmarks on the effect of having that optimisation versus not having it?

Thanks,
Tony
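
For anyone who wants to try the comparison, here is a minimal, self-contained sketch of the two lookup strategies. It is not Lucene's or Plucene's code. The class name `SequentialLookupSketch`, the in-memory `String[]` dictionary, the `seeks` counter and the interval value of 128 are all assumptions made for illustration, and the `enum.prev` half of the check is left out for brevity. A "seek" here is just an integer assignment, so the timings say little about real file I/O; the seek count is probably the more useful number.

```java
/** Simplified stand-in for a term dictionary; NOT Lucene's actual classes. */
class SequentialLookupSketch {
    static final int INDEX_INTERVAL = 128; // assumed value, for illustration only

    private final String[] terms;      // all terms, sorted ascending
    private final String[] indexTerms; // every INDEX_INTERVAL-th term
    private int position = 0;          // current position of the cached enumerator
    long seeks = 0;                    // number of times a seek was needed

    SequentialLookupSketch(String[] sortedTerms) {
        terms = sortedTerms;
        int n = (terms.length + INDEX_INTERVAL - 1) / INDEX_INTERVAL;
        indexTerms = new String[n];
        for (int i = 0; i < n; i++) indexTerms[i] = terms[i * INDEX_INTERVAL];
    }

    /** Binary search for the last index term <= term (clamped to block 0). */
    private int getIndexOffset(String term) {
        int lo = 0, hi = indexTerms.length - 1;
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            int delta = term.compareTo(indexTerms[mid]);
            if (delta < 0) hi = mid - 1;
            else if (delta > 0) lo = mid + 1;
            else return mid;
        }
        return Math.max(hi, 0);
    }

    /** Jump to the start of an index block and count the seek. */
    private void seek(int indexOffset) {
        seeks++;
        position = indexOffset * INDEX_INTERVAL;
    }

    /** Scan forward from the current position; return the term's position or -1. */
    private int scan(String term) {
        while (position < terms.length && terms[position].compareTo(term) < 0) position++;
        return (position < terms.length && terms[position].equals(term)) ? position : -1;
    }

    /** Variant without the optimisation: seek on every lookup. */
    int getAlwaysSeek(String term) {
        if (terms.length == 0) return -1;
        seek(getIndexOffset(term));
        return scan(term);
    }

    /** Variant with the check: scan on if the term is ahead but in the current block. */
    int getWithSequentialCheck(String term) {
        if (terms.length == 0) return -1;
        if (position < terms.length && term.compareTo(terms[position]) >= 0) {
            int enumOffset = position / INDEX_INTERVAL + 1;
            if (indexTerms.length == enumOffset
                    || term.compareTo(indexTerms[enumOffset]) < 0)
                return scan(term); // no need to seek
        }
        seek(getIndexOffset(term)); // random access: must seek
        return scan(term);
    }

    public static void main(String[] args) {
        int n = 200_000;
        String[] terms = new String[n];
        for (int i = 0; i < n; i++) terms[i] = String.format("term%08d", i);

        for (boolean optimised : new boolean[] {false, true}) {
            SequentialLookupSketch dict = new SequentialLookupSketch(terms);
            long start = System.nanoTime();
            for (String t : terms) {
                int pos = optimised ? dict.getWithSequentialCheck(t) : dict.getAlwaysSeek(t);
                if (pos < 0) throw new IllegalStateException("missing " + t);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println((optimised ? "sequential check: " : "always seek: ")
                    + micros + " us, seeks=" + dict.seeks);
        }
    }
}
```

This only exercises purely sequential lookups, which is the best case for the check. Shuffling the lookup order would show the other extreme, where the extra comparisons are paid and the seek happens anyway. A single timed pass like this is also sensitive to JIT warm-up, so a proper harness would be needed before drawing conclusions.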