For the MemoryIndex, I'm seeing large performance overheads due to repeated interning of temporary strings in o.a.l.index.Term.
For example, consider a FuzzyTermQuery or similar that scans all terms in the index via TermEnum: 40% of the time is spent in String.intern() called from new Term(). (Allocating temporary memory and FuzzyTermEnum.termCompare are less of a problem according to profiling.)


Note that the field name only needs to be interned once, not again and again for each term. But the non-interning Term constructor is private and hence not accessible from o.a.l.index.memory.*. TermBuffer isn't what I'm looking for, and it's private anyway. The best solution I came up with is to add a safe public method to Term.java:

  /** Constructs a term with the given text and the same interned field name as
   * this term (minimizes interning overhead). */
  public Term createTerm(String txt) { // WH
    return new Term(field, txt, false);
  }


Besides dramatically improving performance, this has the benefit of keeping the non-interning constructor private.
Comments/opinions, anyone?


Here's a sketch of how it can be used:

  public Term term() {
    ...
    if (cachedTerm == null) cachedTerm = new Term((String) sortedFields[j].getKey(), "");
    return cachedTerm.createTerm((String) info.sortedTerms[i].getKey());
  }


  public boolean next() {
    ...
    if (...) cachedTerm = null;
  }
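
For anyone who wants to try the idea outside the Lucene source tree, here is a minimal self-contained sketch of the same pattern. SketchTerm and SketchTermEnum are hypothetical stand-ins I made up for illustration; they are not the real Term or the MemoryIndex enumerator, and the sketch covers only a single field, so it never needs to reset the cached term. It only shows that the field name is interned once per field and then reused for every term.

```java
/** Simplified stand-in for o.a.l.index.Term; NOT the real Lucene class. */
final class SketchTerm {
    private final String field; // invariant: always an interned string
    private final String text;

    /** Ordinary constructor: interns the field name on every call. */
    SketchTerm(String field, String text) {
        this(field, text, true);
    }

    /** Private constructor: skips interning when the field is known to be interned. */
    private SketchTerm(String field, String text, boolean intern) {
        this.field = intern ? field.intern() : field;
        this.text = text;
    }

    /** Same interned field name as this term, new text; no intern() call. */
    SketchTerm createTerm(String txt) {
        return new SketchTerm(field, txt, false);
    }

    String field() { return field; }
    String text() { return text; }
}

/** Hypothetical single-field enumerator mirroring the term()/next() sketch. */
final class SketchTermEnum {
    private final String fieldName;
    private final String[] sortedTerms;
    private int i = -1;            // position before the first term
    private SketchTerm cachedTerm; // template term holding the interned field name

    SketchTermEnum(String fieldName, String[] sortedTerms) {
        this.fieldName = fieldName;
        this.sortedTerms = sortedTerms;
    }

    /** Advances to the next term; returns false when the terms are exhausted. */
    boolean next() {
        return ++i < sortedTerms.length;
    }

    /** Returns the current term; the field name is interned only on the first call. */
    SketchTerm term() {
        if (cachedTerm == null) cachedTerm = new SketchTerm(fieldName, "");
        return cachedTerm.createTerm(sortedTerms[i]);
    }
}
```

Because the only way to reach the non-interning constructor is through a term whose field is already interned, callers can't accidentally create a Term with a non-interned field name.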

I'll send the full patch for MemoryIndex if this is accepted.

Wolfgang.
