The problem is that your transformation method needs Strings, but your incrementToken method also has a serious bug: it does not respect the length of the buffer, so it may pick up additional garbage left over from earlier, longer tokens!
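A minimal standalone sketch of the bug described above (not Lucene-specific; the buffer size and contents are made up for illustration). A term attribute's buffer() is usually larger than the current token, so new String(buffer) drags in stale characters:

```java
// Demonstrates the length bug: the backing char[] is over-allocated,
// so converting the whole array to a String includes garbage.
public class BufferBugDemo {
    public static void main(String[] args) {
        char[] buffer = new char[16];        // over-allocated, like a term buffer
        "abc".getChars(0, 3, buffer, 0);     // current token: 3 chars
        buffer[3] = 'X';                     // leftovers from an earlier, longer token
        buffer[4] = 'Y';

        String wrong = new String(buffer);       // drags in "XY" and NUL padding
        String right = new String(buffer, 0, 3); // respects the token length

        System.out.println(right);               // prints: abc
        System.out.println(wrong.length());      // 16, not 3
    }
}
```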
The easiest way to do this, with much less code and without those bugs:

    public boolean incrementToken() throws IOException {
      if (!input.incrementToken()) {
        return false;
      }
      final String normalizedLCcallnum = getLCShelfkey(charTermAttr.toString());
      charTermAttr.setEmpty().append(normalizedLCcallnum);
      return true;
    }

This also fixes part of your performance problem: it no longer converts the result of your transformation twice between char arrays and Strings. To improve speed further, make the method getLCShelfkey operate directly on char[] and length.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Osullivan L. [mailto:l.osulli...@swansea.ac.uk]
> Sent: Friday, September 14, 2012 11:58 AM
> To: general@lucene.apache.org
> Subject: Custom Filter Indexing Slow
>
> Hi Folks,
>
> I have a custom filter which does everything I need it to, but it has
> reduced my indexing speed to a crawl. Are there any methods I need to
> call to clear / clean things up once my script (details below) has done
> its work?
>
> Thanks,
>
> Luke
>
> public LCCNormalizeFilter(TokenStream input)
> {
>     super(input);
>     this.charTermAttr = addAttribute(CharTermAttribute.class);
> }
>
> public boolean incrementToken() throws IOException {
>
>     if (!input.incrementToken()) {
>         return false;
>     }
>
>     char[] buffer = charTermAttr.buffer();
>     String rawLCcallnum = new String(buffer);
>     String normalizedLCcallnum = getLCShelfkey(rawLCcallnum);
>     char[] newBuffer = normalizedLCcallnum.toCharArray();
>     charTermAttr.setEmpty();
>     charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);
>     return true;
> }
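Uwe's last suggestion, operating directly on char[] and length, can be sketched like this. The normalization body here is a hypothetical placeholder (simple upper-casing); the real getLCShelfkey logic would replace it:

```java
// Hypothetical sketch: normalize a token directly in its char[] buffer,
// avoiding per-token String allocation. Only the first `length` chars
// belong to the token; anything past that is stale buffer content.
public class InPlaceNormalizer {
    // Returns the new token length. It is unchanged here, but a real
    // shelf-key routine might rewrite the buffer and return a new length.
    public static int normalizeInPlace(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            buffer[i] = Character.toUpperCase(buffer[i]);
        }
        return length;
    }

    public static void main(String[] args) {
        char[] buf = "qa76.73 leftover".toCharArray();
        int len = 7; // only the first 7 chars are the token, like charTermAttr.length()
        int newLen = normalizeInPlace(buf, len);
        System.out.println(new String(buf, 0, newLen)); // prints: QA76.73
    }
}
```

Inside incrementToken this would be used roughly as: call normalizeInPlace(charTermAttr.buffer(), charTermAttr.length()) and then charTermAttr.setLength(newLen), so no Strings are created at all.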