Hi Uwe,

Thanks for the advice! My indexing routine is back up to speed.

If I ever make it to Bremen or nearby, I definitely owe you a beer!

Kind Regards,

Luke



________________________________________
From: Uwe Schindler [u...@thetaphi.de]
Sent: 14 September 2012 11:10
To: general@lucene.apache.org
Subject: RE: Custom Filter Indexing Slow

The problem is that your transformation method needs Strings, but your 
incrementToken method also has a serious bug: it does not respect the length of 
the buffer, so it may pick up additional garbage left over from earlier tokens!
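
To make the buffer-length bug concrete, here is a minimal sketch that needs no Lucene on the classpath. A plain char[] stands in for the reused term buffer; the class name BufferLengthDemo and the sample call numbers are assumptions made up for illustration.

```java
public class BufferLengthDemo {
    // Buggy variant: converts the whole array, including stale characters.
    static String wrong(char[] buffer) {
        return new String(buffer);
    }

    // Correct variant: converts only the first `length` valid characters.
    static String right(char[] buffer, int length) {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        // Hypothetical reused buffer, larger than any single token.
        char[] buffer = new char[8];

        // First token fills 7 characters.
        "QA76.73".getChars(0, 7, buffer, 0);

        // Second token overwrites only the first 2 characters;
        // the rest of the first token is still sitting in the array.
        "B1".getChars(0, 2, buffer, 0);
        int length = 2;

        System.out.println(wrong(buffer).length());  // 8, not 2
        System.out.println(right(buffer, length));   // B1
    }
}
```

In a real filter the valid length would come from charTermAttr.length(), which is the value the original code ignores.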


The easiest way to do this with much less code and without those bugs:

     public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        final String normalizedLCcallnum = getLCShelfkey(charTermAttr.toString());
        charTermAttr.setEmpty().append(normalizedLCcallnum);
        return true;
     }

This fixes part of your performance problem: it no longer converts the result 
of your transformation back and forth between char arrays and Strings.

To further improve speed, make the method getLCShelfkey operate directly on 
char[] and length.
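
As a rough sketch of what "operate on char[] and length" could look like, the example below normalizes a buffer in place and returns the new valid length. The real getLCShelfkey logic is not shown in this thread, so normalizeInPlace (upper-casing and dropping spaces) is only a hypothetical stand-in, and the class name is made up.

```java
public class InPlaceNormalizeDemo {
    /**
     * Hypothetical stand-in for getLCShelfkey: upper-cases characters and
     * drops spaces, writing the result back into the same array.
     * Reads only the first `length` characters and returns the new length.
     */
    static int normalizeInPlace(char[] buffer, int length) {
        int out = 0;
        for (int in = 0; in < length; in++) {
            char c = buffer[in];
            if (c != ' ') {
                buffer[out++] = Character.toUpperCase(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Only the first 8 characters ("qa 76.73") are valid; " xx" is stale data.
        char[] buffer = "qa 76.73 xx".toCharArray();
        int length = 8;

        int newLength = normalizeInPlace(buffer, length);
        System.out.println(new String(buffer, 0, newLength)); // QA76.73
    }
}
```

Inside the filter, such a method would be called with charTermAttr.buffer() and charTermAttr.length(), followed by charTermAttr.setLength(newLength), so no String is allocated per token. This only works as written when the output is never longer than the input; if the real shelf key can grow, the buffer would first have to be enlarged with charTermAttr.resizeBuffer(...).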

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Osullivan L. [mailto:l.osulli...@swansea.ac.uk]
> Sent: Friday, September 14, 2012 11:58 AM
> To: general@lucene.apache.org
> Subject: Custom Filter Indexing Slow
>
> Hi Folks,
>
> I have a custom filter which does everything I need it to, but it has reduced
> my indexing speed to a crawl. Are there any methods I need to call to clear /
> clean things up once my script (details below) has done its work?
>
> Thanks,
>
> Luke
>
>   public LCCNormalizeFilter(TokenStream input)
>     {
>         super(input);
>         this.charTermAttr = addAttribute(CharTermAttribute.class);
>     }
>
>     public boolean incrementToken() throws IOException {
>
>       if (!input.incrementToken()) {
>           return false;
>       }
>
>       char[] buffer = charTermAttr.buffer();
>       String rawLCcallnum = new String(buffer);
>       String normalizedLCcallnum = getLCShelfkey(rawLCcallnum);
>       char[] newBuffer = normalizedLCcallnum.toCharArray();
>       charTermAttr.setEmpty();
>       charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);
>       return true;
>     }
