Re: get wordno, lineno, pageno for term/phrase

arun r Fri, 06 Aug 2010 10:38:22 -0700

I am trying to create a custom analyzer that checks for page breaks
and line breaks and adds payload data to each term. In the custom
filter I have this code:


public boolean incrementToken() throws IOException {
        if (input.incrementToken()) {
                if (termAtt.term().equals(pageBreak)) {
                        System.out.println("pageBreak");
                        pageCount++;
                } else if (termAtt.term().equals(lineBreak)) {
                        System.out.println("lineBreak");
                        lineCount++;
                } else {
                        addPayload(lineCount, pageCount);
                }
                return true;
        } else {
                return false;
        }
}

where pageBreak and lineBreak are defined as:
int pageBreakAscii = 12;
String pageBreak = new Character ((char) pageBreakAscii).toString();
String lineBreak = System.getProperty("line.separator");

I am using the WhitespaceAnalyzer token stream, which discards the
page break and line break characters, so the filter never sees them.
Is there a way to create an analyzer that ignores the page break and
line break characters during search but still gives the filter access
to them in incrementToken()?
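
To make the intended bookkeeping concrete, here is a minimal sketch in
plain Java that does not use any Lucene classes. It splits on whitespace
the way a whitespace tokenizer would, but looks at each separator before
discarding it, so every word can be tagged with the word, line and page
counts seen so far. The names BreakAwareScanner, Position and scan are
hypothetical and exist only for this illustration. It also assumes that
'\n' marks a line break and '\f' (ASCII 12) marks a page break, and that
the line count keeps running across pages, as in the filter above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain Java, no Lucene API involved.
public class BreakAwareScanner {

    /** A word plus the 0-based counters in effect when it was read. */
    public static final class Position {
        public final String term;
        public final int wordNo;
        public final int lineNo;
        public final int pageNo;

        Position(String term, int wordNo, int lineNo, int pageNo) {
            this.term = term;
            this.wordNo = wordNo;
            this.lineNo = lineNo;
            this.pageNo = pageNo;
        }

        @Override
        public String toString() {
            return term + "[word=" + wordNo + ", line=" + lineNo
                    + ", page=" + pageNo + "]";
        }
    }

    /** Splits text on whitespace while counting line and page breaks. */
    public static List<Position> scan(String text) {
        List<Position> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int wordNo = 0, lineNo = 0, pageNo = 0;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c)) {
                word.append(c);
                continue;
            }
            // Any whitespace character ends the current word.
            if (word.length() > 0) {
                out.add(new Position(word.toString(), wordNo++, lineNo, pageNo));
                word.setLength(0);
            }
            // Inspect the separator before dropping it.
            if (c == '\f') {
                pageNo++;
            } else if (c == '\n') {
                lineNo++; // "\r\n" counts once, because '\r' is just skipped
            }
        }
        // Flush a trailing word that is not followed by whitespace.
        if (word.length() > 0) {
            out.add(new Position(word.toString(), wordNo, lineNo, pageNo));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Position p : scan("first line\nsecond line\fnew page")) {
            System.out.println(p);
        }
    }
}
```

The break characters never appear as terms in the output, so they would
not be searchable, yet the counters are available for every word. In
Lucene the same counting would presumably have to live in a custom
tokenizer rather than in a filter placed after the whitespace
tokenizer, since the separators are already gone by the time the filter
runs.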
        

-- 
Where there is a will, there is a way !

