I am trying to create a custom analyzer that checks for page breaks and line breaks and adds payload data (the current line and page count) to each term. In the custom filter I have this code:
    public boolean incrementToken() throws IOException {
        if (input.incrementToken()) {
            if (termAtt.term().equals(pageBreak)) {
                System.out.println("pageBreak");
                pageCount++;
            } else if (termAtt.term().equals(lineBreak)) {
                System.out.println("lineBreak");
                lineCount++;
            } else {
                addPayload(lineCount, pageCount);
            }
            return true;
        } else {
            return false;
        }
    }

where pageBreak and lineBreak are defined as:

    int pageBreakAscii = 12; // form feed, '\f'
    String pageBreak = String.valueOf((char) pageBreakAscii);
    String lineBreak = System.getProperty("line.separator");

I am using the token stream from WhitespaceAnalyzer, which splits on whitespace and therefore discards the page-break and line-break characters, so my filter never sees them.

Is there a way to create an analyzer that ignores the page-break and line-break characters during search, but still gives my filter access to them in incrementToken()?
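For reference, the intended counting logic can be sketched in plain Java, outside Lucene. This is a hedged illustration, not Lucene API: the class `BreakAwareTokens`, its `Positioned` holder and `tokenize` method are hypothetical names. Any whitespace ends a word; a form feed increments the page counter and '\n' increments the line counter, so a Windows "\r\n" counts as one line break. Each word carries the counters in effect when it was read, which is what addPayload(lineCount, pageCount) is meant to record, and the break characters themselves never become terms. Like the original filter, the sketch does not reset the line counter at a page break. In Lucene the same effect would presumably need a tokenizer that emits the break characters as tokens plus a filter that consumes them (looping on input.incrementToken()) instead of returning them.

```java
import java.util.ArrayList;
import java.util.List;

public class BreakAwareTokens {

    /** Hypothetical holder: a word plus the line/page counters in effect when it was read. */
    public static final class Positioned {
        public final String term;
        public final int line;
        public final int page;

        Positioned(String term, int line, int page) {
            this.term = term;
            this.line = line;
            this.page = page;
        }

        @Override
        public String toString() {
            return term + "@" + line + ":" + page;
        }
    }

    /** Splits on whitespace, counting '\f' as a page break and '\n' as a line break. */
    public static List<Positioned> tokenize(String text) {
        List<Positioned> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int lineCount = 0;
        int pageCount = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace (including '\f', '\n', '\r') ends the current word.
                if (word.length() > 0) {
                    out.add(new Positioned(word.toString(), lineCount, pageCount));
                    word.setLength(0);
                }
                if (c == '\f') {
                    pageCount++;
                } else if (c == '\n') {
                    lineCount++;
                }
                // Spaces, tabs and '\r' are plain separators and are not counted.
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            out.add(new Positioned(word.toString(), lineCount, pageCount));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Positioned p : tokenize("alpha beta\r\ngamma\fdelta")) {
            System.out.println(p);
        }
    }
}
```

With the sample input in main, "alpha" and "beta" are tagged line 0, page 0; "gamma" line 1, page 0; and "delta" line 1, page 1.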