Hi all, I'm trying to implement a "stop phrases filter" with the new TokenStream API.
I would like to be able to peek into N tokens ahead, see if the current token + N subsequent tokens match a "stop phrase" (the set of stop phrases are saved in a HashSet), then discard all these tokens when they match a stop phrase, or keep them all if they don't match. For this purpose I would like to use captureState() and then restoreState() to get back to the starting point of the stream. I tried many combinations of these API. My last attempt is in the code below, which doesn't work. static private HashSet<String> m_stop_phrases = new HashSet<String>(); static private int m_max_stop_phrase_length = 0; ... public final boolean incrementToken() throws IOException { if (!input.incrementToken()) return false; Stack<State> stateStack = new Stack<State>(); StringBuilder match_string_builder = new StringBuilder(); int skippedPositions = 0; boolean is_next_token = true; while (is_next_token && match_string_builder.length() < m_max_stop_phrase_length) { if (match_string_builder.length() > 0) match_string_builder.append(" "); match_string_builder.append(termAtt.term()); skippedPositions += posIncrAtt.getPositionIncrement(); stateStack.push(captureState()); is_next_token = input.incrementToken(); if (m_stop_phrases.contains(match_string_builder.toString())) { // Stop phrase is found: skip the number of tokens // without restoring the state posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions); return is_next_token; } } // No stop phrase found: restore the stream while (!stateStack.empty()) restoreState(stateStack.pop()); return true; } Which is the correct direction I should look into to implement my "stop phrases" filter? Thank you Regards Enrico