This works, and I can now reuse token streams. But why did TokenStream.reset() not work in my earlier case? Is it a marker method in TokenStream with no implementation, which CachingTokenFilter then implements? - BR
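For anyone reading this thread later, here is a minimal sketch of the behaviour in question. It does not use Lucene: TokenSource, OnePassSource and CachingSource are hypothetical stand-ins, written on the assumption that the base reset() is an optional no-op and that a caching wrapper records tokens on the first pass so it can replay them. It is an illustration of the idea only, not the actual CachingTokenFilter code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReplayableTokens {

    /** Hypothetical stand-in for a token stream; next() returns null when exhausted. */
    abstract static class TokenSource {
        abstract String next();

        /** Optional operation: a no-op unless a subclass overrides it. */
        void reset() { }
    }

    /** Reads from a one-pass iterator and inherits the no-op reset(). */
    static class OnePassSource extends TokenSource {
        private final Iterator<String> tokens;

        OnePassSource(Iterator<String> tokens) { this.tokens = tokens; }

        @Override
        String next() { return tokens.hasNext() ? tokens.next() : null; }
    }

    /** Caches every token of the wrapped source on first use; reset() rewinds the cache. */
    static class CachingSource extends TokenSource {
        private final TokenSource input;
        private List<String> cache;   // filled lazily on the first call to next()
        private int position = 0;

        CachingSource(TokenSource input) { this.input = input; }

        @Override
        String next() {
            if (cache == null) {
                cache = new ArrayList<>();
                for (String t = input.next(); t != null; t = input.next()) {
                    cache.add(t);
                }
            }
            return position < cache.size() ? cache.get(position++) : null;
        }

        @Override
        void reset() { position = 0; }
    }

    /** Consumes the source completely and counts how often each term occurs. */
    static Map<String, Integer> countTerms(TokenSource source) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t = source.next(); t != null; t = source.next()) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("quick", "fox", "quick");

        // Plain source: counting consumes it, and the no-op reset() cannot rewind it.
        TokenSource plain = new OnePassSource(words.iterator());
        System.out.println("counts: " + countTerms(plain));
        plain.reset();
        System.out.println("plain after reset: " + plain.next());

        // Caching wrapper: the same tokens can be read again after reset().
        TokenSource cached = new CachingSource(new OnePassSource(words.iterator()));
        countTerms(cached);
        cached.reset();
        System.out.println("cached after reset: " + cached.next());
    }
}
```

In this sketch the plain source is empty after counting, which would match the symptom of nothing reaching the index, while the caching wrapper hands out the same tokens again after reset().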
Mark Miller <[EMAIL PROTECTED]> wrote:

reset() is optional. StandardAnalyzer does not implement it. Check out CachingTokenFilter and wrap StandardAnalyzer in it.

Cool Coder wrote:
> Currently I have extended StandardAnalyzer and am counting tokens in the
> following way. But the index is not getting created, though I call
> tokenStream.reset(). I am not sure whether reset() on a token stream works or
> not. I am debugging now.
>
> public TokenStream tokenStream(String fieldName, Reader reader) {
>     TokenStream result = super.tokenStream(fieldName, new HTMLStripReader(reader));
>     // To count tokens and put in a Map
>     analyzeTokens(result);
>     try {
>         result.reset();
>     } catch (IOException e) {
>         // TODO Auto-generated catch block
>         e.printStackTrace();
>     }
>     return result;
> }
>
> public void analyzeTokens(TokenStream result)
> {
>     try {
>         Token token = result.next();
>         while (token != null)
>         {
>             String tokenStr = token.termText();
>             if (TokenHolder.tokenMap.get(tokenStr) == null)
>             {
>                 TokenHolder.tokenMap.put(tokenStr, 1);
>             }
>             else
>             {
>                 TokenHolder.tokenMap.put(tokenStr, Integer.parseInt(TokenHolder.tokenMap.get(tokenStr).toString()) + 1);
>             }
>             token = result.next();
>         }
>         // extra reset
>         result.reset();
>     } catch (IOException e) {
>         e.printStackTrace();
>     }
> }
>
> Karl Wettin wrote:
>
> On 1 Nov 2007, at 18:09, Cool Coder wrote:
>
>> prior to adding into index
>
> Easiest way out would be to add the document to a temporary index and
> extract the term frequency vector. I would recommend using MemoryIndex.
>
> You could also tokenize the document and pass the data to a
> TermVectorMapper. You could consider replacing the fields of the
> document with CachedTokenStreams if you have the RAM to spare and
> don't want to waste CPU analyzing the document twice. I welcome
> TermVectorMappingCachedTokenStreamFactory.
> Even cooler would be to pass code down to IndexWriter.addDocument using
> a command pattern or something, allowing one to extend the document at
> the time of the analysis.