[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742974#action_12742974 ]
Shai Erera commented on LUCENE-1794:
------------------------------------

Robert, what I meant about pulling SavedStreams up to Analyzer (a few comments above) was to do something like this:

{code}
class Analyzer {
  protected static class Streams {
    public Tokenizer tokenizer;
    public TokenStream tokenStream;
  }
  ...
}

class MyAnalyzer extends Analyzer {
  public TokenStream reusableTokenStream(String field, Reader reader) {
    Streams streams = getPrevTS(); // shorthand for getPreviousTokenStream()
    if (streams == null) {
      streams = new Streams();
      streams.tokenizer = new Tokenizer();     // placeholder for the concrete Tokenizer
      streams.tokenStream = new TokenStream(); // placeholder for the filter chain
      setPrevTS(streams);
    } else {
      streams.tokenizer.reset(reader);
      streams.tokenStream.reset();
    }
    return streams.tokenStream;
  }
}
{code}

This will just save the declaration of SavedStreams or Streams in all sub-classes.

In addition we can do the following:
# Define reset(String, Reader) on Streams, so that everyone just calls streams.reset() instead of resetting tokenizer and tokenStream separately. Streams will do that internally.
# Define a protected abstract getTokenizer() on Analyzer that all Analyzers implement. (Due to back-compat, this can throw UOE - let's leave it for now.)
# Have Analyzer's reusableTokenStream look like the following:

{code}
public TokenStream reusableTokenStream(String field, Reader reader) {
  Streams streams = getPreviousTokenStream();
  if (streams == null) {
    streams = new Streams();
    streams.tokenizer = getTokenizer(field, reader);
    streams.tokenStream = tokenStream();
    setPrevTS(streams);
  } else {
    streams.reset(field, reader);
  }
  return streams.tokenStream;
}
{code}

And that can be simplified even further by having Streams define a ctor which accepts a Tokenizer and a TokenStream. Instead of calling "new Streams()", we can also call a method newStreams(), so that subclasses can override it if they want to provide a different Streams impl. Not a must, and we might even consider making the whole thing final (Streams, reusableTokenStream, etc.).

That will save some code in that patch, I believe. What do you think?
I haven't touched the back-compat issues yet - let's discuss the idea first.

> implement reusableTokenStream for all contrib analyzers
> -------------------------------------------------------
>
>                 Key: LUCENE-1794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch,
> LUCENE-1794.patch, LUCENE-1794.patch
>
>
> most contrib analyzers do not have an impl for reusableTokenStream
> regardless of how expensive the back compat reflection is for indexing speed,
> I think we should do this to mitigate any performance costs. hey, overall it
> might even be an improvement!
> the back compat code for non-final analyzers is already in place so this is
> easy money in my opinion.
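
The further simplifications suggested in the comment (a Streams ctor taking both streams, a single reset(field, reader), an abstract getTokenizer(), and an overridable newStreams()) might be combined roughly as in the sketch below. It is only an illustration under stated assumptions: every Sketch* type is a hypothetical stand-in so the example compiles without Lucene, a plain ThreadLocal stands in for getPreviousTokenStream()/setPreviousTokenStream(), and the wrap() hook is an invented name for whatever builds the filter chain on top of the tokenizer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;

// Hypothetical stand-ins for TokenStream and Tokenizer (not the Lucene classes).
abstract class SketchTokenStream {
    abstract String next();   // returns null when the stream is exhausted
    void reset() {}
}

abstract class SketchTokenizer extends SketchTokenStream {
    protected Reader input;
    SketchTokenizer(Reader input) { this.input = input; }
    void reset(Reader input) { this.input = input; }
}

abstract class SketchAnalyzer {
    // Pulled up into the base class so subclasses need not declare their own.
    protected static class Streams {
        final SketchTokenizer tokenizer;
        final SketchTokenStream tokenStream;
        Streams(SketchTokenizer tokenizer, SketchTokenStream tokenStream) {
            this.tokenizer = tokenizer;
            this.tokenStream = tokenStream;
        }
        // One call resets both ends of the chain; field is unused in this sketch.
        void reset(String field, Reader reader) {
            tokenizer.reset(reader);
            tokenStream.reset();
        }
    }

    // Assumption: per-thread storage, standing in for get/setPreviousTokenStream.
    private final ThreadLocal<Streams> previous = new ThreadLocal<>();

    protected abstract SketchTokenizer getTokenizer(String field, Reader reader);

    // Hypothetical hook: builds the filter chain on top of the tokenizer.
    protected abstract SketchTokenStream wrap(String field, SketchTokenizer source);

    // Overridable factory, in case a subclass wants a different Streams impl.
    protected Streams newStreams(SketchTokenizer t, SketchTokenStream ts) {
        return new Streams(t, ts);
    }

    public final SketchTokenStream reusableTokenStream(String field, Reader reader) {
        Streams streams = previous.get();
        if (streams == null) {
            SketchTokenizer tokenizer = getTokenizer(field, reader);
            streams = newStreams(tokenizer, wrap(field, tokenizer));
            previous.set(streams);
        } else {
            streams.reset(field, reader);
        }
        return streams.tokenStream;
    }
}

// A subclass now only says which tokenizer and which filters it uses.
class SketchLowerCaseAnalyzer extends SketchAnalyzer {
    @Override protected SketchTokenizer getTokenizer(String field, Reader reader) {
        return new SketchTokenizer(reader) {
            @Override String next() {
                StringBuilder sb = new StringBuilder();
                try {
                    int c;
                    while ((c = input.read()) != -1) {
                        if (!Character.isWhitespace(c)) sb.append((char) c);
                        else if (sb.length() > 0) break;
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return sb.length() > 0 ? sb.toString() : null;
            }
        };
    }

    @Override protected SketchTokenStream wrap(String field, SketchTokenizer source) {
        return new SketchTokenStream() {
            @Override String next() {
                String t = source.next();
                return t == null ? null : t.toLowerCase(Locale.ROOT);
            }
        };
    }
}

class StreamsSketchDemo {
    public static void main(String[] args) {
        SketchAnalyzer analyzer = new SketchLowerCaseAnalyzer();
        SketchTokenStream first = analyzer.reusableTokenStream("body", new StringReader("Hello World"));
        for (String t = first.next(); t != null; t = first.next()) System.out.println(t);
        SketchTokenStream second = analyzer.reusableTokenStream("body", new StringReader("Again"));
        System.out.println(first == second); // same instance is handed back on reuse
    }
}
```

In this arrangement the reuse logic lives once in the base class, and making reusableTokenStream final (as the comment floats) keeps subclasses from diverging. The back-compat concerns raised in the comment are not addressed here.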