[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742994#action_12742994 ]
Shai Erera commented on LUCENE-1794:
------------------------------------

bq. I only call reset() on streams.result when there is a state-keeping TokenFilter on that chain. it is not necessary to invoke it if reset() is a no-op...

What if one of those streams becomes a state-keeper some day? I don't think that calling reset() and having it do nothing will be expensive, no?

bq. but now my problem child is memory/PatternAnalyzer, its source of tokens is not a Tokenizer.

It could just override reusableTokenStream and do what it wants, no?

I must admit that I still have the current TokenStream and Analyzer API in mind, so my suggestion may not be 100% compatible w/ AttributeSource and the new stuff. But my gut feeling tells me there has to be a way to remove all those unnecessary impls in all Analyzers. The following default impl seems too obvious to say we cannot do it:

{code}
// Subclasses wrap the given tokenizer with whatever filters they need.
protected abstract TokenStream internalTokenStream(Tokenizer tokenizer);

// Subclasses supply the source Tokenizer.
protected abstract Tokenizer getTokenizer();

public TokenStream tokenStream() {
  // Non-reusable path: build a fresh chain on every call.
  return internalTokenStream(getTokenizer());
}

public TokenStream reusableTokenStream() {
  // Streams is a small holder for the tokenizer and the full chain.
  Streams streams = getPrevTS();
  if (streams == null) {
    // First call: build the chain once and remember it.
    streams = new Streams();
    streams.tokenizer = getTokenizer();
    streams.tokenStream = internalTokenStream(streams.tokenizer);
    setPrevTS(streams);
  } else {
    // Later calls: always reset, even if reset() is currently a no-op.
    streams.reset();
  }
  return streams.tokenStream;
}
{code}

I'll try to impl it tomorrow, using the new API, and see how it goes.
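To make the reuse pattern in the comment concrete, here is a minimal, self-contained sketch. It deliberately does not use the Lucene classes: SketchTokenStream, SketchTokenizer, DropRepeatsFilter, SketchAnalyzer, DropRepeatsAnalyzer and ReuseDemo are all hypothetical stand-ins, the streams take a String rather than a Reader, and a plain ThreadLocal is assumed in place of getPrevTS()/setPrevTS(). It only illustrates why resetting the whole chain unconditionally is the safe default once any filter in it keeps state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in types for illustration only; these are NOT the Lucene classes.
abstract class SketchTokenStream {
    abstract String next();   // returns null when the stream is exhausted
    void reset() {}           // no-op unless a subclass keeps state
}

class SketchTokenizer extends SketchTokenStream {
    private Iterator<String> words;
    SketchTokenizer(String text) { reset(text); }
    // Point the tokenizer at new input (whitespace-separated words).
    void reset(String text) {
        String trimmed = text.trim();
        List<String> parts = trimmed.isEmpty()
                ? new ArrayList<String>()
                : Arrays.asList(trimmed.split("\\s+"));
        words = parts.iterator();
    }
    @Override String next() { return words.hasNext() ? words.next() : null; }
}

// A state-keeping filter: drops a token equal to the one emitted just before it.
class DropRepeatsFilter extends SketchTokenStream {
    private final SketchTokenStream input;
    private String previous;
    DropRepeatsFilter(SketchTokenStream input) { this.input = input; }
    @Override String next() {
        String t;
        while ((t = input.next()) != null) {
            if (!t.equals(previous)) { previous = t; return t; }
        }
        return null;
    }
    // Forgetting this reset would leak state from the previous document.
    @Override void reset() { input.reset(); previous = null; }
}

abstract class SketchAnalyzer {
    private static final class Streams {
        SketchTokenizer tokenizer;
        SketchTokenStream tokenStream;
    }
    // Assumption: per-thread storage plays the role of getPrevTS()/setPrevTS().
    private final ThreadLocal<Streams> previous = new ThreadLocal<>();

    protected abstract SketchTokenizer getTokenizer(String text);
    protected abstract SketchTokenStream internalTokenStream(SketchTokenizer tokenizer);

    public SketchTokenStream tokenStream(String text) {
        return internalTokenStream(getTokenizer(text));
    }

    public SketchTokenStream reusableTokenStream(String text) {
        Streams streams = previous.get();
        if (streams == null) {
            streams = new Streams();
            streams.tokenizer = getTokenizer(text);
            streams.tokenStream = internalTokenStream(streams.tokenizer);
            previous.set(streams);
        } else {
            streams.tokenizer.reset(text);  // new input for the source
            streams.tokenStream.reset();    // always reset the whole chain
        }
        return streams.tokenStream;
    }
}

class DropRepeatsAnalyzer extends SketchAnalyzer {
    @Override protected SketchTokenizer getTokenizer(String text) {
        return new SketchTokenizer(text);
    }
    @Override protected SketchTokenStream internalTokenStream(SketchTokenizer tokenizer) {
        return new DropRepeatsFilter(tokenizer);
    }
}

class ReuseDemo {
    static List<String> drain(SketchTokenStream ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }
    public static void main(String[] args) {
        SketchAnalyzer analyzer = new DropRepeatsAnalyzer();
        SketchTokenStream first = analyzer.reusableTokenStream("a a b");
        System.out.println(drain(first));    // [a, b]
        SketchTokenStream second = analyzer.reusableTokenStream("b b c");
        System.out.println(drain(second));   // [b, c]
        System.out.println(first == second); // true: same chain, reused
    }
}
```

In this sketch, skipping the reset() call on the reused chain would make the second pass drop its leading "b", because the filter would still remember the last token of the first input. That is the kind of surprise an unconditional reset avoids if a stateless stream later becomes stateful.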
> implement reusableTokenStream for all contrib analyzers
> -------------------------------------------------------
>
> Key: LUCENE-1794
> URL: https://issues.apache.org/jira/browse/LUCENE-1794
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch
>
>
> most contrib analyzers do not have an impl for reusableTokenStream
> regardless of how expensive the back compat reflection is for indexing speed, I think we should do this to mitigate any performance costs. hey, overall it might even be an improvement!
> the back compat code for non-final analyzers is already in place so this is easy money in my opinion.