[ https://issues.apache.org/jira/browse/LUCENE-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746450#action_12746450 ]
Tim Smith commented on LUCENE-1842:
-----------------------------------

Here's some pseudo code to hopefully fully show this use case:

{code}
// These guys are initialized once
Analyzer analyzer1 = new SimpleAnalyzer();
Analyzer analyzer2 = new StandardAnalyzer();
Analyzer analyzer3 = new LowerCaseAnalyzer();

// This is done on a per Field basis
Reader source1 = new StringReader("some text");
Reader source2 = new StringReader("some more text");
Reader source3 = new StringReader("final text");

TokenStream stream1 = analyzer1.reusableTokenStream(source1);
TokenStream stream2 = analyzer2.reusableTokenStream(source2);
TokenStream stream3 = analyzer3.reusableTokenStream(source3);

// Create the container for the shared attributes map
AttributeSource attrs = new AttributeSource();

// Have all streams share the same attributes map
stream1.reset(attrs);
stream2.reset(attrs);
stream3.reset(attrs);

// Create my merging TokenStream (have it use attrs as its attribute source)
TokenStream merger = new MergeTokenStreams(attrs, new TokenStream[] { stream1, stream2, stream3 });

// Add a filter that will put a token prior to the source token stream,
// and after the source token stream is exhausted
TokenStream finalStream = new WrapFilter(merger, "anchor token");

// finalStream will now be passed to the indexer
{code}

Hopefully this makes the use case clearer.

In order to use reusableTokenStream() from the Analyzers, MergeTokenStreams must be able to share its attributes map with the underlying TokenStreams it is merging.

Otherwise, MergeTokenStreams has to do something like this in its incrementToken():

{code}
public boolean incrementToken() {
  if (currentStream.incrementToken()) {
    copy currentStream.termAttr into my local termAttr
    copy currentStream.offsetsAttr into my local offsetsAttr
    return true;
  } else {
    advance currentStream to be the next stream in line
  }
}
{code}

as opposed to:

{code}
public boolean incrementToken() {
  if (currentStream.incrementToken()) {
    // don't need to do anything (because underlying tokenstreams
    // share the same attributes map as me)
    return true;
  } else {
    advance currentStream to be the next stream in line
  }
}
{code}

Hopefully this makes my use case clear.

> Add reset(AttributeSource) method to AttributeSource
> ----------------------------------------------------
>
>                 Key: LUCENE-1842
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1842
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Analysis
>            Reporter: Tim Smith
>            Priority: Minor
>
> Originally proposed in LUCENE-1826
> Proposing the addition of the following method to AttributeSource
> {code}
> public void reset(AttributeSource input) {
>   if (input == null) {
>     throw new IllegalArgumentException("input AttributeSource must not be null");
>   }
>   this.attributes = input.attributes;
>   this.attributeImpls = input.attributeImpls;
>   this.factory = input.factory;
> }
> {code}
> Impacts:
> * requires all TokenStreams/TokenFilters/etc. to call addAttribute() in their
> reset() method, not in their constructor
> * requires making AttributeSource.attributes and
> AttributeSource.attributeImpls non-final
> Advantages:
> Allows creating only a single actual AttributeSource per thread that can then
> be used for indexing with a multitude of TokenStream/Tokenizer combinations
> (allowing utmost reuse of TokenStream/Tokenizer instances).
> This results in only a single "attributes"/"attributeImpls" map being
> required per thread.
> addAttribute() calls will almost always return right away (will only be
> "initialized" once per thread).
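
To make the difference between the two incrementToken() variants above concrete, here is a minimal, self-contained sketch that does not use Lucene at all. Attrs, WordStream, MergeStream and SharedAttrsDemo are hypothetical toy classes invented for this illustration; WordStream.reset(Attrs) only models the proposed reset(AttributeSource) and is not an existing Lucene method. Because every sub-stream writes into the same Attrs instance, the merger never has to copy term data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for a shared attributes map: it holds only the current term.
final class Attrs {
    String term;
}

// Toy stand-in for a token stream that writes into whatever Attrs it was given.
final class WordStream {
    private final Iterator<String> words;
    private Attrs attrs = new Attrs();

    WordStream(String text) {
        this.words = Arrays.asList(text.split(" ")).iterator();
    }

    // Models the proposed reset(AttributeSource): adopt the caller's attributes.
    void reset(Attrs shared) {
        this.attrs = shared;
    }

    boolean incrementToken() {
        if (!words.hasNext()) {
            return false;
        }
        attrs.term = words.next();
        return true;
    }
}

// Concatenates streams. It never copies attributes, because every
// sub-stream was reset onto the same shared Attrs instance.
final class MergeStream {
    private final WordStream[] streams;
    private int current = 0;

    MergeStream(Attrs shared, WordStream... streams) {
        this.streams = streams;
        for (WordStream s : streams) {
            s.reset(shared);
        }
    }

    boolean incrementToken() {
        while (current < streams.length) {
            if (streams[current].incrementToken()) {
                return true; // nothing to copy: the term is already in the shared Attrs
            }
            current++; // advance to the next stream in line
        }
        return false;
    }
}

final class SharedAttrsDemo {
    // Drains the merged stream and returns the terms seen through the shared Attrs.
    static List<String> collect() {
        Attrs shared = new Attrs();
        MergeStream merger = new MergeStream(shared,
                new WordStream("some text"),
                new WordStream("some more text"),
                new WordStream("final text"));
        List<String> out = new ArrayList<>();
        while (merger.incrementToken()) {
            out.add(shared.term);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect());
    }
}
```

Without the shared instance, MergeStream would need its own Attrs and a copy step after every successful incrementToken() on a sub-stream, which is the per-token overhead the proposal aims to avoid.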