[jira] Commented: (LUCENE-1842) Add reset(AttributeSource) method to AttributeSource

Tim Smith (JIRA) Sat, 22 Aug 2009 05:48:41 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746450#action_12746450
 ]


Tim Smith commented on LUCENE-1842:
-----------------------------------

Here's some pseudo-code that should show this use case in full:

{code}
// These guys are initialized once
Analyzer analyzer1 = new SimpleAnalyzer();
Analyzer analyzer2 = new StandardAnalyzer();
Analyzer analyzer3 = new LowerCaseAnalyzer();

// This is done on a per Field basis
Reader source1 = new StringReader("some text");
Reader source2 = new StringReader("some more text");
Reader source3 = new StringReader("final text");

TokenStream stream1 = analyzer1.reusableTokenStream(source1);
TokenStream stream2 = analyzer2.reusableTokenStream(source2);
TokenStream stream3 = analyzer3.reusableTokenStream(source3);

// Create the container for the shared attributes map
AttributeSource attrs = new AttributeSource();

// Have all streams share the same attributes map
stream1.reset(attrs);
stream2.reset(attrs);
stream3.reset(attrs);

// Create my merging TokenStream (have it use attrs as its attribute source)
TokenStream merger = new MergeTokenStreams(attrs,
    new TokenStream[] { stream1, stream2, stream3 });

// Add a filter that will put a token prior to the source token stream, and
// again after the source token stream is exhausted
TokenStream finalStream = new WrapFilter(merger, "anchor token");

// finalStream will now be passed to the indexer
{code}

In order to use reusableTokenStream() from the Analyzers, MergeTokenStreams 
must be able to share its attributes map with the underlying TokenStreams it 
is merging.
Otherwise, MergeTokenStreams has to do something like this in its 
incrementToken():
{code}
public boolean incrementToken() {
  if (currentStream.incrementToken()) {
    copy currentStream.termAttr into my local termAttr
    copy currentStream.offsetsAttr into my local offsetsAttr
    return true;
  } else {
    advance currentStream to be the next stream in line, then try again
    (return false once all streams are exhausted)
  }
}
{code}

as opposed to:
{code}
public boolean incrementToken() {
  if (currentStream.incrementToken()) {
    // nothing to copy: the underlying TokenStreams share the same
    // attributes map as me
    return true;
  } else {
    advance currentStream to be the next stream in line, then try again
    (return false once all streams are exhausted)
  }
}
{code}

Hopefully this makes my use case clear
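
For anyone who wants to play with the idea outside of Lucene, here is a rough standalone sketch of the second variant. It is only a model under stated assumptions: Attrs, ToyStream, WordStream, MergeStream and collect() are hypothetical stand-ins invented for illustration, not Lucene classes, and the "attributes map" is reduced to a single term slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins, NOT the Lucene API: a tiny model of streams that
// publish each token through one shared attribute holder.
class SharedAttrsSketch {

    /** Toy replacement for the shared attributes map: a single term slot. */
    static final class Attrs {
        String term;
    }

    /** Toy token stream that can be re-pointed at a different Attrs holder. */
    interface ToyStream {
        void reset(Attrs shared);
        boolean incrementToken();
    }

    /** Splits its text on whitespace and writes each term into its current holder. */
    static final class WordStream implements ToyStream {
        private final Iterator<String> words;
        private Attrs attrs = new Attrs();

        WordStream(String text) {
            this.words = Arrays.asList(text.trim().split("\\s+")).iterator();
        }

        @Override public void reset(Attrs shared) { this.attrs = shared; }

        @Override public boolean incrementToken() {
            if (!words.hasNext()) return false;
            attrs.term = words.next();
            return true;
        }
    }

    /** Concatenates child streams; it never copies, because the children
     *  already write into the holder that the consumer reads from. */
    static final class MergeStream implements ToyStream {
        private final ToyStream[] children;
        private int current = 0;

        MergeStream(ToyStream... children) { this.children = children; }

        @Override public void reset(Attrs shared) {
            for (ToyStream child : children) child.reset(shared);
        }

        @Override public boolean incrementToken() {
            while (current < children.length) {
                if (children[current].incrementToken()) return true;
                current++; // this child is exhausted, move to the next one
            }
            return false;
        }
    }

    /** Merges the given texts and returns the terms seen through the shared holder. */
    static List<String> collect(String... texts) {
        Attrs shared = new Attrs();
        ToyStream[] streams = new ToyStream[texts.length];
        for (int i = 0; i < texts.length; i++) {
            streams[i] = new WordStream(texts[i]);
            streams[i].reset(shared); // have all streams share the same holder
        }
        ToyStream merger = new MergeStream(streams);
        List<String> terms = new ArrayList<>();
        while (merger.incrementToken()) terms.add(shared.term);
        return terms;
    }

    public static void main(String[] args) {
        // prints [some, text, some, more, text, final, text]
        System.out.println(collect("some text", "some more text", "final text"));
    }
}
```

Note that MergeStream.incrementToken() does no per-token work of its own, which is the point of sharing the holder. Without sharing, it would have to copy every attribute from the current child on every token.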

> Add reset(AttributeSource) method to AttributeSource
> ----------------------------------------------------
>
>                 Key: LUCENE-1842
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1842
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Analysis
>            Reporter: Tim Smith
>            Priority: Minor
>
> Originally proposed in LUCENE-1826
> Proposing the addition of the following method to AttributeSource
> {code}
> public void reset(AttributeSource input) {
>     if (input == null) {
>       throw new IllegalArgumentException("input AttributeSource must not be null");
>     }
>     this.attributes = input.attributes;
>     this.attributeImpls = input.attributeImpls;
>     this.factory = input.factory;
> }
> {code}
> Impacts:
> * requires all TokenStreams/TokenFilters/etc. to call addAttribute() in their 
> reset() method, not in their constructor
> * requires making AttributeSource.attributes and 
> AttributeSource.attributeImpls non-final
> Advantages:
> Allows creating only a single actual AttributeSource per thread, which can 
> then be used for indexing with a multitude of TokenStream/Tokenizer 
> combinations (allowing the utmost reuse of TokenStream/Tokenizer instances).
> This results in only a single "attributes"/"attributeImpls" map being 
> required per thread.
> addAttribute() calls will almost always return right away (attributes will 
> only be "initialized" once per thread).
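
The two impacts listed in the description can be illustrated with a small standalone model. This is only a sketch under assumptions: ToySource, ToyStream and TermAttr are hypothetical names rather than Lucene classes, the model keeps a single map instead of separate attributes/attributeImpls maps, and addAttribute() takes a factory argument here purely to keep the sketch free of reflection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model, NOT Lucene's AttributeSource: just enough to show what
// adopting another source's map by reference implies for its users.
class ResetSketch {

    /** Toy attribute holding a term. */
    static final class TermAttr {
        String term = "";
    }

    static class ToySource {
        // Non-final so that reset(ToySource) can swap it, as the proposal requires.
        private Map<Class<?>, Object> attributes = new HashMap<>();

        /** Returns the instance registered for this type, creating it on first use. */
        <T> T addAttribute(Class<T> type, Supplier<T> factory) {
            return type.cast(attributes.computeIfAbsent(type, k -> factory.get()));
        }

        /** Adopts the other source's map, so both sources see the same instances. */
        void reset(ToySource input) {
            if (input == null) {
                throw new IllegalArgumentException("input must not be null");
            }
            this.attributes = input.attributes;
        }
    }

    /** A stream that (re)fetches its attribute in reset(), not only at construction. */
    static final class ToyStream extends ToySource {
        TermAttr termAttr = addAttribute(TermAttr.class, TermAttr::new);

        @Override void reset(ToySource input) {
            super.reset(input);
            // The instance cached before the swap belongs to the old map,
            // so it has to be looked up again in the adopted one.
            termAttr = addAttribute(TermAttr.class, TermAttr::new);
        }
    }

    /** True if, after reset, the stream and the shared source use one instance. */
    static boolean sharesAfterReset() {
        ToySource shared = new ToySource();
        ToyStream stream = new ToyStream();
        TermAttr before = stream.termAttr;
        stream.reset(shared);
        stream.termAttr.term = "anchor";
        TermAttr viaShared = shared.addAttribute(TermAttr.class, TermAttr::new);
        return viaShared == stream.termAttr
            && before != stream.termAttr
            && "anchor".equals(viaShared.term);
    }

    public static void main(String[] args) {
        System.out.println(sharesAfterReset()); // prints true
    }
}
```

In this model, a stream that only called addAttribute() in its constructor would keep writing to the instance from its old map after reset(), and nothing reading through the shared source would see those writes.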


