[jira] [Created] (LUCENE-6987) Clarify TokenStream workflow documentation

Steve Rowe (JIRA) Wed, 20 Jan 2016 22:25:02 -0800

Steve Rowe created LUCENE-6987:
----------------------------------

             Summary: Clarify TokenStream workflow documentation
                 Key: LUCENE-6987
                 URL: https://issues.apache.org/jira/browse/LUCENE-6987
             Project: Lucene - Core
          Issue Type: Task
            Reporter: Steve Rowe



On SOLR-4619, [~rcmuir] noted:

According to TokenStream's class javadocs:

{quote}
The workflow of the new TokenStream API is as follows:

1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from 
the AttributeSource.
2. The consumer calls reset().
3. The consumer retrieves attributes from the stream and stores local 
references to all attributes it wants to access.
{quote}
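For reference, the documented order can be sketched as a small, self-contained toy model. ToyStream and TermAttr below are hypothetical stand-ins, not Lucene classes. The token loop after step 3 goes beyond the quoted excerpt and is only assumed here, to show how a consumer would then drain the stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model of the documented consumer workflow. ToyStream and TermAttr are
// hypothetical stand-ins, not Lucene classes.
public class DocumentedOrder {

    // Mutable holder that the stream overwrites on every incrementToken().
    static final class TermAttr {
        String term;
    }

    static final class ToyStream {
        private final TermAttr termAttr = new TermAttr(); // created at construction
        private final String text;
        private Iterator<String> tokens;

        ToyStream(String text) {
            this.text = text;
        }

        void reset() {
            tokens = Arrays.asList(text.trim().split("\\s+")).iterator();
        }

        TermAttr termAttribute() {
            return termAttr;
        }

        boolean incrementToken() {
            if (tokens == null) {
                throw new IllegalStateException("reset() must be called first");
            }
            if (!tokens.hasNext()) {
                return false;
            }
            termAttr.term = tokens.next(); // same instance, new value
            return true;
        }
    }

    static List<String> consume(String text) {
        ToyStream stream = new ToyStream(text);  // 1. instantiation
        stream.reset();                          // 2. consumer calls reset()
        TermAttr term = stream.termAttribute();  // 3. store a local reference
        List<String> terms = new ArrayList<>();
        while (stream.incrementToken()) {        // then consume token by token
            terms.add(term.term);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(consume("the quick fox")); // [the, quick, fox]
    }
}
```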

So we have consumers (such as QueryBuilder) doing things out of order: they do 
step 3 before step 2.

My question is: can we detect this in tests? If MockAnalyzer can enforce it, it 
is easier to fix it consistently everywhere. One idea: what if MockTokenizer 
deferred initializing its attributes until reset()? It's not going to be the 
best check (we need to tie it into its state machine logic somehow for that), 
but it might be an easy first step.
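A rough sketch of the MockTokenizer idea, assuming a toy stream (StrictToyStream, hypothetical, not MockTokenizer's actual code) that only creates its attribute inside reset(), so a consumer that asks for it earlier fails fast in a test. It ignores the harder part mentioned above: tying the check into a real tokenizer's state machine and into attributes shared across a filter chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of "defer attribute initialization until reset()".
// This is not MockTokenizer's real implementation.
public class StrictOrderSketch {

    static final class TermAttr {
        String term;
    }

    static final class StrictToyStream {
        private final String text;
        private TermAttr termAttr;  // deliberately NOT created in the constructor
        private Iterator<String> tokens;

        StrictToyStream(String text) {
            this.text = text;
        }

        void reset() {
            if (termAttr == null) {
                termAttr = new TermAttr(); // the attribute only exists after reset()
            }
            tokens = Arrays.asList(text.trim().split("\\s+")).iterator();
        }

        TermAttr termAttribute() {
            if (termAttr == null) {
                // A consumer doing step 3 before step 2 fails fast here.
                throw new IllegalStateException("attribute requested before reset()");
            }
            return termAttr;
        }

        boolean incrementToken() {
            if (tokens == null) {
                throw new IllegalStateException("incrementToken() before reset()");
            }
            if (!tokens.hasNext()) {
                return false;
            }
            termAttr.term = tokens.next();
            return true;
        }
    }

    // Returns true if the out-of-order consumer is rejected.
    static boolean rejectsAttributeBeforeReset(String text) {
        StrictToyStream stream = new StrictToyStream(text);
        try {
            stream.termAttribute(); // step 3 ...
            stream.reset();         // ... before step 2
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // A consumer that follows the documented order still works.
    static List<String> consumeInDocumentedOrder(String text) {
        StrictToyStream stream = new StrictToyStream(text);
        stream.reset();
        TermAttr term = stream.termAttribute();
        List<String> terms = new ArrayList<>();
        while (stream.incrementToken()) {
            terms.add(term.term);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(rejectsAttributeBeforeReset("a b")); // true
        System.out.println(consumeInDocumentedOrder("a b"));    // [a, b]
    }
}
```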

Also, the majority of TokenFilters (which also act as consumers) do step 3 
before step 2 today. Most of them just assign attributes to final fields in 
their constructors.
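To illustrate the filter pattern, here is a toy model (hypothetical classes, not Lucene's TokenFilter or attribute API) in which a filter captures the shared attribute into a final field in its constructor, before anyone calls reset(). It still works because the tokenizer writes into the same attribute instance the filter holds, which is presumably why the out-of-order pattern goes unnoticed in practice; a strict check like the one proposed above would reject every filter written this way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Toy model (hypothetical classes, not Lucene's) of the common filter pattern:
// the filter stores the shared attribute in a final field in its constructor,
// i.e. step 3 happens before the consumer ever calls reset() (step 2).
public class FilterFieldSketch {

    static final class TermAttr {
        String term;
    }

    interface ToyStream {
        TermAttr termAttribute();
        void reset();
        boolean incrementToken();
    }

    static final class ToyTokenizer implements ToyStream {
        private final TermAttr termAttr = new TermAttr(); // one shared instance
        private final String text;
        private Iterator<String> tokens;
        boolean attributeFetchedBeforeReset = false;

        ToyTokenizer(String text) {
            this.text = text;
        }

        public TermAttr termAttribute() {
            if (tokens == null) {
                attributeFetchedBeforeReset = true; // record the out-of-order access
            }
            return termAttr;
        }

        public void reset() {
            tokens = Arrays.asList(text.trim().split("\\s+")).iterator();
        }

        public boolean incrementToken() {
            if (tokens == null) {
                throw new IllegalStateException("reset() must be called first");
            }
            if (!tokens.hasNext()) {
                return false;
            }
            termAttr.term = tokens.next();
            return true;
        }
    }

    static final class ToyUpperCaseFilter implements ToyStream {
        private final ToyStream input;
        private final TermAttr term; // assigned once, in the constructor

        ToyUpperCaseFilter(ToyStream input) {
            this.input = input;
            this.term = input.termAttribute(); // step 3, before any reset()
        }

        public TermAttr termAttribute() {
            return term; // same instance the tokenizer writes into
        }

        public void reset() {
            input.reset();
        }

        public boolean incrementToken() {
            if (!input.incrementToken()) {
                return false;
            }
            term.term = term.term.toUpperCase(Locale.ROOT);
            return true;
        }
    }

    public static void main(String[] args) {
        ToyTokenizer tokenizer = new ToyTokenizer("quick fox");
        ToyStream stream = new ToyUpperCaseFilter(tokenizer);
        stream.reset();
        TermAttr term = stream.termAttribute();
        List<String> terms = new ArrayList<>();
        while (stream.incrementToken()) {
            terms.add(term.term);
        }
        System.out.println(tokenizer.attributeFetchedBeforeReset); // true
        System.out.println(terms);                                 // [QUICK, FOX]
    }
}
```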

So something is off: we gotta go one of two ways. Either we fix the 
documentation to move step 3 before step 2 \[...], or we make a massive change 
to tons of tokenizers (making them more complex and less efficient).
But I think we have to do something: at the very least we should fix the docs 
to be clear, since they need to reflect reality.
{quote}


