[
https://issues.apache.org/jira/browse/LUCENE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478494#comment-13478494
]
Hoss Man commented on LUCENE-4489:
----------------------------------
In SOLR-3961 rmuir and I were discussing the tests for LimitTokenCountFilter.
It seems to me that there are at least two things that should be improved in the
tests, and one possible improvement to the code itself...
A) fix TestLimitTokenCountAnalyzer to use MockTokenizer...
{quote}
bq. 3) the closest thing I see to a test of LimitTokenCountFilter is
TestLimitTokenCountAnalyzer - I realize now the reason its
testLimitTokenCountAnalyzer doesn't get the same failure is because it wraps
WhitespaceAnalyzer/StandardAnalyzer - should those be changed to use
MockTokenizer?
I think we should always do this!
{quote}
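Roughly what I have in mind is something like the following (just a sketch against the 4.x-era analysis API, not a patch - the exact Analyzer/MockTokenizer signatures vary by Lucene version):
{code:java}
// sketch only: build the test analyzer around MockTokenizer instead of
// WhitespaceAnalyzer/StandardAnalyzer so its consumer-workflow checks are active
Analyzer inner = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    return new TokenStreamComponents(
        new MockTokenizer(reader, MockTokenizer.WHITESPACE, false));
  }
};
Analyzer limited = new LimitTokenCountAnalyzer(inner, 2);
// any test path that actually truncates will now trip MockTokenizer's
// state-machine checks unless the filter (or the test) accounts for that
{code}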
B) TestLimitTokenCountAnalyzer.testLimitTokenCountIndexWriter smells fishy to
me, still discussing with rmuir...
{quote}
bq. 4) TestLimitTokenCountAnalyzer also has a testLimitTokenCountIndexWriter
that uses MockAnalyzer w/o calling setEnableChecks(false), which seems like it
should trigger the same failure I got since it uses MockTokenizer. But in
general that test looks suspicious, as it seems to add exactly the number of
tokens that the limit is configured for, and then asserts that the last token
is in the index - it never actually triggers the limiting logic since exactly
the allowed number of tokens is used.
Then that's fine, because when LimitTokenCountFilter consumes the whole stream,
it's a "good consumer". It's only when it actually truncates that it breaks the
tokenstream contract.
{quote}
My concern wasn't that I thought that test was fishy just because it didn't
call setEnableChecks(false) -- my concern was that I don't see the point of the
test at all as it's currently written, because it doesn't actually trigger the
token limiting behavior (if it did, it would _have_ to call
setEnableChecks(false)) ... so what exactly is it supposed to be testing?
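To make it concrete, the kind of test I'd expect to see there would actually exceed the limit and disable the checks. Just a sketch, not a patch - helper signatures like newIndexWriterConfig differ between Lucene versions:
{code:java}
// sketch: actually exceed the configured limit so the truncation path runs
MockAnalyzer mock = new MockAnalyzer(random());
mock.setEnableChecks(false); // truncating filters are not "good consumers"
Analyzer a = new LimitTokenCountAnalyzer(mock, 3);

Directory dir = newDirectory();
IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig(a));
Document doc = new Document();
doc.add(new TextField("f", "one two three four five", Field.Store.NO));
writer.addDocument(doc);
writer.close();

IndexReader reader = DirectoryReader.open(dir);
assertEquals(1, reader.docFreq(new Term("f", "three"))); // last token kept
assertEquals(0, reader.docFreq(new Term("f", "four")));  // past the limit, dropped
reader.close();
dir.close();
{code}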
C) going back to a comment rmuir made earlier in SOLR-3961...
bq. That's because LimitTokenCountFilter violates the workflow of the standard
tokenstream API ... it cuts off its consumer and calls end() even while that
consumer still has more tokens: this could easily cause unexpected side effects
and bugs. ... But we knew about this anyway
This makes me wonder if it would be worthwhile to add an option to
LimitTokenCountFilter to make it always consume all the tokens from the stream
it wraps -- it could be off by default so it would behave exactly like today,
but if you are wrapping a TokenStream where it's really important to you that
every token is consumed before end() is called, you could set it appropriately.
Would that make sense?
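Very roughly, I'm picturing something like this - sketch only, the consumeAllTokens name and the exact shape are just illustrative:
{code:java}
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public final class LimitTokenCountFilter extends TokenFilter {
  private final int maxTokenCount;
  private final boolean consumeAllTokens; // off by default: behave exactly like today
  private int tokenCount = 0;
  private boolean exhausted = false;

  public LimitTokenCountFilter(TokenStream in, int maxTokenCount, boolean consumeAllTokens) {
    super(in);
    this.maxTokenCount = maxTokenCount;
    this.consumeAllTokens = consumeAllTokens;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (exhausted) {
      return false;
    } else if (tokenCount < maxTokenCount) {
      if (input.incrementToken()) {
        tokenCount++;
        return true;
      }
      exhausted = true;
      return false;
    } else {
      // limit reached: optionally drain the wrapped stream so it is fully
      // consumed before end() is called, then report that we're done
      while (consumeAllTokens && input.incrementToken()) { /* discard */ }
      exhausted = true;
      return false;
    }
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    tokenCount = 0;
    exhausted = false;
  }
}
{code}
With a flag like that turned on, MockTokenizer-based tests could presumably keep checks enabled for the consume-everything case and only call setEnableChecks(false) when truncation is actually expected.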
> improve LimitTokenCountFilter and/or its tests
> -----------------------------------------------
>
> Key: LUCENE-4489
> URL: https://issues.apache.org/jira/browse/LUCENE-4489
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Hoss Man
>
> spinning off a discussion about LimitTokenCountFilter and its tests from
> SOLR-3961 (which was about a specific bug in the LimitTokenCountFilterFactory)