Thanks Steve.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley
On Mon, Jan 26, 2015 at 11:22 AM, Steve Rowe <[email protected]> wrote:

> The behavior changed in https://issues.apache.org/jira/browse/LUCENE-5897
> / https://issues.apache.org/jira/browse/LUCENE-5400
>
> On Mon, Jan 26, 2015 at 11:17 AM, [email protected] <
> [email protected]> wrote:
>
>> On one of my other open-source projects (SolrTextTagger) I have a test
>> that deliberately exercises the effect of a very long token with the
>> StandardTokenizer, and that project is in turn tested against a wide
>> matrix of Lucene/Solr versions. Before Lucene 4.9, a token that exceeded
>> maxTokenLength (255 by default) created a skipped position (basically a
>> pseudo-stop-word). Since 4.9, this no longer happens; the JFlex scanner
>> never returns a token longer than 255 characters. I checked our code
>> coverage and, sure enough, "skippedPositions++" never executes:
>>
>> https://builds.apache.org/job/Lucene-Solr-Clover-trunk/lastSuccessfulBuild/clover-report/org/apache/lucene/analysis/standard/StandardTokenizer.html?line=167#src-167
>>
>> Any thoughts on this? Steve?
>>
>> ~ David Smiley
>> Freelance Apache Lucene/Solr Search Consultant/Developer
>> http://www.linkedin.com/in/davidwsmiley
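[Editor's note: for readers following along, here is a minimal sketch (not from the thread) of the kind of check David describes: tokenize text containing a token longer than the default maxTokenLength of 255 and print each emitted term's position increment. It assumes the post-4.x Tokenizer API (no-arg constructor plus setReader); the class name LongTokenDemo is purely illustrative.]

import java.io.StringReader;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

/**
 * Emits each token of a string containing one over-long (300-char) token,
 * along with its position increment. Per the thread: before Lucene 4.9 the
 * oversized token was dropped and the next token carried a position increment
 * of 2 (a skipped position); since 4.9 the scanner never reports a token over
 * the limit, so no skipped position is recorded.
 */
public class LongTokenDemo {
  public static void main(String[] args) throws Exception {
    StringBuilder sb = new StringBuilder("before ");
    for (int i = 0; i < 300; i++) {
      sb.append('x');   // 300-char token, over the default 255-char limit
    }
    sb.append(" after");

    StandardTokenizer tokenizer = new StandardTokenizer();  // 5.x-style no-arg ctor
    tokenizer.setReader(new StringReader(sb.toString()));
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    PositionIncrementAttribute posIncr =
        tokenizer.addAttribute(PositionIncrementAttribute.class);

    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(posIncr.getPositionIncrement() + " " + term.toString());
    }
    tokenizer.end();
    tokenizer.close();
  }
}

[If David's description holds, the pre-4.9 behavior would show "after" arriving with a position increment of 2, while on 4.9+ every emitted token has an increment of 1.]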
