[ 
https://issues.apache.org/jira/browse/LUCENE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858306#action_12858306
 ] 

Robert Muir commented on LUCENE-2400:
-------------------------------------

bq. Unfortunately, these changes cause a roughly 22% slowdown - 
contrib/benchmark numbers for the shingle alg (I got similar numbers for Java 
1.5):

Steven, I wonder if this is because of a stupid thing. I noticed this in your 
patch:
{noformat}
-      shingleBuilder.append(termAtt.termBuffer(), 0, termAtt.termLength());
+      gramBuilder.append(charTermAtt.toString());
{noformat}

I would recommend gramBuilder.append(termAtt.buffer(), 0, termAtt.length()) 
like before. Maybe it's just the extra GC cost of creating useless strings?
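
To make the difference concrete, here is a rough, self-contained sketch. It does not use Lucene: TermStandIn is a hypothetical stand-in for a term attribute with a reusable char buffer, and the class and method names are made up for illustration. Both methods build the same text, but the toString() variant allocates a throwaway String for every token, while the buffer variant copies chars straight into the StringBuilder.

```java
public class ShingleAppendSketch {

    /** Hypothetical stand-in for a term attribute: a reusable char buffer plus a valid length. */
    static final class TermStandIn {
        private char[] buf = new char[16];
        private int len;

        /** Copies the token text into the reusable buffer, growing it if needed. */
        void set(String s) {
            if (s.length() > buf.length) {
                buf = new char[s.length()];
            }
            s.getChars(0, s.length(), buf, 0);
            len = s.length();
        }

        char[] buffer() { return buf; }

        int length() { return len; }

        /** Allocates a new String on every call. */
        @Override
        public String toString() { return new String(buf, 0, len); }
    }

    /** Appends each token via toString(): one temporary String per token. */
    static String joinViaToString(String[] terms) {
        TermStandIn att = new TermStandIn();
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            att.set(t);
            if (sb.length() > 0) sb.append(' ');
            sb.append(att.toString());
        }
        return sb.toString();
    }

    /** Appends each token straight from the buffer: no temporary String per token. */
    static String joinViaBuffer(String[] terms) {
        TermStandIn att = new TermStandIn();
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            att.set(t);
            if (sb.length() > 0) sb.append(' ');
            sb.append(att.buffer(), 0, att.length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] terms = {"please", "divide", "this"};
        System.out.println(joinViaToString(terms));
        System.out.println(joinViaBuffer(terms));
    }
}
```

Whether this accounts for the whole slowdown is only a guess; rerunning the benchmark with the buffer-based append would tell.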

> ShingleFilter: don't output all-filler shingles/unigrams; also, convert from 
> TermAttribute to CharTermAttribute
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2400
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2400
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 3.0.1
>            Reporter: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-2400.patch
>
>
> When the input token stream to ShingleFilter has position increments greater 
> than one, filler tokens are inserted for each position for which there is no 
> token in the input token stream.  As a result, unigrams (if configured) and 
> shingles can be filler-only.  Filler-only output tokens make no sense - these 
> should be removed.
> Also, because TermAttribute has been deprecated in favor of 
> CharTermAttribute, the patch will also convert TermAttribute usages to 
> CharTermAttribute in ShingleFilter.
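
The filler-only case described above can be illustrated with a small sketch. This is not ShingleFilter's code: the class name, the "_" filler text, and the fixed bigram size are assumptions made for the example. It expands tokens into one slot per position, inserts a filler for each skipped position, and optionally drops bigrams made entirely of fillers.

```java
import java.util.ArrayList;
import java.util.List;

public class FillerShingleSketch {

    // Assumed filler text; a real term equal to "_" would be mistaken for a filler here.
    static final String FILLER = "_";

    /** One slot per position: a FILLER is inserted for every position an increment skips. */
    static List<String> expand(String[] terms, int[] posIncs) {
        List<String> slots = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int gap = 1; gap < posIncs[i]; gap++) {
                slots.add(FILLER);
            }
            slots.add(terms[i]);
        }
        return slots;
    }

    /** Builds bigrams over the slots, optionally skipping those made only of fillers. */
    static List<String> bigrams(List<String> slots, boolean dropAllFiller) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            String a = slots.get(i);
            String b = slots.get(i + 1);
            boolean allFiller = a.equals(FILLER) && b.equals(FILLER);
            if (dropAllFiller && allFiller) {
                continue;
            }
            out.add(a + " " + b);
        }
        return out;
    }

    public static void main(String[] args) {
        // "b" has a position increment of 3, so two positions before it are empty.
        List<String> slots = expand(new String[] {"a", "b"}, new int[] {1, 3});
        System.out.println(bigrams(slots, false));
        System.out.println(bigrams(slots, true));
    }
}
```

With the inputs in main, the slots are a, _, _, b, and the only bigram removed by the flag is the one built from the two fillers.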
