[ https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845788#action_12845788 ]

Michael McCandless commented on LUCENE-2098:
--------------------------------------------

Ahh ok.

Probably we should switch to parallel arrays here to make it very fast. Yes, 
this will consume RAM (8 bytes per position, if we keep all of them).
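A minimal sketch of the parallel-array idea, assuming each addOffCorrectMap(int, int) call records an (offset, cumulative diff) pair meaning "from this offset in the filtered text onward, add this diff to get the original offset". The class OffsetCorrectionMap and its methods are hypothetical and are not the attached patch. Two growable int arrays replace the per-call object, and a binary search replaces the backwards list scan; the payload is two ints (8 bytes) per stored mapping.

```java
import java.util.Arrays;

// Hypothetical sketch (not the LUCENE-2098 patch): offset corrections kept in two
// parallel int arrays instead of a List holding one object per addOffCorrectMap call.
// Assumed semantics: from filtered offset offsets[i] onward, add diffs[i] to recover
// the original offset. Payload is 8 bytes (two ints) per stored mapping.
final class OffsetCorrectionMap {
  private int[] offsets = new int[8];
  private int[] diffs = new int[8];
  private int size = 0;

  // Records a mapping. Assumption: offsets arrive in non-decreasing order.
  void add(int off, int cumulativeDiff) {
    if (size > 0 && off < offsets[size - 1]) {
      throw new IllegalArgumentException("offsets must be non-decreasing");
    }
    if (size > 0 && off == offsets[size - 1]) {
      diffs[size - 1] = cumulativeDiff; // same offset again: keep only the latest diff
      return;
    }
    if (size == offsets.length) {
      // Grow both arrays together: amortized O(1) per add, no per-call object.
      offsets = Arrays.copyOf(offsets, size * 2);
      diffs = Arrays.copyOf(diffs, size * 2);
    }
    offsets[size] = off;
    diffs[size] = cumulativeDiff;
    size++;
  }

  // Maps an offset in the filtered text back to the original input via a
  // binary search for the last mapping at or before currentOff.
  int correct(int currentOff) {
    int lo = 0, hi = size - 1, found = -1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (offsets[mid] <= currentOff) {
        found = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return found < 0 ? currentOff : currentOff + diffs[found];
  }

  public static void main(String[] args) {
    OffsetCorrectionMap map = new OffsetCorrectionMap();
    map.add(5, 3);   // from filtered offset 5 on, originals are 3 chars further
    map.add(10, 7);  // from filtered offset 10 on, originals are 7 chars further
    System.out.println(map.correct(2) + " " + map.correct(9) + " " + map.correct(10)); // 2 12 17
  }
}
```

Repeated additions at the same offset overwrite the last entry, which gives the same result as scanning a list backwards and taking the most recent match.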

Really, most apps do not need all positions stored, i.e., they typically only 
need to see the current token.  So maybe we could make a filter that takes a 
"lookbehind size" and only keeps that number of mappings cached?  That would 
have to be larger than the max size of any token you may analyze, so it's hard 
to bound perfectly, but e.g. setting it to the max allowed token length in 
IndexWriter would guarantee that we'd never have a miss?

For analyzers that buffer tokens, they'd have to set this max to infinity, 
or ensure they remap the offsets before capturing the token's state?
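A minimal sketch of the "lookbehind size" idea as a fixed-size ring buffer, under the same assumed (offset, cumulative diff) semantics as addOffCorrectMap(int, int). The class LookbehindOffsetMap is hypothetical. It keeps only the most recent mappings and throws IllegalStateException on a miss, so a window that is too small shows up as an error rather than as silently wrong offsets. An unbounded window ("infinity") amounts to keeping every mapping.

```java
// Hypothetical sketch: keeps only the last `lookbehind` offset mappings in a ring
// buffer of two parallel int arrays. Assumed semantics: from filtered offset
// offsets[i] onward, add diffs[i] to recover the original offset.
final class LookbehindOffsetMap {
  private final int[] offsets;
  private final int[] diffs;
  private int count = 0;            // number of valid entries (<= capacity)
  private int next = 0;             // slot the next add() writes to
  private boolean evicted = false;  // true once any mapping has been dropped

  LookbehindOffsetMap(int lookbehind) {
    if (lookbehind <= 0) throw new IllegalArgumentException("lookbehind must be > 0");
    offsets = new int[lookbehind];
    diffs = new int[lookbehind];
  }

  // Records a mapping, overwriting the oldest one once the window is full.
  void add(int off, int cumulativeDiff) {
    if (count == offsets.length) evicted = true;
    offsets[next] = off;
    diffs[next] = cumulativeDiff;
    next = (next + 1) % offsets.length;
    if (count < offsets.length) count++;
  }

  // Walks from newest to oldest retained mapping; throws if the mapping that
  // would apply to currentOff has already been evicted.
  int correct(int currentOff) {
    for (int i = 0; i < count; i++) {
      int slot = Math.floorMod(next - 1 - i, offsets.length);
      if (offsets[slot] <= currentOff) {
        return currentOff + diffs[slot];
      }
    }
    if (evicted) {
      throw new IllegalStateException("offset " + currentOff + " is outside the lookbehind window");
    }
    return currentOff; // no mapping was ever recorded at or before currentOff
  }

  public static void main(String[] args) {
    LookbehindOffsetMap w = new LookbehindOffsetMap(2);
    w.add(5, 3);
    w.add(10, 7);
    w.add(15, 9); // evicts the (5, 3) mapping
    System.out.println(w.correct(12) + " " + w.correct(16)); // 19 25
    try {
      w.correct(7); // needs the evicted (5, 3) mapping
    } catch (IllegalStateException e) {
      System.out.println("miss: " + e.getMessage());
    }
  }
}
```

The linear newest-to-oldest scan assumes corrections are usually requested for the current token, so a match is typically found in the first few slots; a binary search over the ring would also work for larger windows.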

> make BaseCharFilter more efficient in performance
> -------------------------------------------------
>
>                 Key: LUCENE-2098
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2098
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.1
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>         Attachments: LUCENE-2098.patch
>
>
> Performance degradation in Solr 1.4 was reported. See:
> http://www.lucidimagination.com/search/document/43c4bdaf5c9ec98d/html_stripping_slower_in_solr_1_4
> The inefficiency has been pointed out in the BaseCharFilter javadoc by Mike:
> {panel}
> NOTE: This class is not particularly efficient. For example, a new class 
> instance is created for every call to addOffCorrectMap(int, int), which is 
> then appended to a private list. 
> {panel}


