[ https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845788#action_12845788 ]
Michael McCandless commented on LUCENE-2098:
--------------------------------------------

Ahh ok.

Probably we should switch to parallel arrays here, to make it very fast... yes this will consume RAM (8 bytes per position, if we keep all of them).

Really most apps do not need all positions stored, ie, they only need to see typically the current token. So maybe we could make a filter that takes a "lookbehind size" and it'd only keep that number of mappings cached? That'd have to be > the max size of any token you may analyze, so hard to bound perfectly, but eg setting this to the max allowed token in IndexWriter would guarantee that we'd never have a miss?

For analyzers that buffer tokens... they'd have to set this max to infinity, or, ensure they remap the offsets before capturing the token's state?

> make BaseCharFilter more efficient in performance
> -------------------------------------------------
>
>                 Key: LUCENE-2098
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2098
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.1
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>         Attachments: LUCENE-2098.patch
>
>
> Performance degradation in Solr 1.4 was reported. See:
> http://www.lucidimagination.com/search/document/43c4bdaf5c9ec98d/html_stripping_slower_in_solr_1_4
> The inefficiency has been pointed out in BaseCharFilter javadoc by Mike:
> {panel}
> NOTE: This class is not particularly efficient. For example, a new class
> instance is created for every call to addOffCorrectMap(int, int), which is
> then appended to a private list.
> {panel}
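
The parallel-arrays idea from the comment can be sketched roughly as follows. This is only an illustration, not the attached patch and not the actual BaseCharFilter code: the class name OffsetCorrectionMap and its methods add and correct are hypothetical, and the sketch assumes that mappings are added in non-decreasing offset order and that a lookup applies the cumulative diff of the last mapping whose offset is <= the queried offset. Each mapping costs two ints (8 bytes), and no object is allocated per mapping.

```java
import java.util.Arrays;

// Hypothetical sketch: offset corrections stored in two parallel int arrays
// instead of one small object per mapping.
public class OffsetCorrectionMap {
    private int[] offsets = new int[16]; // offsets at which a correction starts
    private int[] diffs = new int[16];   // cumulative diff that applies from that offset on
    private int size = 0;

    // Record that from offset 'off' onward, 'cumulativeDiff' must be added.
    // Assumption: callers add mappings in non-decreasing offset order.
    public void add(int off, int cumulativeDiff) {
        if (size > 0 && off < offsets[size - 1]) {
            throw new IllegalArgumentException("offsets must be added in non-decreasing order");
        }
        if (size == offsets.length) {
            // Grow both arrays together so they stay parallel.
            int newLength = size * 2;
            offsets = Arrays.copyOf(offsets, newLength);
            diffs = Arrays.copyOf(diffs, newLength);
        }
        offsets[size] = off;
        diffs[size] = cumulativeDiff;
        size++;
    }

    // Binary search for the last mapping with offset <= currentOff.
    public int correct(int currentOff) {
        int lo = 0;
        int hi = size - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (offsets[mid] <= currentOff) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // No mapping at or before currentOff: the offset is returned unchanged.
        return found < 0 ? currentOff : currentOff + diffs[found];
    }

    public int size() {
        return size;
    }
}
```

For example, after add(5, 3) and add(10, 7), correct(4) returns 4, correct(7) returns 10, and correct(12) returns 19. A "lookbehind size" variant would additionally discard the oldest entries once size exceeds the limit, at the cost of possible misses for offsets older than the retained window.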