[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781045#action_12781045 ]
Robert Muir commented on LUCENE-2075:
-------------------------------------

Hi, I applied the automaton patch and its benchmark (LUCENE-1606) against the flex branch, and kept the old TermEnum API. I tested two scenarios: an old index created with 3.0 (trunk) and a new index created with the flex branch.

In both cases it's slower than trunk, but I assume this is because the flex branch has not been optimized yet (last I saw, it used a new String() placeholder for UTF conversion). Still, I think it is fair to compare the flex branch with itself, old index versus new index. I can only assume that with the new index it is using the caching. These numbers are stable on HEAD and do not deviate much. Feel free to look at the benchmark code over there and suggest improvements if you think there is an issue with it.

||Pattern||Iter||AvgHits||AvgMS (old idx)||AvgMS (new idx)||
|N?N?N?N|10|1000.0|86.6|70.2|
|?NNNNNN|10|10.0|3.0|2.0|
|??NNNNN|10|100.0|12.5|7.2|
|???NNNN|10|1000.0|86.9|34.8|
|????NNN|10|10000.0|721.2|530.5|
|NN??NNN|10|100.0|8.3|4.0|
|NN?N*|10|10000.0|149.1|143.2|
|?NN*|10|100000.0|1061.4|836.7|
|*N|10|1000000.0|16329.7|11480.0|
|NNNNN??|10|100.0|2.7|2.2|

> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
>          Key: LUCENE-2075
>          URL: https://issues.apache.org/jira/browse/LUCENE-2075
>      Project: Lucene - Java
>   Issue Type: Improvement
>   Components: Index
>     Reporter: Michael McCandless
>     Priority: Minor
>      Fix For: 3.1
>
>  Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch,
> LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch,
> LUCENE-2075.patch
>
> Right now each thread creates its own (thread-private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful: if a high number of threads come through
> Lucene, you multiply the RAM usage. You also cut way back on the
> likelihood of a cache hit (except for the known repeated lookups of a
> term within a query, which use one thread).
> In NRT search we open new SegmentReaders (on tiny segments) often,
> and each thread must then spend CPU/RAM creating & populating its own
> cache.
> Now that we are on Java 1.5 we can use java.util.concurrent.*, e.g.
> ConcurrentHashMap. One simple approach could be a double-barrel LRU
> cache, using two maps (primary, secondary). You check the cache by
> first checking primary; if that's a miss, you check secondary, and if
> you get a hit there you promote it to primary. Once primary is full
> you clear secondary and swap them.
> Or... any other suggested approach?
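For concreteness, here is a minimal sketch of the double-barrel idea described above. This is not the attached patch: the class name DoubleBarrelLRUCache, its fields, and the maxSize parameter are illustrative only, and the map swap below is not atomic, so a real implementation would need to reason carefully about readers racing with the swap.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a double-barrel LRU cache: two ConcurrentHashMaps,
// primary and secondary. Reads hit primary first; a hit in secondary
// is promoted to primary. When primary fills up, secondary is cleared
// and the two maps are swapped, so recently used (promoted) entries
// survive one swap while stale ones are discarded.
public class DoubleBarrelLRUCache<K, V> {

  private final int maxSize;
  private volatile Map<K, V> primary = new ConcurrentHashMap<K, V>();
  private volatile Map<K, V> secondary = new ConcurrentHashMap<K, V>();

  public DoubleBarrelLRUCache(int maxSize) {
    this.maxSize = maxSize;
  }

  public V get(K key) {
    V value = primary.get(key);
    if (value == null) {
      // Miss in primary: check secondary; on a hit, promote the entry
      // to primary so it survives the next swap.
      value = secondary.get(key);
      if (value != null) {
        put(key, value);
      }
    }
    return value;
  }

  public void put(K key, V value) {
    Map<K, V> p = primary;
    if (p.size() >= maxSize) {
      // Primary is full: clear secondary and swap the two maps.
      // NOTE: this swap is not atomic; a production version must
      // handle concurrent readers and writers during the swap.
      secondary.clear();
      primary = secondary;
      secondary = p;
      p = primary;
    }
    p.put(key, value);
  }
}
{code}

Compared with a strict LRU, this trades eviction precision for cheap concurrent reads: a get is just one or two ConcurrentHashMap lookups with no shared lock, and "recently used" is approximated as "touched since the last swap".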