[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656728#action_12656728 ]
Michael McCandless commented on LUCENE-1483:
--------------------------------------------

OK I ran an initial test, though since the ord approach is a "bit" buggy we can't be sure how far to trust these results.

I indexed the first 2M docs from Wikipedia into a 101-segment index, then searched for "text" (97K hits), sorting by title and pulling the best 100 hits. I ran the search 1000 times in each round.

Current trunk (best 107.1 searches/sec):

{code}
Operation             round  runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem    avgTotalMem
XSearchWarm               0       1           1    0.0       93.64  463,373,760  1,029,046,272
XSearchWithSort_1000      0       1        1000  100.6        9.94  463,373,760  1,029,046,272
XSearchWithSort_1000      1       1        1000  107.1        9.34  572,969,344  1,029,046,272
XSearchWithSort_1000      2       1        1000  105.5        9.48  572,969,344  1,029,046,272
XSearchWithSort_1000      3       1        1000  106.2        9.41  587,068,928  1,029,046,272
{code}

Patch STRING_ORD (best 102.0 searches/sec):

{code}
Operation             round  runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem    avgTotalMem
XSearchWarm               0       1           1    0.5        2.16  384,153,600  1,029,046,272
XSearchWithSort_1000      0       1        1000   94.1       10.63  439,173,824  1,029,046,272
XSearchWithSort_1000      1       1        1000  100.7        9.93  439,173,824  1,029,046,272
XSearchWithSort_1000      2       1        1000  101.9        9.81  573,822,208  1,029,046,272
XSearchWithSort_1000      3       1        1000  102.0        9.81  573,822,208  1,029,046,272
{code}

Patch STRING_VAL (best 34.6 searches/sec):

{code}
XSearchWarm               0       1           1    0.4        2.24  368,201,088  1,029,046,272
XSearchWithSort_1000      0       1        1000   34.6       28.94  415,107,648  1,029,046,272
XSearchWithSort_1000      1       1        1000   33.9       29.54  415,107,648  1,029,046,272
XSearchWithSort_1000      2       1        1000   33.9       29.46  545,339,904  1,029,046,272
XSearchWithSort_1000      3       1        1000   34.0       29.40  545,339,904  1,029,046,272
{code}

Notes:

* Populating the field cache on trunk for MultiReader is fantastically costly (94 sec). The IO cache was already hot so this isn't IO latency.
I think MultiTermEnum/Docs behaves badly for this use case (single unique term (title) per doc). We really need to switch to column-stride fields, not un-invert, for this.

* For this case at least STRING_ORD is still quite a bit faster than STRING_VAL; however, it's still buggy. Maybe a smaller queue size (eg 10 or 20) would make them closer.

* STRING_ORD is still a bit slower than trunk's sort; hopefully once tuned it'll be closer.

I think we now need to fix the STRING_ORD bug & retest.

> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing
> for individual segment reloading on reopen.