[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1410:
---------------------------------------

    Attachment: TermQueryTests.tgz

In order to understand how time is spent overall during searching, I took TermQuery and reimplemented it in several different ways. While each implementation is correct (the checksum of the final top N docs matches), they are very much prototypes and nowhere near committable.

I then took the 100 most frequent terms (23.3 million hits in total) from my Wikipedia index and ran them each in sequence. Each result is the best of 25 runs (all results are from the OS's IO cache):

||Test||Time (msec)||Hits/msec||Speedup||
|Baseline|674|34496|1.00X|
|+ Code Speedups|591|39340|1.14X|
|+ Code Speedups + PFOR|295|78814|2.28X|
|+ Code Speedups + BITS|247|94130|2.73X|
|+ Code Speedups + BITS (native)|230|101088|2.93X|

Here's what the test names mean:

* Baseline is the normal TermQuery, searching with TopDocsCollector for the top 10 docs.
* Code Speedups means some basic optimizations, e.g. my own specialized priority queue, unrolled loops, etc.
* PFOR means switching to PFOR for storing docs & freqs as separate streams. Each term's posting starts a new PFOR block.
* BITS just means using packed n-bit ints for each block (i.e. it has no exceptions, so it sets N such that all ints fit). The resulting frq file was 18% bigger and the doc file 10% bigger -- but this is just for the 100 most frequent terms.
* BITS (native) is BITS, but running as JNI (C++) code.

Next, I ran the same tests as above but with collection of hits turned off, so this really just measures decode time:

||Test||Time (msec)||Hits/msec||Speedup||
|+ Code Speedups|384|60547|1.76X|
|+ Code Speedups + PFOR|91|255497|7.41X|
|+ Code Speedups + BITS|49|474496|13.76X|
|+ Code Speedups + BITS (native)|32|726572|21.06X|

Some observations:

* PFOR really does speed up TermQuery overall, so I think we should pursue it and get it committed.
* BITS is a good speedup beyond PFOR, but we haven't optimized PFOR yet.
Also, BITS would be very seekable. We could get the same thing from PFOR by increasing the bit size of its blocks.
* Once we swap in PFOR and/or BITS (or another interesting int-block compression), accumulating hits becomes the slowest part of TermQuery.
* Native BITS code is quite a bit faster at decoding, but it's probably not worth pursuing for now, since with PFOR decode time becomes a small part of the overall time.
* TermQuery is the simplest query; other queries will spend more CPU time coordinating, or time handling positions, so we can't yet conclude how much of an impact PFOR will have on them.

> PFOR implementation
> -------------------
>
>                 Key: LUCENE-1410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1410
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: autogen.tgz, LUCENE-1410b.patch, LUCENE-1410c.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
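For readers unfamiliar with the encoding being benchmarked: the "packed n-bit ints" idea behind the BITS tests can be sketched roughly as below. This is an illustrative sketch, not code from the attached patch; the `BitsBlock` class and its method names are made up. It picks the smallest bit width N that fits every value in a block, then packs the values back to back into a `long[]` (with no exceptions, unlike PFOR, which packs to a smaller common width and patches the outliers separately).

```java
// Hypothetical sketch of fixed-width "BITS" packing: choose the smallest
// bit width that holds every value in the block, then pack values back to
// back into a long[]. Assumes non-negative ints (docs/freqs deltas).
public class BitsBlock {

    // Smallest number of bits that can represent every value in the block.
    static int bitsRequired(int[] values) {
        int bits = 0;
        for (int v : values) bits |= v;            // OR together all set bits
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(bits));
    }

    // Pack values at numBits each, back to back, into 64-bit words.
    static long[] pack(int[] values, int numBits) {
        long[] packed = new long[(values.length * numBits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * numBits;
            int word = bitPos >>> 6, shift = bitPos & 63;
            packed[word] |= (long) values[i] << shift;
            if (shift + numBits > 64) {            // value straddles a word boundary
                packed[word + 1] |= (long) values[i] >>> (64 - shift);
            }
        }
        return packed;
    }

    // Decode count values of numBits each. Because every value has the same
    // width, any value's bit position is computable, which is what makes
    // this format seekable.
    static int[] unpack(long[] packed, int count, int numBits) {
        int[] out = new int[count];
        long mask = (1L << numBits) - 1;
        for (int i = 0; i < count; i++) {
            int bitPos = i * numBits;
            int word = bitPos >>> 6, shift = bitPos & 63;
            long v = packed[word] >>> shift;
            if (shift + numBits > 64) {            // pick up the spilled high bits
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        return out;
    }
}
```

The seekability noted above falls out of the fixed width: value i always lives at bit i * numBits, so decoding can jump into the middle of a block without scanning from the start, which variable-width codes (and PFOR's exception patching) make harder.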