I've been working on some things that I'm hoping to get into Lucy sometime this year. Daniel Lemire published a paper last year about fast integer compression/decompression:

http://lemire.me/blog/archives/2012/09/12/fast-integer-compression-decoding-billions-of-integers-per-second/

I've been working with him on doing it even faster. I think we're on target for something twice as fast as the results in that paper. We're also working on fast intersections of posting lists, and should have good numbers there too. I'd like to use Lucy as the test case for the code, to show a real-world application in the paper.

Mike McCandless just wrote a blog post on speeding up Lucene with some C++ code:

http://blog.mikemccandless.com/2013/06/screaming-fast-lucene-searches-using-c.html

I'm not sure how we compare now, but I think a good target might be to beat those numbers by a lot. Or failing that, at least trounce the normal Java version, and proudly proclaim it.

I think that Nick is right that the Perl-centric view of Lucy isn't helping us. Lucy is designed to be sleek and fast, things Perl decidedly is not. For some reason, an image of an adult riding one of those tiny little toy motorcycles comes to mind:

http://www.sfgate.com/bayarea/article/CALIFORNIA-The-buzz-is-all-about-2713912.php

I think we (or maybe just I) need to change this image to something more balanced between rider and machine:

http://1.bp.blogspot.com/_eZW-obcG6Pc/TKyPS1eTOCI/AAAAAAAACU4/3l3LbtIlMqI/s1600/rossi+racing.jpg

What would we need to do to post great numbers on this same Wikipedia benchmark? Where do we come out now? Do others think it's worthwhile to try to improve our reputation in this direction?

--nate
