I've been working on some things that I'm hoping to get into Lucy
sometime this year. Daniel Lemire published a paper last year about
fast integer compression/decompression:

http://lemire.me/blog/archives/2012/09/12/fast-integer-compression-decoding-billions-of-integers-per-second/

I've been working with him on doing it even faster. I think we're on
target for something twice as fast as the results in that paper. We're
also working on fast intersections of posting lists, and should have
good numbers there too. I'd like to use Lucy as the test case for the
code, to show a real-world application in the paper.
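
For anyone less familiar with the terms, here is a minimal Python
sketch of the two ideas: storing a sorted posting list as gaps
(deltas) so the integers are small and compress well, and
intersecting two sorted posting lists. This is a plain scalar
illustration only, not the code from the paper or the faster version
mentioned above; the function names are made up for this example.

```python
def delta_encode(doc_ids):
    """Turn a sorted list of doc IDs into gaps between neighbours.

    Small gaps need fewer bits, which is what integer compression
    schemes exploit. (Hypothetical helper, for illustration only.)
    """
    prev = 0
    gaps = []
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def delta_decode(gaps):
    """Rebuild the original doc IDs with a running sum of the gaps."""
    total = 0
    doc_ids = []
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def intersect(a, b):
    """Return doc IDs present in both sorted posting lists.

    Simple two-pointer merge; faster approaches exist, this is
    only the baseline idea.
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```

A query for two terms would decode both posting lists and intersect
them; the speed of exactly those two steps is what the work above is
trying to improve.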

Mike McCandless just wrote a blog post on speeding up Lucene with some
C++ code:
http://blog.mikemccandless.com/2013/06/screaming-fast-lucene-searches-using-c.html

I'm not sure how we compare now, but I think a good target might be to
beat those numbers by a lot.  Or failing that, at least trounce the
normal Java version, and proudly proclaim it.

I think that Nick is right that the Perl-centric view of Lucy isn't
helping us.  Lucy is designed to be sleek and fast, things Perl
decidedly is not.  For some reason, an image of an adult riding one of
those little tiny toy motorcycles comes to mind:
http://www.sfgate.com/bayarea/article/CALIFORNIA-The-buzz-is-all-about-2713912.php

I think we (or maybe just I) need to change this image to something
more balanced between rider and machine:
http://1.bp.blogspot.com/_eZW-obcG6Pc/TKyPS1eTOCI/AAAAAAAACU4/3l3LbtIlMqI/s1600/rossi+racing.jpg

What would we need to do to post great numbers on this same Wikipedia
benchmark?  Where do we come out now?  Do others think it's worthwhile
to try to improve our reputation in this direction?

--nate
