On Thu, Jun 20, 2013 at 1:08 AM, Nathan Kurz <[email protected]> wrote:
> I've been working on some things that I'm hoping to get into Lucy
> sometime this year.   Daniel Lemire published a paper last year about
> fast integer compression/decompression:
>
> http://lemire.me/blog/archives/2012/09/12/fast-integer-compression-decoding-billions-of-integers-per-second/
>
> I've been working with him on doing it even faster.  I think we're on
> target for something twice as fast as the last paper.    We're also
> working on fast intersections of posting lists, and should have good
> numbers there too.   I'd like to use Lucy as the test case for the
> code to show some real world application in the paper.

Sounds exciting!  Here's one possible way that you could do this:

1.  Build arbitrary data structures using a custom DataWriter/DataReader pair.
2.  Create a Query/Compiler/Matcher subclass trio which, instead of going
    through PostingsReader, accesses your custom index files.
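
To make step 1 a little more concrete, here is a rough, Lucy-independent sketch
of a writer/reader pair for a single posting list.  Nothing in it is Lucy API:
the package names ToyPostingWriter and ToyPostingReader, the file name, and the
on-disk format (doc id gaps stored as BER compressed integers via Perl's "w"
pack template) are all assumptions chosen for illustration.  A real
DataWriter/DataReader pair would hook into the segment machinery instead of
opening a file directly.

```perl
use strict;
use warnings;

# Hypothetical writer: delta-encodes ascending doc ids and stores each gap
# as a BER compressed integer (pack template "w").
package ToyPostingWriter;

sub new {
    my ( $class, %args ) = @_;
    return bless { path => $args{path} }, $class;
}

sub write_postings {
    my ( $self, @doc_ids ) = @_;
    my $prev = 0;
    my @gaps;
    for my $id (@doc_ids) {
        die "doc ids must be positive and strictly increasing"
            if $id <= $prev;
        push @gaps, $id - $prev;
        $prev = $id;
    }
    open my $fh, '>:raw', $self->{path}
        or die "Can't open $self->{path}: $!";
    print {$fh} pack( 'w*', @gaps );
    close $fh or die "Can't close $self->{path}: $!";
    return scalar @gaps;
}

# Hypothetical reader: reverses the encoding by summing the stored gaps.
package ToyPostingReader;

sub new {
    my ( $class, %args ) = @_;
    return bless { path => $args{path} }, $class;
}

sub read_postings {
    my ($self) = @_;
    open my $fh, '<:raw', $self->{path}
        or die "Can't open $self->{path}: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    $bytes = '' unless defined $bytes;
    my $doc_id = 0;
    my @doc_ids;
    for my $gap ( unpack 'w*', $bytes ) {
        $doc_id += $gap;
        push @doc_ids, $doc_id;
    }
    return @doc_ids;
}

package main;
use File::Temp qw( tempdir );

# Round-trip a small posting list through a scratch file.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/toy_postings.dat";
ToyPostingWriter->new( path => $path )->write_postings( 3, 7, 8, 1000 );
my @doc_ids = ToyPostingReader->new( path => $path )->read_postings;
print "@doc_ids\n";    # prints "3 7 8 1000"
```

The point of the sketch is only that the writer and reader agree on a private
file format; the compression scheme itself is a placeholder that a faster
codec could replace.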

Rather than explain everything at once, I'll start off with a sample script
which generates minimal segment files and illustrates the use of
Lucy::Plan::Architecture to control index components.  Try it like so:

    perl custom_arch.pl INDEX_LOCATION
    hexdump -C INDEX_LOCATION/seg_1/cf.dat

I'll attempt to attach the script to this email, but I don't recall whether our
dev list strips attachments.  If it doesn't survive, I'll follow up with
another post containing the sample code inlined in the message body.

Marvin Humphrey
