On Thu, Jun 20, 2013 at 1:08 AM, Nathan Kurz <[email protected]> wrote:
> I've been working on some things that I'm hoping to get into Lucy
> sometime this year. Daniel Lemire published a paper last year about
> fast integer compression/decompression:
>
> http://lemire.me/blog/archives/2012/09/12/fast-integer-compression-decoding-billions-of-integers-per-second/
>
> I've been working with him on doing it even faster. I think we're on
> target for something twice as fast as the last paper. We're also
> working on fast intersections of posting lists, and should have good
> numbers there too. I'd like to use Lucy as the test case for the
> code to show some real world application in the paper.

Sounds exciting!  Here's one possible way that you could do this:

1. Build arbitrary data structures using a custom DataWriter/DataReader pair.
2. Create a Query/Compiler/Matcher subclass trio which, instead of going
   through PostingsReader, accesses your custom index files.
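
To make the shape of those two steps concrete, here is a rough, self-contained
sketch in Python.  It does not use Lucy's API at all: the class names
PostingsWriter and PostingsReader, the on-disk layout (a small header followed
by delta-encoded varints), and the intersect() function are hypothetical
stand-ins for a DataWriter/DataReader pair and for the core of a Matcher.
Plain varint coding is used only as a placeholder for whatever faster codec
you end up with.

```python
import os
import struct
import tempfile

HEADER = "<HI"  # term length (uint16), body length (uint32), little-endian


def encode_varint(n):
    """Encode a non-negative int using 7 data bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


class PostingsWriter:
    """Hypothetical writer: collects posting lists, then writes one file."""

    def __init__(self, path):
        self.path = path
        self.lists = {}

    def add(self, term, doc_ids):
        self.lists[term] = sorted(set(doc_ids))

    def finish(self):
        with open(self.path, "wb") as fh:
            for term, doc_ids in sorted(self.lists.items()):
                key = term.encode("utf-8")
                body = bytearray()
                prev = 0
                for doc_id in doc_ids:
                    body += encode_varint(doc_id - prev)  # store gaps
                    prev = doc_id
                fh.write(struct.pack(HEADER, len(key), len(body)))
                fh.write(key)
                fh.write(body)


class PostingsReader:
    """Hypothetical reader: loads the file and decodes lists on demand."""

    def __init__(self, path):
        self.bodies = {}
        with open(path, "rb") as fh:
            data = fh.read()
        pos = 0
        while pos < len(data):
            key_len, body_len = struct.unpack_from(HEADER, data, pos)
            pos += struct.calcsize(HEADER)
            term = data[pos:pos + key_len].decode("utf-8")
            pos += key_len
            self.bodies[term] = data[pos:pos + body_len]
            pos += body_len

    def doc_ids(self, term):
        ids, doc_id, delta, shift = [], 0, 0, 0
        for byte in self.bodies.get(term, b""):
            delta |= (byte & 0x7F) << shift
            if byte & 0x80:
                shift += 7
            else:
                doc_id += delta  # undo the gap encoding
                ids.append(doc_id)
                delta, shift = 0, 0
        return ids


def intersect(a, b):
    """Merge-style intersection of two sorted doc id lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "postings.dat")
        writer = PostingsWriter(path)
        writer.add("fast", [1, 5, 130, 20000])
        writer.add("integer", [5, 7, 20000])
        writer.finish()
        reader = PostingsReader(path)
        print(intersect(reader.doc_ids("fast"), reader.doc_ids("integer")))
```

In Lucy the writer and reader would be registered through the Architecture
rather than instantiated by hand, and the intersection would sit behind a
Matcher; the sketch only shows how the pieces divide up the work.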

Rather than explain everything at once, I'll start off with a sample script
which generates minimal segment files and illustrates the use of
Lucy::Plan::Architecture to control index components.  Try it like so:

    perl custom_arch.pl test_index
    hexdump -C test_index/seg_1/cf.dat

I'll attempt to attach the script to this email, but I don't recall whether our
dev list strips attachments.  If it doesn't survive, I'll follow up with
another post containing the sample code inlined in the message body.

Marvin Humphrey