Re: strange problem of PForDelta decoder

Michael McCandless Thu, 16 Dec 2010 03:17:54 -0800

On the bulkpostings branch you can do something like this:

  // Register the PFor codec and make it the default codec for every field.
  CodecProvider cp = new CodecProvider();
  cp.register(new PatchedFrameOfRefCodec());
  cp.setDefaultFieldCodec("PatchedFrameOfRef");


Then whenever you create an IndexWriter (IW) or IndexReader (IR), use
the advanced method that accepts a CodecProvider.  The index will then
always use PForDelta to write/read.

I suspect conjunction queries got faster because we no longer skip if
the docID we seek is already in the current buffer (currently sized
64).  I.e., skipping is very costly when the target isn't far away.
This was something of an accidental byproduct of forcing even
conjunction queries using the Standard (vInt) codec to work on block
buffers, but I think it's an important optimization that we should
apply more generally.
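A minimal sketch of that optimization, assuming blocks of absolute docIDs that are already decoded, with a linear scan over later blocks standing in for the real skip list. The names (`BlockPostings`, `skipCount`) are hypothetical, not the branch's code:

```java
/**
 * Hypothetical sketch: advance() only consults skip data when the target
 * lies beyond the block that is already decoded into the buffer.
 */
final class BlockPostings {
    private final int[][] blocks; // each block: ascending absolute docIDs, non-empty
    private int blockIdx = 0;     // which block is currently "decoded"
    private int[] buffer;         // the decoded block
    private int pos = 0;          // index of the current doc within buffer
    int skipCount = 0;            // how often the expensive path was taken

    BlockPostings(int[][] blocks) {
        this.blocks = blocks;
        this.buffer = blocks[0];
    }

    /** Returns the first docID >= target, or Integer.MAX_VALUE when exhausted. */
    int advance(int target) {
        if (target > buffer[buffer.length - 1]) {
            // Target is past the current buffer: only now pay for a skip.
            // (A linear scan over block maxima stands in for a real skip list.)
            skipCount++;
            int b = blockIdx + 1;
            while (b < blocks.length && blocks[b][blocks[b].length - 1] < target) {
                b++;
            }
            if (b == blocks.length) {
                return Integer.MAX_VALUE;
            }
            blockIdx = b;
            buffer = blocks[b]; // stands in for "decode block b"
            pos = 0;
        }
        // Target is inside the current buffer: a cheap linear scan, no skip.
        while (buffer[pos] < target) {
            pos++;
        }
        return buffer[pos];
    }
}
```

In this sketch, scanning a few dozen already-decoded ints is assumed to be cheaper than reading skip entries and re-decoding a block, which is why the in-buffer check comes first.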

Skipping for block codecs and for Standard/vInt is now done with the
same class.  The only difference is that a block codec must store the
long filePointer where the block starts *and* the int offset into the
block, whereas the Standard codec just stores a filePointer.
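To make that difference concrete, here is a hypothetical pair of skip-entry shapes (names invented for illustration, not the branch's classes), plus a helper that, assuming fixed-size blocks, splits a posting's ordinal into the block's file pointer and the offset inside it:

```java
/** Hypothetical skip-entry shapes for the two kinds of codec. */
final class SkipEntries {
    /** Standard (vInt) codec: a file pointer straight to the posting. */
    record VIntSkip(int doc, long filePointer) {}

    /** Block codec: where the encoded block starts, plus the offset inside it once decoded. */
    record BlockSkip(int doc, long blockFilePointer, int offsetInBlock) {}

    /**
     * Assuming fixed-size blocks, the posting with the given ordinal lives in
     * block (ordinal / blockSize) at offset (ordinal % blockSize).
     * blockStarts[i] is the (hypothetical) file pointer of block i.
     */
    static BlockSkip forOrdinal(int doc, int ordinal, int blockSize, long[] blockStarts) {
        return new BlockSkip(doc, blockStarts[ordinal / blockSize], ordinal % blockSize);
    }
}
```

Seeking with a `BlockSkip` means seeking to `blockFilePointer`, decoding the whole block, and then jumping to `offsetInBlock`, whereas a `VIntSkip` lets the reader resume decoding at the exact posting.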

As for "how do you implement bulk read": this is the core change on
the bulkpostings branch.  We have a new API to separately bulk-read
docDeltas, freqs, and positionDeltas.  But we are rapidly iterating on
improving this (and on getting to a clean PFor/For impl) right now...
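The branch's actual API is still in flux, so what follows is only a hypothetical sketch of the idea: each stream (docDeltas, freqs, and, not shown, positionDeltas) is an independent source that fills an int buffer per call, and the consumer turns docDeltas back into docIDs. `IntBulkReader`, `ArrayBulkReader` and `BulkDemo` are invented names:

```java
import java.util.Arrays;

/** Hypothetical bulk-read interface: one call fills a whole buffer of ints. */
interface IntBulkReader {
    int[] buffer(); // buffer filled by the most recent fill()
    int fill();     // refills buffer; returns the number of valid ints, 0 at end
}

/** In-memory stand-in for one decoded postings stream, served in chunks. */
final class ArrayBulkReader implements IntBulkReader {
    private final int[] source;
    private final int[] buf;
    private int upto = 0;

    ArrayBulkReader(int[] source, int chunkSize) {
        this.source = source;
        this.buf = new int[chunkSize];
    }

    public int[] buffer() { return buf; }

    public int fill() {
        int n = Math.min(buf.length, source.length - upto);
        System.arraycopy(source, upto, buf, 0, n);
        upto += n;
        return n;
    }
}

final class BulkDemo {
    /** Rebuilds absolute docIDs by prefix-summing a docDeltas stream. */
    static int[] docIds(IntBulkReader docDeltas, int count) {
        int[] docs = new int[count];
        int doc = 0, i = 0, n;
        while (i < count && (n = docDeltas.fill()) > 0) {
            int[] b = docDeltas.buffer();
            for (int k = 0; k < n && i < count; k++) {
                doc += b[k];
                docs[i++] = doc;
            }
        }
        return Arrays.copyOf(docs, i);
    }

    /** Totals a freqs stream independently of the doc stream. */
    static long totalFreq(IntBulkReader freqs) {
        long total = 0;
        int n;
        while ((n = freqs.fill()) > 0) {
            int[] b = freqs.buffer();
            for (int k = 0; k < n; k++) {
                total += b[k];
            }
        }
        return total;
    }
}
```

One benefit of separate streams, at least in this sketch, is that a consumer that needs no freqs never has to touch (or decode) them.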

Mike

On Thu, Dec 16, 2010 at 4:29 AM, Li Li <fancye...@gmail.com> wrote:
> Hi Michael,
>   Lucene 4 has so many changes that I don't know how to index and
> search with a specific codec. Could you please give me some code
> snippets that use the PFor codec so I can trace the code?
>   In your blog post
> http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
> you said "The AND query, curiously, got faster; I think this is
> because I modified its scoring to first try seeking within the block
> of decoded ints".
>   I am also curious about this result, because VInt only needs to
> decode part of the doc list while PFor needs to decode the whole
> block. But I think that with conjunction queries, most of the time is
> spent searching the skip list. I haven't read your code yet, but I
> guess the skip lists for VInt and for PFor are different.
>   E.g., Lucene 2.9's default skipInterval is 16, so the skip list looks like:
>
>     level 1                                          256
>     level 0   16  32  48  64  80  96  112  128  ...  256
>
>   To skipTo(60) we need to read 0, 16, 32, 48, 64 in level 0.
>   But with blocks, e.g. a block size of 128, my skip list implementation is:
>
>     level 1        256
>     level 0   128  256
>
>   To skipTo(60) we only read 2 items in level 0 and decode the first
> block, which contains 128 docIDs.
>
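The counting in this example can be checked with a small helper. It assumes, as the diagram does, that level-0 entries sit at 0, interval, 2*interval, ... and are labeled by docID, and that a skip reads entries until it has read one greater than the target; `SkipReads.level0Reads` is a hypothetical name:

```java
/** Hypothetical helper reproducing the level-0 read counts in the example. */
final class SkipReads {
    /**
     * Number of level-0 entries read when skipping from the start to target,
     * with entries at 0, interval, 2*interval, ...; reading stops at the first
     * entry greater than target (that entry is still read). Level 1 is ignored,
     * which is fine while target is below the first level-1 entry.
     */
    static int level0Reads(int target, int interval) {
        int reads = 0;
        for (long entry = 0; ; entry += interval) {
            reads++;
            if (entry > target) {
                return reads;
            }
        }
    }
}
```

With interval 16 this gives 5 reads for skipTo(60), and with 128-doc blocks it gives 2; the cheaper skip is then paid for by decoding all 128 docIDs of the first block.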
>   How do you implement bulk read?
>   I did it like this: I decode a block and cache it in an int array. I
> think I can buffer up to 100K docIDs and tfs for disjunction queries
> (it costs less than 1MB of memory for each term):
>   SegmentTermDocs.read(final int[] docs, final int[] freqs)
>       ...........
>       while (i < length && count < df) {
>         // This condition is usually false; we could optimize it, but
>         // HotSpot will compile "hot" code anyway. So ...
>         if (curBlockIdx >= curBlockSize) {
>           // Buffer exhausted: read and decode the next docID-delta block.
>           // (idBlockBytes is really a count of ints, as the loop shows.)
>           int idBlockBytes = freqStream.readVInt();
>           curBlockIdx = 0;
>           for (int k = 0; k < idBlockBytes; k++) {
>             buffer[k] = freqStream.readInt();
>           }
>           blockIds = code.decode(buffer, idBlockBytes);
>           curBlockSize = blockIds.length;
>
>           // Then read and decode the matching block of term freqs.
>           int tfBlockBytes = freqStream.readVInt();
>           for (int k = 0; k < tfBlockBytes; k++) {
>             buffer[k] = freqStream.readInt();
>           }
>           blockTfs = code.decode(buffer, tfBlockBytes);
>           assert curBlockSize == blockTfs.length;
>         }
>         freq = blockTfs[curBlockIdx];
>         doc += blockIds[curBlockIdx++];  // docIDs are stored as deltas
>
>         count++;
>
>         // Only live (non-deleted) docs are returned to the caller.
>         if (deletedDocs == null || !deletedDocs.get(doc)) {
>           docs[i] = doc;
>           freqs[i] = freq;
>           ++i;
>         }
>       }
>
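The fragment above depends on SegmentTermDocs internals, so here is a self-contained sketch of the same block-cached loop that can be run on its own. It assumes each block is already split into an encoded docID-delta block and a matching tf block, and it uses an identity `decode` as a stand-in for the PFor decoder; `BlockCachedTermDocs` and `IntBlockDecoder` are hypothetical names:

```java
import java.util.Arrays;
import java.util.BitSet;

/** Self-contained sketch of a block-cached read(docs, freqs) loop. */
final class BlockCachedTermDocs {
    /** Hypothetical decoder interface; a real PFor decoder would go here. */
    interface IntBlockDecoder {
        int[] decode(int[] encoded, int len);
    }

    private final int[][] encodedIdBlocks; // per block: encoded docID deltas
    private final int[][] encodedTfBlocks; // per block: encoded term freqs
    private final IntBlockDecoder code;
    private final BitSet deletedDocs;      // null means no deletions
    private final int df;                  // total postings for the term

    private int nextBlock = 0, curBlockIdx = 0, curBlockSize = 0, count = 0, doc = 0;
    private int[] blockIds, blockTfs;

    BlockCachedTermDocs(int[][] ids, int[][] tfs, IntBlockDecoder code, BitSet deletedDocs, int df) {
        this.encodedIdBlocks = ids;
        this.encodedTfBlocks = tfs;
        this.code = code;
        this.deletedDocs = deletedDocs;
        this.df = df;
    }

    /** Fills docs/freqs with live postings; returns how many were written. */
    int read(int[] docs, int[] freqs) {
        int i = 0;
        while (i < docs.length && count < df) {
            if (curBlockIdx >= curBlockSize) {
                // Buffer exhausted: decode the next docID block and its tf block.
                int[] ids = encodedIdBlocks[nextBlock];
                int[] tfs = encodedTfBlocks[nextBlock++];
                blockIds = code.decode(ids, ids.length);
                blockTfs = code.decode(tfs, tfs.length);
                curBlockSize = blockIds.length;
                curBlockIdx = 0;
                if (blockTfs.length != curBlockSize) {
                    throw new IllegalStateException("docID and tf blocks differ in size");
                }
            }
            int freq = blockTfs[curBlockIdx];
            doc += blockIds[curBlockIdx++]; // docIDs are delta-coded across blocks
            count++;
            if (deletedDocs == null || !deletedDocs.get(doc)) {
                docs[i] = doc;
                freqs[i] = freq;
                i++;
            }
        }
        return i;
    }

    /** Identity "codec" used only to exercise the loop. */
    static final IntBlockDecoder IDENTITY = (encoded, len) -> Arrays.copyOf(encoded, len);
}
```

For the buffering estimate above: 100,000 docIDs plus 100,000 tfs stored as ints take 100,000 × 2 × 4 = 800,000 bytes per term (ignoring array headers), which is indeed under 1MB.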
>
>
> 2010/12/15 Michael McCandless <luc...@mikemccandless.com>:
>> Hi Li Li,
>>
>> That issue has such a big patch, and enough of us are now iterating on
>> it, that we cut a dedicated branch for it.
>>
>> But note that this branch is off of trunk (to be 4.0).
>>
>> You should be able to do this:
>>
>>  svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings
>>
>> And then run things in there.  I just committed FOR/PFOR prototype
>> codecs from LUCENE-1410 onto that branch, so e.g. you can run unit tests
>> using those codecs by running:
>>
>>  ant test -Dtests.codec=PatchedFrameOfRef
>>
>> Please post patches back if you improve things!  We need all the help
>> we can get :)
>>
>> Mike
>>
>> On Wed, Dec 15, 2010 at 5:54 AM, Li Li <fancye...@gmail.com> wrote:
>>> Hi Michael,
>>>    you posted a patch here: https://issues.apache.org/jira/browse/LUCENE-2723
>>>    I am not familiar with patches. Do I need to download
>>> LUCENE-2723.patch (there are several patches with this name; do I need
>>> the latest one?) and LUCENE-2723_termscorer.patch and apply them
>>> (patch -p1 <LUCENE-2723.patch)? I just checked out the latest source
>>> code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene
>>>
>>>
>>> 2010/12/14 Michael McCandless <luc...@mikemccandless.com>:
>>>> Likely you are seeing the startup cost of HotSpot compiling the PFOR code?
>>>>
>>>> I.e., does your test first "warm up" the JRE and then do the real test?
>>>>
>>>> I've also found that running with -Xbatch produces more consistent
>>>> results from run to run; however, those results may not be as fast as
>>>> running without -Xbatch.
>>>>
>>>> Also, it's better to test on actual data (i.e., a Lucene index's
>>>> postings), and in the full context of searching, because then we get a
>>>> sense of what speedups a real app will see... Micro-benchmarking is
>>>> nearly impossible in Java, since HotSpot behaves very differently than
>>>> it does in the "real" test.
>>>>
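A minimal sketch of the warmup idea in plain Java with `System.nanoTime`: run the code untimed first so HotSpot can compile it, then time it, and feed every result into a volatile sink so the JIT cannot discard the work. `WarmupBench` and the toy `prefixSumChecksum` "decoder" are hypothetical; a harness such as OpenJDK's JMH deals with these pitfalls far more thoroughly:

```java
import java.util.function.IntSupplier;

final class WarmupBench {
    // Results are written here so the JIT cannot treat the work as dead code.
    static volatile int sink;

    /** Average nanoseconds per call, measured only after `warmup` untimed calls. */
    static long avgNanosAfterWarmup(IntSupplier task, int warmup, int timed) {
        for (int r = 0; r < warmup; r++) {
            sink = task.getAsInt(); // gives HotSpot a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int r = 0; r < timed; r++) {
            sink = task.getAsInt();
        }
        return (System.nanoTime() - start) / Math.max(1, timed);
    }

    /** Toy "decoder": prefix-sums docID deltas and folds them into a checksum. */
    static int prefixSumChecksum(int[] deltas) {
        int doc = 0, check = 0;
        for (int d : deltas) {
            doc += d;
            check ^= doc;
        }
        return check;
    }
}
```

Giving each decoder its own warmup before timing it would likely reduce the "whichever runs second is faster" effect, though numbers from a real index in a real search remain more meaningful.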
>>>> Mike
>>>>
>>>> On Tue, Dec 14, 2010 at 2:50 AM, Li Li <fancye...@gmail.com> wrote:
>>>>> Hi
>>>>>   I tried to integrate PForDelta into Lucene 2.9 but ran into a problem.
>>>>>   I use the implementation in
>>>>> http://code.google.com/p/integer-array-compress-kit/
>>>>>   It implements a basic PForDelta algorithm and an improved one
>>>>> (called NewPForDelta; it had many bugs, which I have fixed).
>>>>>   But compared with VInt and S9, it is very slow when decoding only
>>>>> a small number of integer arrays.
>>>>>   E.g., I decoded int[256] arrays whose values were randomly
>>>>> generated between 0 and 100. If it decodes just one array, PFor (or
>>>>> NewPFor) is very slow. When it continuously decodes many arrays,
>>>>> such as 10000, it's faster than S9 and VInt.
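For readers unfamiliar with the scheme, here is a deliberately simplified sketch of the patched frame-of-reference idea behind PForDelta; it is not the integer-array-compress-kit format. It picks a bit width b that fits all but at most `maxExceptions` values, packs every value into b bits, and stores the few larger values (exceptions) separately to patch in after unpacking. Bit-by-bit packing keeps it short but is much slower than real PFor decoders, which typically use specialized unpacking routines; `SimplePFor` is a hypothetical name. Note that `decode` always processes the whole block:

```java
import java.util.Arrays;

/** Simplified patched frame-of-reference codec for non-negative ints (e.g. docID deltas). */
final class SimplePFor {
    final int bits;        // width used for every slot
    final int n;           // number of values in the block
    final long[] packed;   // n slots of `bits` bits each
    final int[] excIndex;  // positions of exceptions
    final int[] excValue;  // full values of exceptions

    private SimplePFor(int bits, int n, long[] packed, int[] excIndex, int[] excValue) {
        this.bits = bits;
        this.n = n;
        this.packed = packed;
        this.excIndex = excIndex;
        this.excValue = excValue;
    }

    /** Smallest width that fits all but at most maxExceptions values. */
    static int chooseBits(int[] values, int maxExceptions) {
        if (values.length == 0) return 0;
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int v = sorted[Math.max(0, values.length - 1 - maxExceptions)];
        return 32 - Integer.numberOfLeadingZeros(v); // 0 when v == 0
    }

    static SimplePFor encode(int[] values, int maxExceptions) {
        int bits = chooseBits(values, maxExceptions);
        long limit = 1L << bits; // values below limit fit in `bits` bits
        int exc = 0;
        for (int v : values) if (v >= limit) exc++;
        int[] excIndex = new int[exc];
        int[] excValue = new int[exc];
        long[] packed = new long[(int) (((long) values.length * bits + 63) / 64)];
        exc = 0;
        for (int i = 0; i < values.length; i++) {
            int v = values[i];
            if (v >= limit) { // exception: remember it, store 0 in its slot
                excIndex[exc] = i;
                excValue[exc++] = v;
                v = 0;
            }
            writeBits(packed, (long) i * bits, bits, v);
        }
        return new SimplePFor(bits, values.length, packed, excIndex, excValue);
    }

    int[] decode() {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = readBits(packed, (long) i * bits, bits); // unpack every slot
        }
        for (int e = 0; e < excIndex.length; e++) {
            out[excIndex[e]] = excValue[e]; // then patch the exceptions
        }
        return out;
    }

    private static void writeBits(long[] dst, long pos, int bits, int v) {
        for (int b = 0; b < bits; b++, pos++) {
            if (((v >>> b) & 1) != 0) dst[(int) (pos >>> 6)] |= 1L << (pos & 63);
        }
    }

    private static int readBits(long[] src, long pos, int bits) {
        int v = 0;
        for (int b = 0; b < bits; b++, pos++) {
            if (((src[(int) (pos >>> 6)] >>> (pos & 63)) & 1L) != 0) v |= 1 << b;
        }
        return v;
    }
}
```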
>>>>>   Another strange phenomenon: when I call the PFor decoder twice,
>>>>> it is faster the 2nd time. Or if I call PFor first and then NewPFor,
>>>>> NewPFor is faster. If I reverse the call sequence, the decoder called
>>>>> 2nd is again the faster one.
>>>>>   e.g.
>>>>>                ct.testNewPFDCodes(list);
>>>>>                ct.testPFor(list);
>>>>>                ct.testVInt(list);
>>>>>                ct.testS9(list);
>>>>>
>>>>> NewPFD decode: 3614705
>>>>> PForDelta decode: 17320
>>>>> VINT decode: 16483
>>>>> S9 decode: 19835
>>>>> When I call them in the following order:
>>>>>
>>>>>                ct.testPFor(list);
>>>>>                ct.testNewPFDCodes(list);
>>>>>                ct.testVInt(list);
>>>>>                ct.testS9(list);
>>>>>
>>>>> PForDelta decode: 3212140
>>>>> NewPFD decode: 19556
>>>>> VINT decode: 16762
>>>>> S9 decode: 16483
>>>>>
>>>>>   My implementation: group docIDs and termDocFreqs into blocks of
>>>>> 128 integers. When SegmentTermDocs's next method (or read/readNoTf)
>>>>> is called, it decodes a block and saves it to a cache.
>>>>>
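A minimal sketch of the grouping step described here, assuming ascending docIDs: docIDs become deltas whose base carries across block boundaries, freqs are copied as-is, and each 128-int block would then be handed to the codec's encoder. `PostingBlocks` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PostingBlocks {
    static final int BLOCK_SIZE = 128;

    /** Ascending docIDs -> blocks of docID deltas (the last block may be short). */
    static List<int[]> docDeltaBlocks(int[] docIds) {
        List<int[]> blocks = new ArrayList<>();
        int prev = 0; // delta base carries over from one block to the next
        for (int start = 0; start < docIds.length; start += BLOCK_SIZE) {
            int end = Math.min(docIds.length, start + BLOCK_SIZE);
            int[] block = new int[end - start];
            for (int i = start; i < end; i++) {
                block[i - start] = docIds[i] - prev;
                prev = docIds[i];
            }
            blocks.add(block);
        }
        return blocks;
    }

    /** Term freqs -> blocks aligned with the docID blocks. */
    static List<int[]> freqBlocks(int[] freqs) {
        List<int[]> blocks = new ArrayList<>();
        for (int start = 0; start < freqs.length; start += BLOCK_SIZE) {
            blocks.add(Arrays.copyOfRange(freqs, start, Math.min(freqs.length, start + BLOCK_SIZE)));
        }
        return blocks;
    }
}
```

Because the delta base carries across blocks in this layout, a reader that jumps into the middle of a posting list also needs the last docID of the preceding block.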
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>
>>
>
