It's autogen'd from the Python script in that dir (gendecompress),
but we are actively experimenting with numerous ways to feed it
data from the file, to see what the JVM can most efficiently execute.

For testing, we need better coverage here.  But I have an initial
"encode random ints" test that I'm about to commit to the bulkpostings
branch... the pfor1 impl passes it, but pfor2 doesn't yet (I think
maybe because Simple16 can't handle ints >= 2^28?).
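
A minimal round-trip sketch of that kind of test (the encoder/decoder names
here are just placeholders -- the real test goes through the codec's actual
reader/writer):

  java.util.Random r = new java.util.Random(42);
  int[] values = new int[128];
  for (int i = 0; i < values.length; i++) {
    // full-range non-negative ints; values >= 2^28 are what seem to trip Simple16
    values[i] = r.nextInt() & 0x7FFFFFFF;
  }
  int[] encoded = encoder.encode(values);                 // placeholder encoder
  int[] decoded = decoder.decode(encoded, values.length); // placeholder decoder
  assert java.util.Arrays.equals(values, decoded);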

Mike

On Sun, Dec 19, 2010 at 10:06 PM, Li Li <fancye...@gmail.com> wrote:
> Is ForDecompressImpl generated by code or manually written?
> I am frustrated by
> http://code.google.com/p/integer-array-compress-kit/ which contains
> too many bugs (I fixed more than 20, but there are still more),
> because the decoder has too many branches and, in normal situations,
> some branches seldom occur.
>
> The decoders implemented in the branch assume blockSize == 128, so they
> have fewer branches than a decoder that supports an arbitrary blockSize.
> How do you test these decoders to ensure every branch is covered?
>
> 2010/12/16 Michael McCandless <luc...@mikemccandless.com>:
>> On the bulkpostings branch you can do something like this:
>>
>>  CodecProvider cp = new CodecProvider();
>>  cp.register(new PatchedFrameOfRefCodec());
>>  cp.setDefaultFieldCodec("PatchedFrameOfRef");
>>
>> Then whenever you create an IW or IR, use the advanced method that
>> accepts a CodecProvider.  Then the index will always use PForDelta to
>> write/read.
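>>
>> For example, roughly (the exact setter/overload names here are approximate
>> -- check the branch sources):
>>
>>  IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer);
>>  iwc.setCodecProvider(cp);          // assumed setter on IndexWriterConfig
>>  IndexWriter w = new IndexWriter(dir, iwc);
>>
>>  IndexReader r = IndexReader.open(dir, null, true,
>>      IndexReader.DEFAULT_TERMS_INDEX_DIVISOR, cp);  // assumed overload taking a CodecProvider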
>>
>> I suspect conjunction queries got faster because we no longer skip if
>> the docID we seek is already in the current buffer (currently sized
>> 64).  Ie, skip is very costly when the target isn't far.  This was
>> sort of an accidental byproduct of forcing even conjunction queries
>> using Standard (vInt) codec to work on block buffers, but I think it's
>> an important optimization that we should apply more generally.
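>>
>> The in-buffer check is conceptually just this (illustrative names, not the
>> actual branch code):
>>
>>  // before touching the skip list, scan the ints already decoded into the buffer
>>  while (bufferPos < bufferLimit && docBuffer[bufferPos] < target) {
>>    bufferPos++;
>>  }
>>  if (bufferPos < bufferLimit) {
>>    return docBuffer[bufferPos];   // target reached inside the current block; no skip needed
>>  }
>>  // otherwise fall back to the skip list and refill the buffer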
>>
>> Skipping for block codecs and Standard/vInt are done w/ the same class
>> now.  It's just that the block codec must store the long filePointer
>> where the block starts *and* the int offset into the block, vs
>> Standard codec that just stores a filePointer.
>>
>> On "how do we implement bulk read" this is the core change on the
>> bulkpostings branch -- we have a new API to separately bulk-read
>> docDeltas, freqs, positionDeltas.  But we are rapidly iterating on
>> improving this (and getting to a clean PFor/For impl) now...
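>>
>> Conceptually the consumer side looks something like this (names are purely
>> illustrative -- the committed API is still changing):
>>
>>  int[] docDeltas = new int[blockSize];  // blockSize: ints per encoded block
>>  int[] freqs = new int[blockSize];
>>  int count = postings.fillDocDeltas(docDeltas);  // hypothetical bulk-read call
>>  postings.fillFreqs(freqs, count);               // hypothetical
>>  int doc = 0;                                    // running docID accumulator
>>  for (int i = 0; i < count; i++) {
>>    doc += docDeltas[i];
>>    // ... consume doc and freqs[i]
>>  }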
>>
>> Mike
>>
>> On Thu, Dec 16, 2010 at 4:29 AM, Li Li <fancye...@gmail.com> wrote:
>>> hi Michael,
>>>   Lucene 4 has so many changes that I don't know how to index and
>>> search with a specified codec. Could you please give me some code
>>> snippets that use the PFor codec, so I can trace the code?
>>>   In your blog
>>> http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
>>>   you said "The AND query, curiously, got faster; I think this is
>>> because I modified its scoring to first try seeking within the block
>>> of decoded ints".
>>>   I am also curious about the result, because VInt only needs to decode
>>> part of the doc list while PFor needs to decode the whole block. But I
>>> think with conjunction queries most of the time is spent searching the
>>> skip list. I haven't read your code yet, but I guess the skip list for
>>> VInt and the skip list for PFor are different.
>>>   e.g. Lucene 2.9's default skipInterval is 16, so it looks like:
>>>   level 1                                               256
>>>   level 0  16 32 48 64 80 96 112 128 ...   256
>>>   When we need skipTo(60), we read 0, 16, 32, 48, 64 in level 0.
>>>   But when using blocks, e.g. a block size of 128, my implementation of
>>> the skip list is:
>>>   level 1       256
>>>   level 0 128 256
>>>   When we skipTo(60), we only read 2 items in level 0 and decode the first
>>> block, which contains 128 docIDs.
>>>
>>>   How do you implement bulk read?
>>>   I did it like this: I decode a block and cache it in an int array. I
>>> think I can buffer up to 100K docIDs and tfs for disjunction queries (it
>>> costs less than 1MB of memory per term):
>>>   SegmentTermDocs.read(final int[] docs, final int[] freqs)
>>>           ...........
>>>           while (i < length && count < df) {
>>>             if (curBlockIdx >= curBlockSize) {
>>>               // this condition is usually false; we could optimize it,
>>>               // but the JVM will compile the hot path anyway ...
>>>               int idBlockBytes = freqStream.readVInt();
>>>               curBlockIdx = 0;
>>>               for (int k = 0; k < idBlockBytes; k++) {
>>>                 buffer[k] = freqStream.readInt();
>>>               }
>>>               blockIds = code.decode(buffer, idBlockBytes);
>>>               curBlockSize = blockIds.length;
>>>
>>>               int tfBlockBytes = freqStream.readVInt();
>>>               for (int k = 0; k < tfBlockBytes; k++) {
>>>                 buffer[k] = freqStream.readInt();
>>>               }
>>>               blockTfs = code.decode(buffer, tfBlockBytes);
>>>               assert curBlockSize == blockTfs.length;
>>>             }
>>>             freq = blockTfs[curBlockIdx];
>>>             doc += blockIds[curBlockIdx++];
>>>
>>>             count++;
>>>
>>>             if (deletedDocs == null || !deletedDocs.get(doc)) {
>>>               docs[i] = doc;
>>>               freqs[i] = freq;
>>>               ++i;
>>>             }
>>>           }
>>>
>>>
>>>
>>> 2010/12/15 Michael McCandless <luc...@mikemccandless.com>:
>>>> Hi Li Li,
>>>>
>>>> That issue has such a big patch, and enough of us are now iterating on
>>>> it, that we cut a dedicated branch for it.
>>>>
>>>> But note that this branch is off of trunk (to be 4.0).
>>>>
>>>> You should be able to do this:
>>>>
>>>>  svn checkout 
>>>> https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings
>>>>
>>>> And then run things in there.  I just committed FOR/PFOR prototype
>>>> codecs from LUCENE-1410 onto that branch, so eg you can run unit tests
>>>> using those codecs by running "ant test
>>>> -Dtests.codec=PatchedFrameOfRef".
>>>>
>>>> Please post patches back if you improve things!  We need all the help
>>>> we can get :)
>>>>
>>>> Mike
>>>>
>>>> On Wed, Dec 15, 2010 at 5:54 AM, Li Li <fancye...@gmail.com> wrote:
>>>>> hi Michael
>>>>>    You posted a patch here:
>>>>> https://issues.apache.org/jira/browse/LUCENE-2723
>>>>>    I am not familiar with patches. Do I need to download
>>>>> LUCENE-2723.patch (there are many patches with this name; do I need
>>>>> the latest one?) and LUCENE-2723_termscorer.patch and apply them
>>>>> (patch -p1 < LUCENE-2723.patch)? I just checked out the latest source
>>>>> code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene
>>>>>
>>>>>
>>>>> 2010/12/14 Michael McCandless <luc...@mikemccandless.com>:
>>>>>> Likely you are seeing the startup cost of hotspot compiling the PFOR 
>>>>>> code?
>>>>>>
>>>>>> Ie, does your test first "warmup" the JRE and then do the real test?
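>>>>>>
>>>>>> For example, roughly (decoder/block names are placeholders):
>>>>>>
>>>>>>  // warmup: let hotspot compile the decode loop before measuring
>>>>>>  for (int iter = 0; iter < 10000; iter++) {
>>>>>>    decoder.decode(block, blockLen);
>>>>>>  }
>>>>>>  long t0 = System.nanoTime();
>>>>>>  for (int iter = 0; iter < 10000; iter++) {
>>>>>>    decoder.decode(block, blockLen);
>>>>>>  }
>>>>>>  long elapsedNs = System.nanoTime() - t0;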
>>>>>>
>>>>>> I've also found that running -Xbatch produces more consistent results
>>>>>> from run to run, however, those results may not be as fast as running
>>>>>> w/o -Xbatch.
>>>>>>
>>>>>> Also, it's better to test on actual data (ie a Lucene index's
>>>>>> postings), and in the full context of searching, because then we get a
>>>>>> sense of what speedups a real app will see... micro-benching is nearly
>>>>>> impossible in Java since Hotspot acts very differently vs the "real"
>>>>>> test.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Tue, Dec 14, 2010 at 2:50 AM, Li Li <fancye...@gmail.com> wrote:
>>>>>>> Hi
>>>>>>>   I tried to integrate PForDelta into Lucene 2.9 but ran into a
>>>>>>> problem.
>>>>>>>   I use the implementation in
>>>>>>> http://code.google.com/p/integer-array-compress-kit/
>>>>>>>   It implements a basic PForDelta algorithm and an improved one (called
>>>>>>> NewPForDelta; it had many bugs, which I have fixed).
>>>>>>>   But compared with VInt and S9, its speed is very slow when decoding
>>>>>>> only a small number of integer arrays.
>>>>>>>   e.g. when I decode int[256] arrays whose values are randomly
>>>>>>> generated between 0 and 100: if I decode just one array, PFor (or
>>>>>>> NewPFor) is very slow; when it continuously decodes many arrays, such
>>>>>>> as 10000, it's faster than S9 and VInt.
>>>>>>>   Another strange phenomenon is that when I call the PFor decoder twice,
>>>>>>> the 2nd time is faster. Or if I call PFor first and then NewPFor, NewPFor
>>>>>>> is faster; reverse the call sequence and the decoder called 2nd is
>>>>>>> faster.
>>>>>>>   e.g.
>>>>>>>                ct.testNewPFDCodes(list);
>>>>>>>                ct.testPFor(list);
>>>>>>>                ct.testVInt(list);
>>>>>>>                ct.testS9(list);
>>>>>>>
>>>>>>> NewPFD decode: 3614705
>>>>>>> PForDelta decode: 17320
>>>>>>> VINT decode: 16483
>>>>>>> S9 decode: 19835
>>>>>>> When I call them in the following sequence:
>>>>>>>
>>>>>>>                ct.testPFor(list);
>>>>>>>                ct.testNewPFDCodes(list);
>>>>>>>                ct.testVInt(list);
>>>>>>>                ct.testS9(list);
>>>>>>>
>>>>>>> PForDelta decode: 3212140
>>>>>>> NewPFD decode: 19556
>>>>>>> VINT decode: 16762
>>>>>>> S9 decode: 16483
>>>>>>>
>>>>>>>   My implementation is: group docIDs and termDocFreqs into blocks
>>>>>>> that contain 128 integers. When SegmentTermDocs's next method is
>>>>>>> called (or read/readNoTf), it decodes a block and saves it to a cache.