On Saturday 04 August 2007 09:36, eks dev wrote:
> Would it be possible somehow to make skip list (postings) RLE compressed 
without affecting performance in cases where RLE cannot identify longer runs?

I suppose you mean run length encoding? (I missed the first posting about 
this.)

> 
> we have an unusual (?) case where we have an opportunity to sort documents 
on category field before indexing. this order gets slightly disturbed during 
updates on index, but they normally they stay mostly sorted. Also, we noticed 
that there are longer runs of doc IDs even in standard case for hi frequency 
tokens due to some sort of locality (eg. web pages from one web site tend to 
have a lot of tokens in common)...
> 
> having RLE compressed skip lists for our category fields would bring huge 
savings (less readVInts), but on the other side it requires at least one if() 
in tight loops in next() and skipTo() that can slow down sparse case.
> 
> so the questions would be, can one see benefits in standard case? is it 
doable at all without turning everything upside down? maybe already there, me 
being plain stupid here.... 

You could try and use a VInt flag value (how about 0xFF ?) to start an encoded 
run length encoded series of bits. For example 0xFF would be followed by the
next delta as a VInt, and by the run length as the next VInt.

You might also try and generalize the bytes of VInt to nibbles (half bytes).

Regards,
Paul Elschot

> 
> 
> 
> 
> 
> 
> 
> 
> 
>       ___________________________________________________________
> Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
> now.
> http://uk.answers.yahoo.com/ 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to