I looked into this a while ago. If I remember correctly, I concluded that
Horizontal Bit-Parallel (HBP) might be helpful, but the vertical option
probably wasn't appropriate.

HBP would allow Parquet readers to evaluate predicates on multiple values at
once without needing SIMD instructions that aren't available to JVM
processes. (With SIMD instructions, you get even more value.) That would be
useful, but I think we'd have to change the bit-packing encoding itself so
that values are laid out with the extra padding bit where predicate results
end up. Reordering and repacking values into that layout at read time is only
worth the work if the result is reused.
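Here's a rough Python sketch of the HBP trick, to make the padding-bit point concrete. It assumes 64-bit words and fills fields in linear order (the paper orders codes differently), and the function names are made up for illustration, not anything in Parquet. Each k-bit code gets a (k+1)-bit field whose top (padding) bit starts at 0; one XOR, one add, and one AND then evaluate `code < constant` for every value in the word, with the answer landing in the padding bits.

```python
# Sketch of HBP-style predicate evaluation on 64-bit words (SWAR).
# Assumptions: 64-bit words, linear field order (the paper's layout differs),
# and all function names here are hypothetical, not a Parquet API.

WORD_BITS = 64


def pack_hbp(codes, k):
    """Pack k-bit codes into 64-bit words using one (k+1)-bit field per code.

    The top bit of each field is the padding bit and is left as 0, and no
    code is split across words.
    """
    field = k + 1
    per_word = WORD_BITS // field
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for i, code in enumerate(codes[start:start + per_word]):
            assert 0 <= code < (1 << k), "code does not fit in k bits"
            word |= code << (i * field)
        words.append(word)
    return words, per_word


def _repeat(value, k, per_word):
    """Place `value` in every (k+1)-bit field of a word."""
    word = 0
    for i in range(per_word):
        word |= value << (i * (k + 1))
    return word


def less_than(words, per_word, k, constant):
    """Evaluate `code < constant` for every code in each word at once.

    In each field, (~code within k bits) + constant = 2^k + constant - code - 1,
    which sets the padding bit exactly when code < constant. The sum is at most
    2^(k+1) - 2, so nothing carries into the neighboring field.
    """
    assert 0 <= constant < (1 << k), "constant does not fit in k bits"
    code_mask = _repeat((1 << k) - 1, k, per_word)  # all code bits set
    pad_mask = _repeat(1 << k, k, per_word)         # only padding bits set
    repeated = _repeat(constant, k, per_word)       # constant in every field
    return [((w ^ code_mask) + repeated) & pad_mask for w in words]


def result_bits(results, per_word, k, n):
    """Unpack the padding-bit results into one boolean per original value."""
    out = []
    for z in results:
        for i in range(per_word):
            out.append(bool((z >> (i * (k + 1) + k)) & 1))
    return out[:n]


codes = [3, 0, 7, 5, 2, 6, 1, 4]
words, per_word = pack_hbp(codes, k=3)
print(result_bits(less_than(words, per_word, 3, 4), per_word, 3, len(codes)))
# [True, True, False, False, True, False, True, False]
```

This only works because the spare bit is already sitting above each code in the packed word, which is why I think the padding would have to be part of the stored encoding rather than something a reader adds on the fly.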

For Vertical Bit-Parallel (VBP), I didn't think it would be useful for
Parquet because it is really expensive to produce, and it is really
expensive to reconstruct the values that aren't filtered out. When
reconstructing more than just a few rows, as you would in large scans, it
would be much more expensive than decoding ordinary bit-packed values.
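To illustrate where that cost comes from, here's a simplified bit-slicing sketch in Python. It assumes groups of 64 values with MSB-first slices, which is not exactly the paper's BitWeaving/V layout, and the names are hypothetical. Packing touches every one of the k slice words for each value, and so does getting a value back out.

```python
# Sketch of vertical bit-slicing, to show where VBP's costs come from.
# Assumptions: groups of 64 values, MSB-first slices; names are hypothetical.


def pack_vbp(codes, k):
    """For each group of 64 values, build k words; word j holds bit j (MSB
    first) of every value in the group. Every value costs k bit insertions."""
    groups = []
    for start in range(0, len(codes), 64):
        slices = [0] * k
        for i, code in enumerate(codes[start:start + 64]):
            for j in range(k):
                bit = (code >> (k - 1 - j)) & 1
                slices[j] |= bit << i
        groups.append(slices)
    return groups


def reconstruct(groups, k, index):
    """Rebuild one value by reading one bit from each of k different words."""
    slices = groups[index // 64]
    pos = index % 64
    value = 0
    for j in range(k):
        value = (value << 1) | ((slices[j] >> pos) & 1)
    return value


codes = [5, 0, 7, 3, 6]
groups = pack_vbp(codes, k=3)
print([reconstruct(groups, 3, i) for i in range(len(codes))])
# [5, 0, 7, 3, 6]
```

With ordinary bit packing, getting value i back is one shift and mask touching at most two words; here every surviving row costs k separate bit extractions from k different words, and that adds up quickly when a scan keeps most rows.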

On Sun, Oct 14, 2018 at 1:26 PM Jim Apple <jbap...@apache.org> wrote:

> On 2018/10/08 22:08:16, Julien Le Dem <julien.le...@wework.com.INVALID>
> wrote:
> > it's a variation of bit packing. right?
>
> I looked into it on
> https://github.com/apache/parquet-format/blob/master/Encodings.md and I
> believe that the Horizontal Bit-Parallel encoding in the paper is a variant
> on bit packing. There are three changes:
>
> 1. No code is split between words
> 2. Every code gets a padding bit
> 3. The order of the packing is not linear; code 1 is not packed in a word
> with code 2.
>
> The paper obviously has much more detail. :-)
>
> The various vertical encodings referenced in the paper (bit-slicing,
> vertical bit-parallel, and BitWeaving/V) look further afield from Parquet's
> bit packing.
>


-- 
Ryan Blue
Software Engineer
Netflix
