At Dremio, we use four main types of selection vectors/bitmaps:

Dense format (whether a record is valid or not; no ordering)
- single bit (bitmap)

Sparse formats (identify valid records as well as their order)
- 2 byte (for record batches of up to 2^16 records)
- 4 byte (for 2^16 batches of 2^16 records)
- 6 byte (for 2^32 batches of 2^16 records)
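To make the four layouts above concrete, here is a minimal Python sketch. The byte layouts are assumptions for illustration only (little-endian, batch index stored before record index); the email does not say how Dremio actually lays out these vectors, and the names `SV2`, `SV4`, `SV6`, `encode_bitmap`, `encode_sparse`, and `decode_sparse` are hypothetical.

```python
import struct

# Assumed, illustrative layouts (not taken from Dremio's code):
SV2 = struct.Struct("<H")   # record index only; one batch of <= 2**16 records
SV4 = struct.Struct("<HH")  # (uint16 batch index, uint16 record index): 2**16 batches
SV6 = struct.Struct("<IH")  # (uint32 batch index, uint16 record index): 2**32 batches


def encode_bitmap(valid):
    """Dense format: one bit per record (LSB-first); carries no ordering."""
    out = bytearray((len(valid) + 7) // 8)
    for i, ok in enumerate(valid):
        if ok:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def encode_sparse(layout, entries):
    """Sparse formats: one fixed-width entry per selected record, in output order."""
    return b"".join(layout.pack(*entry) for entry in entries)


def decode_sparse(layout, buf):
    """Inverse of encode_sparse; returns a list of tuples in stored order."""
    return list(layout.iter_unpack(buf))


print(encode_bitmap([True, False, True, True]).hex())                  # 0d
print(decode_sparse(SV6, encode_sparse(SV6, [(70000, 5), (1, 65535)])))  # [(70000, 5), (1, 65535)]
```

The bitmap costs one bit per record but can only say whether a record survives. The sparse forms spend 2 to 6 bytes per selected record and in exchange can express an arbitrary order (for example, the output of a sort) and, in the 4- and 6-byte forms, can point into many batches at once.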

We've considered introducing a couple more. I imagine other use cases,
where people use much larger record batches, would have different
requirements. I'm sharing this because it seems like this may be
use-case specific. I'd also note that at the IPC level, you'd generally
want to compact batches before dropping them on the wire (or at least that
is what we typically do).
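As a rough illustration of compacting a batch before it goes on the wire, the sketch below materializes only the selected rows so that no selection vector needs to be serialized. It models a record batch as a plain dict of Python lists; `compact_batch` and `bitmap_to_indices` are hypothetical helpers, not an Arrow or Dremio API.

```python
def bitmap_to_indices(bitmap, length):
    """Expand a dense LSB-first bitmap into the ascending indices of valid rows."""
    return [i for i in range(length) if (bitmap[i // 8] >> (i % 8)) & 1]


def compact_batch(batch, selection):
    """Copy only the selected rows of every column, in selection order.

    `batch` maps column name -> list of values (a stand-in for a record batch);
    `selection` is a sequence of row indices, e.g. a decoded 2-byte vector.
    """
    return {name: [column[i] for i in selection] for name, column in batch.items()}


batch = {"id": [10, 11, 12, 13], "name": ["a", "b", "c", "d"]}
print(compact_batch(batch, [3, 0]))                         # {'id': [13, 10], 'name': ['d', 'a']}
print(compact_batch(batch, bitmap_to_indices(b"\x05", 4)))  # {'id': [10, 12], 'name': ['a', 'c']}
```

In pyarrow, comparable operations are available as `RecordBatch.take` (row indices) and `RecordBatch.filter` (boolean mask), both of which return a new, compacted batch.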

On Fri, Jan 24, 2020 at 11:23 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> I was thinking of a selection vector/bitmap (possibly with different
> encodings), but really nothing for now. Ordinarily, I'd lean towards YAGNI,
> but there isn't a good way to add this easily in a forward-compatible way
> unless we add a placeholder enum/table for 1.0 (the default option would be
> no filter and wouldn't change the packaged data at all).
>
> On Fri, Jan 24, 2020 at 4:55 AM Francois Saint-Jacques <fsaintjacq...@gmail.com> wrote:
>
> > By filter, you mean a filter expression, or a selection vector/bitmap?
> >
> > On Thu, Jan 23, 2020 at 11:38 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >
> > > One of the things that I think got overlooked in the conversation on
> > > having a slice offset in the C API was a suggestion from Jacques of
> > > perhaps generalizing the concept to an arbitrary "filter" for
> > > arrays/record batches.
> > >
> > > I believe this point was also discussed in the past. I'm not
> > > advocating for adding it now, but I'm curious whether people feel we
> > > should add something to Schema.fbs for forward compatibility, in case
> > > we wish to support this use case in the future.
> > >
> > > Thanks,
> > > Micah
> >
>
