Thanks Nong for looking into this!
Several comments:
- I would think that this is already possible today without changing the
format. With page format v2 we have the row count per page, so you can
tell whether pages are aligned just by comparing the per-page row counts
(a detection sketch follows after this list). That said, adding an
optional field is backward compatible and would be fine.
- The writer can be modified to explicitly align pages. Would we want this
to be optional?
- I don't think we have implemented that yet, but the decoders could have
an efficient skip(nValues) method (RLE or dictionary encoding, for
example, would skip very efficiently; see the second sketch below). Of
course, if there are repeated fields you need to decode the repetition
levels first to know how many values to skip. That would be less
efficient than skipping whole, more granular pages outright, but we may
want to have both to mitigate the tradeoff between compression and
skipping.
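
To illustrate the first point, here is a minimal sketch of how a reader
could detect alignment today from the v2 page headers alone. PageHeader
and DataPageHeaderV2 are the Thrift-generated parquet-format bindings
(exact accessor names may differ), and how the headers get collected per
column chunk is assumed:

    import java.util.List;
    import org.apache.parquet.format.PageHeader;

    final class AlignmentCheck {
      // True iff every column chunk has the same number of pages and
      // matching per-page record counts (num_rows from DataPageHeaderV2).
      static boolean pagesAreRecordAligned(List<List<PageHeader>> headersPerColumn) {
        List<PageHeader> first = headersPerColumn.get(0);
        for (List<PageHeader> column : headersPerColumn) {
          if (column.size() != first.size()) return false;
          for (int i = 0; i < column.size(); i++) {
            if (column.get(i).getData_page_header_v2().getNum_rows()
                != first.get(i).getData_page_header_v2().getNum_rows()) {
              return false; // a page boundary falls on a different record
            }
          }
        }
        return true;
      }
    }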
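And a rough sketch of the skip(nValues) idea for the RLE case. Nothing
like this exists in the decoders yet; the names here are hypothetical and
only meant to show why skipping runs is cheap:

    final class RleSkippingDecoder {
      private int runRemaining; // values left in the current run
      // ... current run value, bit width, underlying input, etc.

      // Skip n values by consuming run lengths without materializing
      // anything. (For bit-packed runs the reader would also advance its
      // byte position; that bookkeeping is elided here.)
      void skip(int nValues) {
        while (nValues > 0) {
          if (runRemaining == 0) {
            readRunHeader(); // decode the next run's varint header
          }
          int step = Math.min(nValues, runRemaining);
          runRemaining -= step; // consumed without decoding any values
          nValues -= step;
        }
      }

      private void readRunHeader() {
        // parse the run header and set runRemaining (omitted)
      }
    }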
Julien


On Thu, Nov 19, 2015 at 11:07 AM, Todd Lipcon <[email protected]> wrote:

> I might be missing something, but what's the point of ensuring cross-column
> alignment, so long as you have the record _count_ per page, and ensure that
> a single record doesn't split across pages? i.e. you need to be
> "record-aligned" within a column, but not sure why you have to guarantee
> that page "N" in column 1 has the same records as page "N" in column 2,
> since you can still use the ordinal record indexes to skip over irrelevant
> pages.
>
> -Todd
>
> On Thu, Nov 19, 2015 at 10:57 AM, Nong Li <[email protected]> wrote:
>
> > I'd like to propose a change to the format spec to add metadata to
> > indicate that pages in a row group are record aligned. In other words,
> > page N consists of the same records across all columns. The benefit of
> > this would be to allow skipping at the page level.
> >
> > The change would add a single optional boolean at the row group metadata
> > level and would only be supported with DataPageHeaderV2 (V1 doesn't have
> > a counter for the number of records in a page, only the number of
> > values). This is compatible with existing readers, which can simply
> > ignore this.
> >
> > Background:
> > We originally chose roughly fixed-byte-size pages to maximize storage
> > density. Pages are defined as the indivisible unit of work (e.g.
> > compression across the entire page, or encodings that need to be bulk
> > decoded). A smaller page size improves single-row latency (i.e. a 1MB
> > page means reading a single value requires decoding 1MB). A larger page
> > size generally improves the efficiency of general-purpose compression
> > algorithms. Since the number of bytes per record varies quite a lot
> > between columns, it is not possible to have record-aligned pages that
> > are all roughly the same size.
> >
> > This change would mean the more compressible columns have smaller pages
> > (in terms of bytes per page) and might not compress as well using
> > general-purpose compression. As these columns are already small and
> > compressed, the value of general-purpose compression is low and the
> > cost to the overall storage footprint is small.
> >
> > The benefit would be to allow skipping pages using the page-level
> > statistics, which can speed up filtering quite a lot.
> >
> > A ballpark for the number of values is 8K, which results in roughly the
> > same page size for values that are 8 bytes per row.
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


