Inline.

On Thu, Nov 19, 2015 at 11:38 AM, Julien Le Dem <[email protected]> wrote:

> Thanks Nong for looking into this!
> Several comments:
> - I would think that this is already possible today without changing the
> format. With page format v2 we have the row count per page, so you can
> tell by looking at the row counts whether pages are aligned or not. That
> said, adding an optional field is backward compatible and would be fine.
>

That's true. The writer can just do this and the reader can figure it out.
I think that is problematic because it is fairly complex to implement.
Furthermore, I prefer this in the metadata to better support filter
pushdown. I'd argue parquet-mr and other libraries should *not* implement
row-by-row predicate evaluation; the engine using the library can probably
do it more efficiently. We should at least have a way to restrict filters
to the cases where they can be evaluated efficiently (i.e. by skipping row
groups and pages). Having it in the metadata allows better negotiation
over which filters are pushable.
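
To make the negotiation concrete, here's a rough sketch of the skip-only
evaluation I have in mind. The types and names are made up; the point is
just that the filter is answered entirely from per-page stats, never row
by row:

import java.util.ArrayList;
import java.util.List;

// Hypothetical per-page metadata: record count plus min/max stats.
final class PageStats {
  final long recordCount, min, max;
  PageStats(long recordCount, long min, long max) {
    this.recordCount = recordCount;
    this.min = min;
    this.max = max;
  }
}

final class PagePruner {
  // Returns the indexes of pages that might contain a value in
  // [lo, hi]; every other page is skipped without decoding anything.
  static List<Integer> pagesToRead(List<PageStats> pages, long lo, long hi) {
    List<Integer> keep = new ArrayList<>();
    for (int i = 0; i < pages.size(); i++) {
      PageStats p = pages.get(i);
      if (p.max >= lo && p.min <= hi) {
        keep.add(i);  // stats can't rule the page out; must read it
      }
    }
    return keep;
  }
}

If pages are record-aligned, the surviving page indexes apply to every
column in the row group. If an engine can't get that guarantee, it can
decline to push the filter down and evaluate it itself.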


> - The writer can be modified to explicitly align pages. Would we want this
> to be optional?
>

I'm not sure what that means. If the writer knows the pages are aligned
and chooses to do this, it would set the metadata.


> - I don't think we have implemented that yet, but the decoders could have
> an efficient skip(nValues) method (RLE or dictionary encoding, for
> example, would skip very efficiently). Of course, if there are repeated
> fields you need to decode the repetition levels first to know how many
> values to skip. That would be less efficient than directly skipping whole
> pages, but we may want to have both to mitigate the tradeoff between
> compression and skipping.
>
Yeah, I think the writer can choose depending on how storage-conscious
they are. Efficient skipping for some encodings is good, but it still
requires quite a bit more work than skipping entire pages.
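
To illustrate why skip(n) is cheap for RLE, here's a rough sketch. It
uses a simplified run-length layout (plain (count, value) runs), not
Parquet's actual RLE/bit-packed hybrid, but the idea carries over:
skipping consumes run headers without materializing values.

import java.nio.ByteBuffer;

// Simplified RLE stream: a sequence of (runLength:int, value:long)
// pairs. Illustrative only; not Parquet's hybrid RLE/bit-packing.
final class SimpleRleDecoder {
  private final ByteBuffer data;
  private int runRemaining = 0;  // values left in the current run
  private long runValue;

  SimpleRleDecoder(ByteBuffer data) { this.data = data; }

  private void loadRun() {
    runRemaining = data.getInt();
    runValue = data.getLong();
  }

  long readValue() {
    if (runRemaining == 0) loadRun();
    runRemaining--;
    return runValue;
  }

  // Skips n values in O(runs crossed), touching only run headers.
  void skip(long n) {
    while (n > 0) {
      if (runRemaining == 0) loadRun();
      int consumed = (int) Math.min(n, (long) runRemaining);
      runRemaining -= consumed;
      n -= consumed;
    }
  }
}

And as Julien notes, with repeated fields you still have to decode the
repetition levels to turn a record count into a value count, which is
exactly the per-value work that skipping a whole page avoids.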


> Julien
>
>
> On Thu, Nov 19, 2015 at 11:07 AM, Todd Lipcon <[email protected]> wrote:
>
> > I might be missing something, but what's the point of ensuring
> > cross-column alignment, so long as you have the record _count_ per
> > page, and ensure that a single record doesn't split across pages? i.e.
> > you need to be "record-aligned" within a column, but I'm not sure why
> > you have to guarantee that page "N" in column 1 has the same records
> > as page "N" in column 2, since you can still use the ordinal record
> > indexes to skip over irrelevant pages.
>
There is no ordinal index, and even if there were, I'm not sure how
efficient it would be for this case. The use case here is not single-row
lookups but being able to take advantage of skipping using the column
stats.
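
For what it's worth, here's roughly the bookkeeping a reader would need
with only per-page record counts and no alignment: a per-column prefix
sum over the page record counts, binary-searched to map a record ordinal
back to a page. The names are made up:

import java.util.Arrays;

final class PageLocator {
  // firstRecordOfPage[i] = ordinal of the first record in page i,
  // i.e. the prefix sums of the per-page record counts.
  private final long[] firstRecordOfPage;

  PageLocator(long[] recordCountPerPage) {
    firstRecordOfPage = new long[recordCountPerPage.length];
    long running = 0;
    for (int i = 0; i < recordCountPerPage.length; i++) {
      firstRecordOfPage[i] = running;
      running += recordCountPerPage[i];
    }
  }

  // Returns the index of the page containing record ordinal r.
  int pageForRecord(long r) {
    int idx = Arrays.binarySearch(firstRecordOfPage, r);
    return idx >= 0 ? idx : -idx - 2;  // insertion point minus one
  }
}

Doable, but every column has its own page boundaries, so a record range
that survives one column's stats has to be re-translated for each other
column, and it will typically straddle pages you still have to read and
decompress. With aligned pages this collapses to "skip page N everywhere."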

>
> > -Todd
> >
> > On Thu, Nov 19, 2015 at 10:57 AM, Nong Li <[email protected]> wrote:
> >
> > > I'd like to propose a change to the format spec to add metadata
> > > indicating that the pages in a row group are record-aligned. In
> > > other words, page N consists of the same records across all columns.
> > > The benefit of this would be to allow skipping at the page level.
> > >
> > > The change would add a single optional boolean at the row group
> > > metadata level, and it would only be supported with DataPageHeaderV2
> > > (V1 doesn't have a counter for the number of records in a page, only
> > > the number of values). This is compatible with existing readers,
> > > which can simply ignore it.
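
To make this concrete, here's a rough sketch of what a reader would do
with the flag. The field name is made up; the real change would be an
optional bool in the Thrift RowGroup metadata, which old readers that
don't know the field silently ignore:

final class AlignedSkipReader {
  // "recordAlignedPages" is a hypothetical name for the proposed flag.
  void read(boolean recordAlignedPages,
            java.util.List<Integer> survivingPages) {
    if (recordAlignedPages) {
      // Page N holds the same records in every column, so the set of
      // surviving page indexes applies to every column in the row group.
      readPagesInAllColumns(survivingPages);
    } else {
      // No alignment guarantee: keep today's behavior.
      readAllPages();
    }
  }
  private void readPagesInAllColumns(java.util.List<Integer> pages) { /* ... */ }
  private void readAllPages() { /* ... */ }
}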
> > >
> > > Background:
> > > We originally chose roughly fixed-byte-size pages to maximize storage
> > > density. Pages are defined as the indivisible unit of work (e.g.
> > > compression across the entire page, or encodings that need to be bulk
> > > decoded). A smaller page size improves single-row latency (i.e. with
> > > a 1MB page, reading a single value requires decoding 1MB). A larger
> > > page size generally improves the efficiency of general-purpose
> > > compression algorithms. Since the number of bytes per record varies
> > > quite a lot between columns, it is not possible to have
> > > record-aligned pages that are all roughly the same size.
> > >
> > > This change would mean the more compressible columns now have
> > > smaller pages (in terms of bytes) and might not compress as well
> > > under general-purpose compression. As these pages are already small
> > > and compressed, the value of general-purpose compression is low and
> > > the cost to the overall storage footprint is small.
> > >
> > > The benefit would be to allow skipping pages using the page-level
> > > statistics, which can speed up filtering quite a lot.
> > >
> > > A ballpark for the number of records per page is 8K, which results
> > > in roughly the same page size (8K records * 8 bytes = 64KB) for
> > > values that are 8 bytes per record.
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>
>
>
> --
> Julien
>
