I might be missing something, but what's the point of ensuring cross-column
alignment, so long as you have the record _count_ per page, and ensure that
a single record doesn't split across pages? I.e., you need to be
"record-aligned" within a column, but I'm not sure why you have to guarantee
that page "N" in column 1 has the same records as page "N" in column 2,
since you can still use the ordinal record indexes to skip over irrelevant
pages.
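
To make that concrete, here is a rough sketch of the skipping I have in mind.
It is illustrative Python only, not anything from the Parquet code base: the
function name and the per-page record-count lists are hypothetical, and it
assumes each column exposes a record count per page and that no record spans
two pages.

```python
from bisect import bisect_right
from itertools import accumulate


def pages_for_records(page_record_counts, first, last):
    """Return the indexes of the pages in ONE column that hold any record
    whose ordinal lies in [first, last] (inclusive, 0-based).

    page_record_counts is a hypothetical list of records-per-page for the
    column; pages need not line up with any other column's pages.
    """
    # ends[i] is the ordinal one past the last record stored in page i.
    ends = list(accumulate(page_record_counts))
    if not ends or first > last or first >= ends[-1]:
        return []
    # Clamp the range to the records that actually exist in the column.
    last = min(last, ends[-1] - 1)
    lo = bisect_right(ends, first)  # first page whose end is past `first`
    hi = bisect_right(ends, last)   # page containing `last`
    return list(range(lo, hi + 1))


# Two columns of the same 300 records, paged differently.
col1_pages = [100, 100, 100]
col2_pages = [50, 200, 50]

# Suppose a filter on some column left only records 40..60.
col1_needed = pages_for_records(col1_pages, 40, 60)
col2_needed = pages_for_records(col2_pages, 40, 60)
```

Here col1 only needs its page 0, while col2 needs pages 0 and 1, so every
other page can be skipped in each column without page N meaning the same
thing in both.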

-Todd

On Thu, Nov 19, 2015 at 10:57 AM, Nong Li <[email protected]> wrote:

> I'd like to propose a change to the format spec to add metadata to indicate
> that pages
> in a row group are record aligned. In other words, page N consists of the
> same records
> across all columns. The benefit of this would be to allow skipping at the
> page level.
>
> The change would add a single optional boolean at the row group metadata
> level and
> would only be supported with DataPageHeaderV2 (V1 doesn't have a counter for the
> number of
> records in a page, only number of values). This is compatible with existing
> readers
> which can simply ignore this.
>
> Background:
> We originally chose to have roughly fixed byte size pages to maximize
> storage density.
> Pages are defined as the unit of indivisible work (e.g. compression across
> the entire page
> or encodings that need to be bulk decoded). A smaller page size improves
> single row
> latency (i.e. a 1MB page means reading a single value requires decoding
> 1MB). A larger
> page size generally improves the efficiency of general purpose compression
> algorithms.
> Since the number of bytes per record varies quite a lot between columns, it
> is not possible
> to have record aligned pages that are roughly the same size.
>
> This change would mean the more compressible columns are now smaller (in
> terms of
> bytes in a page) and might not compress as well using general purpose
> compression. As
> these are already small and compressed, the value of general purpose
> compression is low
> and the cost to overall storage footprint is small.
>
> The benefit would be to allow skipping pages using the page level
> statistics which can speed
> up filtering quite a lot.
>
> A ballpark for the number of values is 8K, which results in roughly the
> same page size for
> values that are 8 bytes per row.
>



-- 
Todd Lipcon
Software Engineer, Cloudera
