I might be missing something, but what's the point of ensuring cross-column alignment, so long as you have the record _count_ per page and ensure that a single record doesn't split across pages? That is, you need to be "record-aligned" within a column, but I'm not sure why you have to guarantee that page "N" in column 1 has the same records as page "N" in column 2, since you can still use the ordinal record indexes to skip over irrelevant pages.
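
To make the idea concrete, here is a minimal Python sketch of skipping by ordinal record index. It is not Parquet reader code: the per-page record counts are made up, and the helpers `page_starts` and `pages_overlapping` are hypothetical names used only for illustration. It assumes each page records how many whole records it holds, as DataPageHeaderV2 does.

```python
from bisect import bisect_right
from itertools import accumulate


def page_starts(record_counts):
    """First record index of each page, plus a final end-of-column sentinel."""
    return [0] + list(accumulate(record_counts))


def pages_overlapping(record_counts, first, last):
    """Indexes of pages holding any record in the half-open range [first, last)."""
    starts = page_starts(record_counts)
    lo = bisect_right(starts, first) - 1      # page containing record `first`
    hi = bisect_right(starts, last - 1) - 1   # page containing record `last - 1`
    return list(range(lo, hi + 1))


# Hypothetical row group of 10 records; the columns have different page boundaries.
col1_counts = [4, 4, 2]      # pages cover records [0,4), [4,8), [8,10)
col2_counts = [3, 3, 3, 1]   # pages cover records [0,3), [3,6), [6,9), [9,10)

# Suppose column 1's page statistics rule out every page except page 1.
starts1 = page_starts(col1_counts)
first, last = starts1[1], starts1[2]          # surviving records: [4, 8)

# Map that record range onto column 2's own page boundaries.
needed_in_col2 = pages_overlapping(col2_counts, first, last)
print(needed_in_col2)  # [1, 2]
```

In this made-up layout, pages 0 and 3 of column 2 are skipped without any cross-column alignment. The two pages that are read straddle the range boundaries, so records 3 and 8 get decoded and then discarded.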

-Todd

On Thu, Nov 19, 2015 at 10:57 AM, Nong Li <[email protected]> wrote:

> I'd like to propose a change to the format spec to add metadata to indicate
> that pages in a row group are record aligned. In other words page N consists
> of the same records across all columns. The benefit of this would be to
> allow skipping at the page level.
>
> The change would add a single optional boolean at the row group metadata
> level and only supported with DataPageHeaderV2 (V1 doesn't have a counter
> for the number of records in a page, only number of values). This is
> compatible with existing readers which can simply ignore this.
>
> Background:
> We originally picked to have roughly fixed byte size pages to maximize
> storage density. Pages are defined as the unit of indivisible work (e.g.
> compression across the entire page or encoding that need to be bulk
> decoded). A smaller page size improves single row latency (i.e. a 1MB page
> means reading a single value requires decoding 1MB). A larger page size
> generally improves the efficiency of general purpose compression algorithms.
> Since the number of bytes per record varies quite a lot between columns, it
> is not possible to have record aligned pages that are roughly the same size.
>
> This change would mean the more compressible columns are now smaller (in
> terms of bytes in a page) and might not compress as well using general
> purpose compression. As these are already small and compressed, the value of
> general purpose compression is low and the cost to overall storage footprint
> is small.
>
> The benefit would be to allow skipping pages using the page level statistics
> which can speed up filtering quite a lot.
>
> A ballpark for the number of values is 8K, which results in roughly the same
> page size for values that are 8 bytes per row.

-- 
Todd Lipcon
Software Engineer, Cloudera
