Re: HBase performance

Jason Watkins Fri, 12 Oct 2007 22:07:48 -0700

> - writes: a row oriented database writes the whole row regardless
>   of whether values are supplied for every field.
>   Space is reserved for null fields, so the number of bytes
>   written is the same for every row. In a column oriented
>   database, only the columns for which values are supplied are
>   written. Nulls are free. Also row oriented databases must write
>   a row descriptor so that when the row is read, the column values
>   can be found.


While this may be true for the basic N-ary Storage Model (NSM) as
published in the literature, I believe most practical products have
some mechanism for compressing nulls within a page. Perhaps someone
with more experience could confirm whether this is the case?
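For illustration, here is a minimal Python sketch of the difference between a row layout that reserves a slot for every field and one that compresses nulls with a per-row bitmap. The encoding (8-byte integer fields, a leading null bitmap) and the function names are assumptions for the example, not the on-page format of any particular product.

```python
import struct
from typing import List, Optional, Sequence

FIELD = struct.Struct("<q")  # assumption: every field is a fixed-width 8-byte integer


def null_bitmap(row: Sequence[Optional[int]]) -> bytes:
    """One bit per field, set when the field is null."""
    bits = bytearray((len(row) + 7) // 8)
    for i, v in enumerate(row):
        if v is None:
            bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)


def encode_reserved(row: Sequence[Optional[int]]) -> bytes:
    """Basic NSM style: every field keeps its slot; a null still costs 8 bytes of filler."""
    return null_bitmap(row) + b"".join(FIELD.pack(0 if v is None else v) for v in row)


def encode_compressed(row: Sequence[Optional[int]]) -> bytes:
    """Null-compressed row: the bitmap, then only the non-null values."""
    return null_bitmap(row) + b"".join(FIELD.pack(v) for v in row if v is not None)


def decode_compressed(data: bytes, n_fields: int) -> List[Optional[int]]:
    """Walk the bitmap; each clear bit consumes the next 8-byte value."""
    pos = (n_fields + 7) // 8  # values start right after the bitmap
    row: List[Optional[int]] = []
    for i in range(n_fields):
        if (data[i // 8] >> (i % 8)) & 1:
            row.append(None)
        else:
            (v,) = FIELD.unpack_from(data, pos)
            pos += FIELD.size
            row.append(v)
    return row


row = [7, None, None, 42, None, None, None, None, None, 9]
print(len(encode_reserved(row)), len(encode_compressed(row)))  # 82 26
assert decode_compressed(encode_compressed(row), len(row)) == row
```

With 7 of 10 fields null, the reserved layout writes 82 bytes and the compressed one 26: a row store with this kind of null compression pays only one bit per null plus the bitmap, rather than a full slot.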

> - reads: Unless every column is being returned on a read, a column
>   oriented database is faster because it only reads the columns
>   requested. The row oriented database must read the entire row,
>   figure out where the requested columns are and only return that
>   portion of the data read.

Partly. This ignores the fact that a column-oriented store has to do
tuple reconstruction, which has its own overhead. In the published
literature, a hybrid that partitions rows across pages but organizes
the attributes as columns within each page outperforms a pure column
store in almost all workloads (see the PAX, "Partition Attributes
Across", storage model).
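To make the trade-off concrete, here is a toy Python model of three page layouts: N-ary (NSM), PAX-style minipages, and a pure column store (DSM). It only counts which pages a one-column scan and a single-tuple fetch touch. The class names, page capacities, and fixed-width-value assumption are hypothetical, and the model does not capture CPU cache misses, which is where PAX is usually argued to gain over NSM.

```python
import math
from typing import List, Sequence, Set, Tuple

Row = Tuple[int, ...]
PageId = Tuple[str, int]


class NSM:
    """N-ary layout: each page holds whole rows back to back."""

    def __init__(self, rows: Sequence[Row], rows_per_page: int):
        self.rpp = rows_per_page
        self.pages = [list(rows[i:i + rows_per_page])
                      for i in range(0, len(rows), rows_per_page)]

    def scan(self, attr: int) -> Tuple[List[int], Set[PageId]]:
        # Every page is read, and every row in it is visited to pick out one field.
        values = [row[attr] for page in self.pages for row in page]
        return values, {("nsm", p) for p in range(len(self.pages))}

    def fetch(self, i: int) -> Tuple[Row, Set[PageId]]:
        p = i // self.rpp
        return self.pages[p][i % self.rpp], {("nsm", p)}


class PAX:
    """Same rows per page as NSM, but each page is split into one minipage per attribute."""

    def __init__(self, rows: Sequence[Row], rows_per_page: int, n_attrs: int):
        self.rpp = rows_per_page
        self.pages = [[[r[a] for r in rows[i:i + rows_per_page]] for a in range(n_attrs)]
                      for i in range(0, len(rows), rows_per_page)]

    def scan(self, attr: int) -> Tuple[List[int], Set[PageId]]:
        # Still reads every page, but only one contiguous minipage inside each.
        values = [v for page in self.pages for v in page[attr]]
        return values, {("pax", p) for p in range(len(self.pages))}

    def fetch(self, i: int) -> Tuple[Row, Set[PageId]]:
        # Tuple reconstruction stays inside a single page.
        p, slot = divmod(i, self.rpp)
        return tuple(mini[slot] for mini in self.pages[p]), {("pax", p)}


class DSM:
    """Pure column store: each attribute lives in its own run of pages.
    Assumes fixed-width values, so a column page holds rows_per_page * n_attrs values."""

    def __init__(self, rows: Sequence[Row], rows_per_page: int, n_attrs: int):
        self.vpp = rows_per_page * n_attrs
        self.cols = [[r[a] for r in rows] for a in range(n_attrs)]

    def scan(self, attr: int) -> Tuple[List[int], Set[PageId]]:
        n_pages = math.ceil(len(self.cols[attr]) / self.vpp)
        return list(self.cols[attr]), {(f"dsm{attr}", p) for p in range(n_pages)}

    def fetch(self, i: int) -> Tuple[Row, Set[PageId]]:
        # Tuple reconstruction: one page access per attribute, then a positional join.
        p = i // self.vpp
        pages = {(f"dsm{a}", p) for a in range(len(self.cols))}
        return tuple(col[i] for col in self.cols), pages


rows = [(i, 10 * i, 100 * i) for i in range(12)]
for store in (NSM(rows, 4), PAX(rows, 4, 3), DSM(rows, 4, 3)):
    _, scan_pages = store.scan(1)
    _, fetch_pages = store.fetch(5)
    print(type(store).__name__, len(scan_pages), len(fetch_pages))
# NSM 3 1
# PAX 3 1
# DSM 1 3
```

The counts show the tension described above: the pure column store reads the fewest pages for a one-attribute scan but pays one page per attribute to rebuild a tuple, while PAX keeps reconstruction within a single page at the same I/O cost as NSM.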

All that said, I found his paper extremely interesting, particularly
the willingness to forgo disk altogether.

Jason
