One more comment and then I'll really shut up, I promise. On re-reading the 
paper, you are all absolutely correct about C-Store, H-Store and Vertica.

What is not in the paper, but was part of what he presented this week, is the 
application of column-oriented stores to the TPC-H benchmark.

The TPC-H OLTP telco benchmark has a schema of 212 columns and contains ~600GB 
of data, and each transaction accesses only 6 or 7 of the columns. In a full 
table scan, a row-oriented store must read all 600GB of data. It has no choice. 
A column-oriented store need only read the 6-7 columns, which is approximately 
20GB. I don't think anyone will dispute that you can read 20GB a whole lot 
faster than 600GB.
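
As a back-of-the-envelope check on those numbers, here is a small sketch. It 
assumes all 212 columns are the same width, which is my simplification and not 
something from the talk, and the function name is made up for illustration.

```python
def column_scan_gb(total_gb, total_columns, columns_read):
    """Estimate the GB a column store reads in a full scan.

    Assumes every column takes an equal share of the data (a simplification).
    """
    return total_gb * columns_read / total_columns


TOTAL_GB = 600        # size of the data set quoted above
TOTAL_COLUMNS = 212   # columns in the schema quoted above

for cols in (6, 7):
    gb = column_scan_gb(TOTAL_GB, TOTAL_COLUMNS, cols)
    ratio = TOTAL_GB / gb
    print(f"{cols} columns: ~{gb:.1f} GB read, ~{ratio:.0f}x less than a row scan")
```

Under that assumption the scan reads roughly 17 to 20GB, which lines up with 
the ~20GB figure above. Real columns differ in width and compress differently, 
so treat this as an estimate only.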

Jeff Hammerbacher wrote:
> 4) your section on "adding capacity" has NOTHING at all to do
> with organizing your data on disk in a column-oriented fashion;
> it's a property of any reasonably well-designed horizontally
> partitioned data store.

Hmm, well, the column-orientedness of BigTable and HBase does a pretty nice job 
of horizontal partitioning.

Jonathan Hendler wrote:
> One of the valid points ... has to do with compression (and null
> values).  For example - does HBase also offer tools, or a
> strategy for compression?

Yes, see hbase.HColumnDescriptor.java. Compression is controlled on a 
per-column-family basis.

---
Jim Kellerman, Senior Engineer; Powerset
[EMAIL PROTECTED]

