One more comment and then I'll really shut up, I promise. On re-reading the paper, you are all absolutely correct about C-Store, H-Store and Vertica.
What is not in the paper, and was part of what he presented this week, is the application of column-oriented stores to the TPC-H benchmark. The TPC-H OLTP telco benchmark has a schema of 212 columns, contains ~600GB of data, and each transaction accesses only 6 or 7 of the columns. In a full table scan, a row-oriented store must read all 600GB of data. It has no choice. A column-oriented store need only read the 6-7 columns, which is approximately 20GB. I don't think anyone will dispute that you can read 20GB a whole lot faster than 600GB.

Jeff Hammerbacher wrote:
> 4) your section on "adding capacity" has NOTHING at all to do
> with organizing your data on disk in a column-oriented fashion;
> it's a property of any reasonably well-designed horizontally
> partitioned data store.

Hmm, well, the column-orientedness of BigTable and HBase does a pretty nice job of horizontal partitioning.

Jonathan Hendler wrote:
> One of the valid points ... has to do with compression (and null
> values). For example - does HBase also offer tools, or a
> strategy for compression?

Yes, see hbase.HColumnDescriptor.java; compression is controlled on a per-column-family basis.

---
Jim Kellerman, Senior Engineer; Powerset
[EMAIL PROTECTED]
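
P.S. A rough check of the ~20GB figure above, as a back-of-the-envelope sketch only. It assumes all 212 columns are about the same width and ignores compression, which the numbers quoted above do not specify; the function name and constants are mine, purely for illustration.

```python
def scan_gb(total_gb, total_columns, columns_read):
    """Estimate GB read by a full scan of a column store.

    Assumption: every column holds an equal share of the data,
    so the bytes read scale with the fraction of columns touched.
    """
    return total_gb * columns_read / total_columns


TOTAL_GB = 600        # data set size quoted above
TOTAL_COLUMNS = 212   # schema width quoted above

# A row store has to read every column of every row.
row_store_gb = scan_gb(TOTAL_GB, TOTAL_COLUMNS, TOTAL_COLUMNS)

# A column store reads only the 6 or 7 columns a transaction touches.
low = scan_gb(TOTAL_GB, TOTAL_COLUMNS, 6)
high = scan_gb(TOTAL_GB, TOTAL_COLUMNS, 7)

print(f"row store:    {row_store_gb:.0f} GB")
print(f"column store: {low:.1f} to {high:.1f} GB")
```

Under that equal-width assumption the column store reads roughly 17 to 20GB, about 3% of what the row store reads. Real columns vary in width and compress differently, so treat this as an order-of-magnitude estimate.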
