[Cassandra Wiki] Update of "FileFormatDesignDoc" by Stu Hood

Apache Wiki Sat, 01 Jan 2011 18:34:09 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "FileFormatDesignDoc" page has been changed by StuHood.
http://wiki.apache.org/cassandra/FileFormatDesignDoc?action=diff&rev1=8&rev2=9

--------------------------------------------------

  
  Because we will be storing multiple columns per SSTable, our design will bear 
the most similarity to RCFile (rather than the column-per-file approach taken 
in Dremel). But because we allow for nesting via super columns (and hopefully 
with a more flexible representation in the future), we need to take hints from 
Dremel's serialization to allow for efficient storage of parent and null 
information.
  
+ === Vertical chunks ===
+ 
- Rather than dividing the table representation into chunks using horizontal 
slices, we will use vertical chunks (and horizontal chunks when necessary for 
particularly wide rows):
+ Rather than slicing the span into chunks horizontally, we will use vertical 
chunks (and horizontal chunks only when necessary for particularly wide rows):
  
  || ''row key'' ||
  || cheese  ||
@@ -83, +85 @@

  || 4.9 ||
  || china ||
  
- This representation achieves the benefits for compression shown in the RCFile 
paper: similar values are always stored together. But we have lost some 
information!: Using the tables above, it is impossible to determine which 
fields at level "name1" are cheeses, and which are fruits. We need to store 
parent information, and one method comes from Dremel's clever representation 
for arbitrary nesting. We add a single boolean to each tuple that toggles to 
represent parent changes:
+ This representation achieves some of the benefits for compression shown in 
the RCFile paper: similar values are stored much closer together. But we have 
lost some information!: Using the tables above, it is impossible to determine 
which fields at level "name1" are cheeses, and which are fruits.
+ 
+ === Parent representation ===
+ 
+ We need to store parent information, and one method comes from Dremel's 
clever representation for arbitrary nesting. We add a single boolean to each 
tuple that toggles to represent parent changes:
  
  || ''row key'' || ''parent_change'' ||
  || cheese  || 0 ||
@@ -116, +122 @@

  
  The parent change flag can be represented compactly using a bitmap, and field 
lengths can be packed tightly into group-varint encoded arrays [3], as alluded 
to in the Dremel paper, and mentioned in Jeff Dean's talks.
  
+ === Metadata ===
+ 
- Cassandra also needs to encode metadata about tuples and ranges of tuples, in 
order to represent creation and deletion timestamps: range tuples can be 
encoded in a similar fashion to the value tuples represented here, and the 
metadata itself can also be group-varint encoded.
+ Cassandra also needs to encode metadata about tuples and ranges of tuples, in 
order to represent creation and deletion timestamps: range tuples can be 
encoded in a similar fashion to the value tuples represented here, and the 
metadata timestamps can be group-varint encoded.
+ 
+ === Field reordering ===
+ 
+ One weakness of the implementation so far is that it doesn't allow tuples to 
be reordered within a level. This approach performs well for wide rows with 
high field cardinality, since adding compression is unlikely to remove data.
+ 
+ Since we have domain knowledge that a compression algorithm would not, it 
will often be more efficient to perform reordering by ourselves, particularly 
when a chunk has low cardinality: for example at the "name2" level above. By 
assigning the chunk an ''ordered'' type, we can store the fields in sorted 
order (rather than in parent-sorted order) and remove duplicates.
+ 
+ || ''name2'' ||
+ || flavor ||
+ || origin  ||
+ 
+ More importantly, a chunk of type ''ordered'' should influence the order of 
tuples in child chunks. When we encounter an ''ordered'' chunk at level 
"name2", we should expect its children in level "value" to be arranged as 
follows:
+ 
+ || ''value'' || ''parent_change'' ||
+ || 3.4 || 1 ||
+ || 5.6 || 1 ||
+ || 2.6 || 1 ||
+ || 4.2 || 1 ||
+ || 4.9 || 1 ||
+ || || 0 ||
+ || france || 1 ||
+ || || 0 ||
+ || || 0 ||
+ || china || 1 ||
+ 
+ The ''parent_change'' field is now a bitmap representing nulls: it indicates 
that all parents have a 'flavor' tuple, but only the second and fifth parents 
have an 'origin' tuple. This representation is ripe for compression.
  
  == Schema ==

[Cassandra Wiki] Update of "FileFormatDesignDoc" by Stu Hood

Reply via email to