Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FileFormatDesignDoc" page has been changed by StuHood.
http://wiki.apache.org/cassandra/FileFormatDesignDoc?action=diff&rev1=7&rev2=8

--------------------------------------------------

  || cheese || gouda || flavor || 5.6 ||
  || cheese || gouda || origin || france ||
  || cheese || swiss || flavor || 2.6 ||
+ || ''row key'' || ''name1'' || ''name2'' || ''value'' ||
  || fruit || apple || flavor || 4.2 ||
  || fruit || pear || flavor || 4.9 ||
  || fruit || pear || origin || china ||

@@ -36, +37 @@

  || || gouda || flavor || 5.6 ||
  || || || origin || france ||
  || || swiss || flavor || 2.6 ||
+ || ''row key'' || ''name1'' || ''name2'' || ''value'' ||
  || fruit || apple || flavor || 4.2 ||
  || || pear || flavor || 4.9 ||
  || || || origin || china ||

- The current implementation of SSTables lays data out on disk in approximately this way: data for rows is stored contiguously. In relation to the table representation above, we divide the tree into pieces using horizontal "chunks". One must seek to the "root" of the tree for a row in order to read the row index and determine which chunk the next level of the tree is stored in.
+ The current implementation of SSTables lays data out on disk in approximately this way: data for rows is stored contiguously. In relation to the table representation above, we divide the tree into pieces using horizontal "chunks". One must seek to the root of the tree for a row in order to read the row index and determine which chunk the next level of the tree is stored in.

- Additionally, only the first level of the tree is indexed: in order to find a particular column at the level labeled "name2", you would need to deserialize all columns at that level.
+ Additionally, only the first level of the tree is indexed: in order to find a particular column at the level labeled "name2", you would need to deserialize all columns at that level, which makes large super columns impractical.

  Finally, there is a second type of redundancy that the current design does not tackle: the column names at level "name2" are frequently repeated, but since rows are stored independently, we don't normalize those values. For narrow rows (like those shown), removing this redundancy will be our largest win.

@@ -81, +83 @@

  || 4.9 ||
  || china ||

- This representation achieves the benefits for compression shown in the RCFile paper: similar values are always stored together. But we have lost some information!: Using the tables above, it is impossible to determine which fields at level "name1" are cheeses, and which are fruits. We need to store parent information, and our method should come from Dremel's clever representation for arbitrary nesting. We add a single boolean to each tuple that toggles to represent parent changes:
+ This representation achieves the benefits for compression shown in the RCFile paper: similar values are always stored together. But we have lost some information!: Using the tables above, it is impossible to determine which fields at level "name1" are cheeses, and which are fruits. We need to store parent information, and one method comes from Dremel's clever representation for arbitrary nesting. We add a single boolean to each tuple that toggles to represent parent changes:

  || ''row key'' || ''parent_change'' ||
  || cheese || 0 ||
