[ https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-674:
-------------------------------

    Description: 
Various tickets exist due to limitations in the SSTable file format, including 
#16, #47 and #328. Attached is a proposed design/implementation of a new file 
format for SSTables that addresses a few of these limitations.

This v1 implementation is not ready for serious use: see comments for remaining 
issues. It is roughly the format described here: 
http://wiki.apache.org/cassandra/FileFormatDesignDoc 

  was:
Various tickets exist due to limitations in the SSTable file format, including 
#16, #47 and #328. Attached is a proposed design/implementation of a new file 
format for SSTables that addresses a few of these limitations. The 
implementation has a bunch of issues/fixmes, which I'll describe in the 
comments.

The file format is described in the javadoc for the o.a.c.io.SSTableWriter 
class, but briefly:
 * Blocks are opaque (except for their header) so that they can be compressed. 
The index file contains an entry for the first key in every Block. Blocks 
contain Slices.
 * A Slice is a series of columns with the same parents and (deletion) 
metadata. Slices can be used to represent ColumnFamilies or SuperColumns (or a 
slice of columns at any other depth). A single CF can be split across multiple 
Slices, which can in turn be split across multiple Blocks.
 * Neither Slices nor Blocks have a fixed size or maximum length, but each has 
a target length, which a very large column can cause it to exceed.
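
As a rough illustration of the index described above (one entry for the first 
key in every Block), a lookup could be sketched as follows. This is not code 
from the patch: BlockIndexSketch, addBlock and blockOffsetFor are hypothetical 
names, and plain String keys stand in for whatever key type the real format 
uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: none of these names come from the patch.
class BlockIndexSketch {
    // First key of each Block -> byte offset of that Block in the data file.
    private final TreeMap<String, Long> firstKeyToOffset = new TreeMap<>();

    // Record one index entry per Block, keyed by the Block's first key.
    void addBlock(String firstKey, long offset) {
        firstKeyToOffset.put(firstKey, offset);
    }

    // Returns the offset of the Block in which a scan for the key should
    // start, or -1 if the key sorts before the first key of every Block.
    long blockOffsetFor(String key) {
        Map.Entry<String, Long> entry = firstKeyToOffset.floorEntry(key);
        return entry == null ? -1L : entry.getValue();
    }
}
```

Because only the Block header and first key need to be readable, the Block 
body can stay opaque (and compressed) until a lookup actually lands on it.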

The most interesting concepts from this patch are:
 * Block compression is possible (currently using GZIP, which has one bug 
mentioned in the comments).
 * Compaction involves merging intersecting Slices from input SSTables. Since 
large rows are broken down into multiple Slices, only the portions of rows 
that intersect between tables need to be deserialized, merged and held in 
memory.
 * Indexes for individual rows are gone, since the global index allows random 
access to the middle of column families that span Blocks, and Slices allow 
batches of columns to be skipped within a Block.
 * Bloom filters for individual rows are gone, and the global filter contains 
ColumnKeys instead, meaning that a query for a nonexistent column in an 
existing row will often not need to seek to the row.
 * Metadata (deletion/gc time) and ColumnKeys (key, colname1, colname2...) for 
columns are defined recursively, so deeply nested Slices are possible.
 * Slices representing a single parent (CF, SC, etc.) can have different 
Metadata, meaning that a tombstone Slice from d-f could sit between Slices 
containing columns a-c and g-h. This allows for eventually consistent range 
deletes of columns.
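
To make the Slice intersection and range-tombstone ideas above concrete, here 
is a minimal sketch. It is an illustration under assumptions, not the patch's 
implementation: SliceSketch and its fields are hypothetical names, column 
names are plain Strings with inclusive bounds, and a column is assumed to be 
deleted when its timestamp is at or before the Slice's deletion time.

```java
// Hypothetical sketch only: class, field and method names are illustrative.
class SliceSketch {
    final String first;           // first column name covered (inclusive)
    final String last;            // last column name covered (inclusive)
    final long markedForDeleteAt; // Long.MIN_VALUE when there is no tombstone

    SliceSketch(String first, String last, long markedForDeleteAt) {
        this.first = first;
        this.last = last;
        this.markedForDeleteAt = markedForDeleteAt;
    }

    // Two Slices need to be merged during compaction only if their
    // column ranges overlap.
    boolean intersects(SliceSketch other) {
        return first.compareTo(other.last) <= 0
            && other.first.compareTo(last) <= 0;
    }

    // A column is shadowed when it falls inside this Slice's range and was
    // written at or before the Slice's deletion time (assumed semantics).
    boolean shadows(String columnName, long columnTimestamp) {
        return columnName.compareTo(first) >= 0
            && columnName.compareTo(last) <= 0
            && columnTimestamp <= markedForDeleteAt;
    }
}
```

With Slices a-c, d-f (tombstone) and g-h from one SSTable and a Slice e-g from 
another, only d-f and g-h intersect the second table's Slice, so a-c would not 
need to be deserialized for the merge.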


Posting a new description; trusting JIRA to preserve the original for posterity.

> New SSTable Format
> ------------------
>
>                 Key: CASSANDRA-674
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-674
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 674-v1.diff, perf-674-v1.txt, 
> perf-trunk-2f3d2c0e4845faf62e33c191d152cb1b3fa62806.txt
>
>
> Various tickets exist due to limitations in the SSTable file format, 
> including #16, #47 and #328. Attached is a proposed design/implementation of 
> a new file format for SSTables that addresses a few of these limitations.
> This v1 implementation is not ready for serious use: see comments for 
> remaining issues. It is roughly the format described here: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
