[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797025#action_12797025
 ] 

Stu Hood commented on CASSANDRA-674:
------------------------------------

List of features stubbed as "FIXME: not implemented" in v1:
 1. Reverse slicing within CFs is not implemented (see SSTableSliceIterator),
 2. Reading SuperColumns is disabled (see SSTable(Slice|Names)Iterator),
 3. The recently added MMAP support for data files is disabled until I can port 
this SSTableScanner interface to use it (see SSTableReader),
 4. AntiEntropyService is not hashing slices (meaning that major compactions 
always fail),
 5. SSTable(Import|Export) are broken,
 6. BinaryMemtables will crash on flush,
 7. The bytesRead MBean for CompactionManager is disabled,
 8. AntiCompaction is not using the 'skip ranges we don't need' optimization.

Also, I lied in the description above: the patch does not have GZIP compression 
enabled, but you can add two lines to enable it: add a GZIPInputStream to the 
chain in SSTableReader.Block.stream(), and a GZIPOutputStream to the chain in 
SSTableWriter.BlockContext.flushSlice(). There is a memory leak related to 
reading from compressed blocks which will quickly kill the server, but it 
should be easy to track down.
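
For anyone trying the two-line change, here is a minimal standalone sketch of 
what "adding a GZIP stream to the chain" looks like with java.util.zip. It is 
not the code in SSTableReader.Block.stream() or 
SSTableWriter.BlockContext.flushSlice(); the class and method names below are 
hypothetical and the byte arrays stand in for block contents. The streams are 
closed explicitly because Inflater and Deflater hold native memory until the 
stream is closed; whether that is the leak mentioned above is only a guess.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only: not part of the 674-v1 patch.
public class GzipBlockSketch {
    /** Write side: wrap the destination stream in a GZIPOutputStream. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // close() writes the GZIP trailer and releases the Deflater.
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(raw);
        }
        return sink.toByteArray();
    }

    /** Read side: wrap the source stream in a GZIPInputStream. */
    static byte[] decompress(byte[] compressed) throws IOException {
        // close() releases the Inflater's native memory.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "key1:colA=1;key1:colB=2".getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = decompress(compress(raw));
        System.out.println(new String(roundTripped, StandardCharsets.UTF_8));
    }
}
```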

Finally, there are tons of other TODOs/FIXMEs scattered around, many of which 
should be tackled in other tickets.

> New SSTable Format
> ------------------
>
>                 Key: CASSANDRA-674
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-674
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.9
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.9
>
>         Attachments: 674-v1.diff
>
>
> Various tickets exist due to limitations in the SSTable file format, 
> including #16, #47 and #328. Attached is a proposed design/implementation of 
> a new file format for SSTables that addresses a few of these limitations. The 
> implementation has a bunch of issues/fixmes, which I'll describe in the 
> comments.
> The file format is described in the javadoc for the o.a.c.io.SSTableWriter 
> class, but briefly:
>  * Blocks are opaque (except for their header) so that they can be 
> compressed. The index file contains an entry for the first key in every 
> Block. Blocks contain Slices.
>  * Slices are series of columns with the same parents and (deletion) 
> metadata. They can be used to represent ColumnFamilies or SuperColumns (or a 
> slice of columns at any other depth). A single CF can be split across 
> multiple Slices, which can be split across multiple blocks.
>  * Neither Slices nor Blocks have a fixed size or maximum length, but they 
> each have target lengths which can be stretched and broken by very large 
> columns.
> The most interesting concepts from this patch are:
>  * Block compression is possible (currently using GZIP, which has one bug 
> mentioned in the comments),
>  * Compaction involves merging intersecting Slices from input SSTables. Since 
> large rows will be broken down into multiple slices, only the portions of 
> rows that intersect between tables need to be 
> deserialized/merged/held-in-memory,
>  * Indexes for individual rows are gone, since the global index allows random 
> access to the middle of column families that span Blocks, and Slices allow 
> batches of columns to be skipped within a Block.
>  * Bloom filters for individual rows are gone, and the global filter contains 
> ColumnKeys instead, meaning that a query for a missing column in a row that 
> does exist can often skip the seek to that row.
>  * Metadata (deletion/gc time) and ColumnKeys (key, colname1, colname2...) 
> for columns are defined recursively, so deeply nested slices are possible,
>  * Slices representing a single parent (CF, SC, etc) can have different 
> Metadata, meaning that a tombstone Slice from d-f could sit between Slices 
> containing columns a-c and g-h. This allows for eventually consistent range 
> deletes of columns.
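
To make the last point in the description concrete, here is a rough sketch of 
how Slices with different Metadata could express a range delete. The Slice 
class, its fields, and the isLive() helper are hypothetical and do not match 
the classes in the patch. It assumes a column counts as deleted when its 
timestamp is not newer than the tombstone of the Slice covering its name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the Slice class from the patch.
public class SliceTombstoneSketch {
    /** A column-name range [first, last] under one parent, plus deletion metadata. */
    static final class Slice {
        static final long NO_TOMBSTONE = Long.MIN_VALUE;
        final String first;
        final String last;
        final long markedForDeleteAt;

        Slice(String first, String last, long markedForDeleteAt) {
            this.first = first;
            this.last = last;
            this.markedForDeleteAt = markedForDeleteAt;
        }

        boolean covers(String name) {
            return first.compareTo(name) <= 0 && name.compareTo(last) <= 0;
        }
    }

    /** Assumed rule: a column is dead if a covering Slice's tombstone is at least as new. */
    static boolean isLive(List<Slice> slices, String name, long timestamp) {
        for (Slice s : slices) {
            if (s.covers(name) && timestamp <= s.markedForDeleteAt) {
                return false;
            }
        }
        return true;
    }

    /** The example from the description: a tombstone Slice d-f between a-c and g-h. */
    static List<Slice> exampleSlices() {
        List<Slice> slices = new ArrayList<Slice>();
        slices.add(new Slice("a", "c", Slice.NO_TOMBSTONE));
        slices.add(new Slice("d", "f", 100L)); // range delete issued at time 100
        slices.add(new Slice("g", "h", Slice.NO_TOMBSTONE));
        return slices;
    }

    public static void main(String[] args) {
        List<Slice> slices = exampleSlices();
        System.out.println("b@50 live: " + isLive(slices, "b", 50L));
        System.out.println("e@50 live: " + isLive(slices, "e", 50L));
        System.out.println("e@150 live: " + isLive(slices, "e", 150L));
    }
}
```

Because the tombstone is carried by the Slice rather than by each column, a 
write to "e" that arrives after the delete (timestamp 150 here) survives, 
while older columns in d-f are shadowed.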

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
