[jira] [Issue Comment Edited] (CASSANDRA-674) New SSTable Format

Stu Hood (JIRA) Sun, 22 May 2011 20:01:34 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037701#comment-13037701
 ]


Stu Hood edited comment on CASSANDRA-674 at 5/23/11 2:59 AM:
-------------------------------------------------------------

Attaching a new version: v3. I've extracted most of the tasks that can be 
accomplished independently into [other 
tickets|https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=issue+in+%28CASSANDRA-2650%2C+CASSANDRA-2679%2C+CASSANDRA-2641%2C+CASSANDRA-2576%2C+CASSANDRA-2062%2C+CASSANDRA-2629%2C+CASSANDRA-2145%2C+CASSANDRA-2398%29+order+by+updated+desc].

Changes from v2:
* Removed Avro
* Added block checksumming
* Switched to type-specific compression via CASSANDRA-2398
* Used type-specific compression for timestamps
* Implemented supercolumn support
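
To illustrate the block checksumming item above, here is a minimal sketch, not the code from the attached patch. It assumes a block is framed as a 4-byte length, the payload, and an 8-byte CRC32 value; the class name ChecksummedBlock and that layout are hypothetical, and the actual v3 format may use a different checksum or layout.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper, not part of the 674 patch. Assumed layout:
// [int payload length][payload bytes][long CRC32 of payload]
final class ChecksummedBlock {
    private ChecksummedBlock() {}

    // Wraps a block payload with its length and CRC32 checksum.
    static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer out = ByteBuffer.allocate(4 + payload.length + 8);
        out.putInt(payload.length);
        out.put(payload);
        out.putLong(crc.getValue());
        return out.array();
    }

    // Returns the payload, or throws if the stored checksum does not match.
    static byte[] unframe(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed);
        int length = in.getInt();
        if (length < 0 || length > in.remaining() - 8) {
            throw new IllegalStateException("Invalid block length: " + length);
        }
        byte[] payload = new byte[length];
        in.get(payload);
        long expected = in.getLong();
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if (crc.getValue() != expected) {
            throw new IllegalStateException("Block checksum mismatch");
        }
        return payload;
    }
}
```

Checking per block rather than per file means a reader can detect corruption in just the block it touches, without reading the rest of the datafile.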

This revision compresses wide rows very well, and the datafile format is 
essentially finalized. The next steps are to improve read-time performance:
# Incorporate CASSANDRA-2319 to improve wide row access times
** Since the patch removes the column index, reads always start at the 
beginning of the row and scan until the correct column range is found. 
CASSANDRA-2319 would allow random access to a block
# Store more than one row per block in order to take the best advantage of 
compression for narrow rows
** This patch adds a Cursor object, which represents a position within a block 
and file. SSTableScanner will need to hold a Cursor between rows and pass it 
into each IColumnIterator that it creates
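
The Cursor hand-off described in the last item could look roughly like the sketch below. Everything in it is a simplified stand-in rather than the patch's API: the file is modelled as an in-memory list of blocks, Entry stands in for a serialized column, RowColumnIterator stands in for IColumnIterator, RowScanner stands in for SSTableScanner, and the Cursor fields are assumptions.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical position marker: which block of the file, and where in it. */
final class Cursor {
    int block;     // index of the current block within the file
    int position;  // index of the next unread entry within that block
}

/** One (row key, column name) entry; a stand-in for a serialized column. */
final class Entry {
    final String rowKey;
    final String column;

    Entry(String rowKey, String column) {
        this.rowKey = rowKey;
        this.column = column;
    }
}

/** Stand-in for IColumnIterator: yields one row's columns, advancing the shared Cursor. */
final class RowColumnIterator implements Iterator<String> {
    private final List<List<Entry>> blocks;
    private final Cursor cursor;
    private final String rowKey;

    RowColumnIterator(List<List<Entry>> blocks, Cursor cursor, String rowKey) {
        this.blocks = blocks;
        this.cursor = cursor;
        this.rowKey = rowKey;
    }

    @Override
    public boolean hasNext() {
        Entry e = RowScanner.peek(blocks, cursor);
        return e != null && e.rowKey.equals(rowKey);
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return blocks.get(cursor.block).get(cursor.position++).column;
    }
}

/** Stand-in for SSTableScanner: holds a single Cursor across rows. */
final class RowScanner {
    private final List<List<Entry>> blocks;
    private final Cursor cursor = new Cursor();

    RowScanner(List<List<Entry>> blocks) {
        this.blocks = blocks;
    }

    /** Skips exhausted blocks; returns the entry under the cursor, or null at end of file. */
    static Entry peek(List<List<Entry>> blocks, Cursor c) {
        while (c.block < blocks.size() && c.position >= blocks.get(c.block).size()) {
            c.block++;
            c.position = 0;
        }
        return c.block < blocks.size() ? blocks.get(c.block).get(c.position) : null;
    }

    boolean hasNextRow() {
        return peek(blocks, cursor) != null;
    }

    /** Creates the iterator for the row under the cursor, passing the cursor in. */
    RowColumnIterator nextRow() {
        Entry first = peek(blocks, cursor);
        if (first == null) throw new NoSuchElementException();
        return new RowColumnIterator(blocks, cursor, first.rowKey);
    }
}
```

Because the scanner and the iterators share one Cursor, a row that ends mid-block leaves the cursor at the first entry of the following row, so several narrow rows can live in the same block. In this sketch the caller must exhaust each row's iterator before asking the scanner for the next row.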

      was (Author: stuhood):
    Attaching a new version: v3. I've extracted most of the tasks that can be 
accomplished independently into [other 
tickets|https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=issue+in+%28CASSANDRA-2650%2C+CASSANDRA-2679%2C+CASSANDRA-2641%2C+CASSANDRA-2576%2C+CASSANDRA-2062%2C+CASSANDRA-2629%2C+CASSANDRA-2145%2C+CASSANDRA-2398%29+order+by+updated+desc].

Changes from v2:
* Removed Avro
* Switched to type-specific compression via CASSANDRA-2398
* Used type-specific compression for timestamps
* Implemented supercolumn support

This revision compresses wide rows very well, and the datafile format is 
essentially finalized. Next steps are to improve the performance at read time:
# Incorporate CASSANDRA-2319 to improve wide row access times
** Since the patch removes the column index, reads always begin at the 
beginning of the row, and scan until the correct column range is found. 2319 
would allow for random access to a block
# Store more than one row per block in order to take the best advantage of 
compression for narrow rows
** This patch adds a Cursor object, which represents the position in a block 
and file. SSTableScanner will need to hold a Cursor between rows, and pass it 
into each IColumnIterator that is created
  
> New SSTable Format
> ------------------
>
>                 Key: CASSANDRA-674
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-674
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 1.0
>
>         Attachments: 674-v1.diff, 674-v2.tgz, 674-v3.tgz, perf-674-v1.txt, 
> perf-trunk-2f3d2c0e4845faf62e33c191d152cb1b3fa62806.txt
>
>
> Various tickets exist due to limitations in the SSTable file format, 
> including #16, #47 and #328. Attached is a proposed design/implementation of 
> a new file format for SSTables that addresses a few of these limitations.
> This v2 implementation is not ready for serious use: see comments for 
> remaining issues. It is roughly the format described here: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
