[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020192#comment-13020192
 ] 

Stu Hood commented on CASSANDRA-2319:
-------------------------------------

The attached patch depends on CASSANDRA-2336 and CASSANDRA-2398, and implements
a compressed, promoted index (called NestedIndex) containing enough information
to eliminate wide rows without seeking into the data file. Only some of the
features from the issue description are implemented: the primary goal of this
first version was to begin eliminating sstables for wide-row (time series)
use cases.

The nested index contains nearly as much information as our current row header:
the only thing missing at this point is the bloom filter (which I think needs
more thought to get right, and wasn't critical for our slice use case).

Narrow rows (rows smaller than column_index_size) are represented in the index
by their key, offset, and two bits to indicate that neither metadata nor column
names have been stored in the index. Rows wider than column_index_size will
have an entry containing their metadata and one or more column entries, with a
bit per column entry to indicate their ownership.
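
As a rough illustration of the narrow/wide distinction above, here is a minimal
Python sketch. It is not the patch's actual on-disk layout: the names
IndexEntry, build_entry, HAS_METADATA and HAS_COLUMNS are hypothetical, the
metadata is modelled as a plain dict, and the per-column ownership bit is left
out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical flag bits; the real patch may encode these differently.
HAS_METADATA = 0b01
HAS_COLUMNS = 0b10


@dataclass
class IndexEntry:
    key: bytes
    offset: int                      # position of the row in the data file
    metadata: Optional[dict] = None  # row metadata, stored for wide rows only
    column_names: List[bytes] = field(default_factory=list)  # wide rows only

    def flags(self) -> int:
        """Two bits saying whether metadata and column names are present."""
        bits = 0
        if self.metadata is not None:
            bits |= HAS_METADATA
        if self.column_names:
            bits |= HAS_COLUMNS
        return bits


def build_entry(key, offset, row_size, column_index_size, metadata, column_names):
    """Narrow rows keep only key and offset; wide rows also carry metadata
    and one or more column entries."""
    if row_size < column_index_size:
        return IndexEntry(key, offset)
    return IndexEntry(key, offset, metadata, list(column_names))
```

With this shape, a narrow row costs only its key, its offset and two cleared
bits, while a wide row's entry grows with the number of column entries.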

The key cache stores RowHeaders, which have a base implementation that is 
essentially just a boxed long offset (matching our existing impl in memory 
usage). Wide rows use the NestedRowHeader subclass, which additionally contains 
row metadata and the min and max columns in the row, which is enough to 
eliminate a row for slices. The intention is that as our file format migrates 
toward being block based, RowHeader will evolve into a BlockHeader describing a 
seekable point in the data file (and possibly cached in a sorted structure as 
described above).
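
To make the slice elimination concrete, here is a minimal Python sketch of the
two header types. It assumes column names compare as raw bytes (real
comparators are configured per column family), omits the row metadata that
NestedRowHeader also carries, and the may_contain method name is hypothetical
rather than taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowHeader:
    """Base header: essentially just the row's offset in the data file."""
    offset: int

    def may_contain(self, start: bytes, end: bytes) -> bool:
        # No column information is cached, so the row cannot be ruled out.
        return True


@dataclass(frozen=True)
class NestedRowHeader(RowHeader):
    """Wide-row header: also tracks the min and max column names in the row."""
    min_column: bytes
    max_column: bytes

    def may_contain(self, start: bytes, end: bytes) -> bool:
        # The slice [start, end] overlaps the row only if it begins at or
        # before the largest column and ends at or after the smallest one.
        return start <= self.max_column and end >= self.min_column
```

A reader would skip the data file for any sstable whose cached header returns
False, which is the elimination the time series case relies on.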

Data in the index mostly uses the design from 
[[FileFormatDesignDoc|http://wiki.apache.org/cassandra/FileFormatDesignDoc]], 
and is compressed using either type specific compression or LZF (via 
CASSANDRA-2398). The end result is that for a narrow-row use case with simple
keys, the index is half the size of the existing implementation; complex keys
will likely see larger benefits.
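
The issue description quoted below mentions offsets being clustered and delta
encoded. As one example of what a type specific encoding for offsets could look
like, here is a minimal Python sketch; the function names are hypothetical and
this is not the encoding used by the patch, which may differ in detail.

```python
from typing import List


def delta_encode(offsets: List[int]) -> List[int]:
    """Store each offset as the difference from the previous one.

    Offsets in an index are increasing, so the deltas are small positive
    numbers that pack into fewer bytes than the absolute values.
    """
    deltas = []
    prev = 0
    for off in offsets:
        deltas.append(off - prev)
        prev = off
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the absolute offsets with a running sum."""
    offsets = []
    total = 0
    for d in deltas:
        total += d
        offsets.append(total)
    return offsets
```

For example, the offsets 1000, 1064, 1200, 1210 become 1000, 64, 136, 10, and
decoding restores the original list.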

> Promote row index
> -----------------
>
>                 Key: CASSANDRA-2319
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>              Labels: index, timeseries
>             Fix For: 1.0
>
>         Attachments: 2319-v1.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to a position near the first column of a 
> scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example use case that benefits greatly from this change is time series 
> data in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674 
> (http://wiki.apache.org/cassandra/FileFormatDesignDoc): row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
