[ https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008169#comment-13008169 ]
Edward Capriolo commented on CASSANDRA-2319:
--------------------------------------------

Is the plan to let users choose the type on a per-CF basis? If so, by version 0.8.0 will a user choose between standard | compressed | wide when creating a CF?

I think the use case and description make sense; however:

{quote}
The index will grow proportionally with the total number of columns, not with the number of keys.
{quote}

This worries me. We tend to do getSlice for entire keys, so these large indexes do not bring much to this use case.

{quote}
For narrow rows, this change would have no effect, as they will not reach the threshold for indexing anyway.
{quote}

What is the definition of a narrow row?

> Promote row index
> -----------------
>
>                 Key: CASSANDRA-2319
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>              Labels: index, timeseries
>             Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row.
> For a row of appreciable size, the row index ends up directing the third seek
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains
> information that tells us whether or not the 3rd seek is necessary (the
> column range or name we are trying to slice may not exist in a given
> sstable), promoting the row index into the sstable index would allow us to
> drop the maximum number of seeks for wide rows back to 2, and, more
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data
> in wide rows, where data is appended to the beginning or end of the row. Our
> existing compaction strategy gets lucky and clusters the oldest data in the
> oldest sstables: for queries to recently appended data, we would be able to
> eliminate wide rows using only the sstable index, rather than needing to seek
> into the data file to determine that it isn't interesting. For narrow rows,
> this change would have no effect, as they will not reach the threshold for
> indexing anyway.
> A first cut design for this change would look very similar to the file format
> design proposed on #674:
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered,
> column names clustered, and offsets clustered and delta encoded.
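
To make the sstable elimination described in the issue concrete, here is a rough sketch of how a promoted row index might let a slice query skip sstables without touching the data file. It is an illustration only, not Cassandra's implementation: `SSTableIndex`, `may_contain`, `candidate_sstables`, the integer column names, and the sample sstables are all hypothetical. It also assumes every row is indexed, which sidesteps the narrow-row threshold question raised above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SSTableIndex:
    """Hypothetical sstable index holding a promoted row index.

    For each row key it stores one (first_column, last_column) pair per
    indexed block of that row. Column names are modelled as integers
    (for example timestamps) purely for illustration.
    """
    name: str
    blocks: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def may_contain(self, key: str, start: int, end: int) -> bool:
        """Return True if any indexed block of `key` overlaps [start, end]."""
        return any(
            first <= end and start <= last
            for first, last in self.blocks.get(key, [])
        )


def candidate_sstables(sstables: List[SSTableIndex], key: str,
                       start: int, end: int) -> List[str]:
    """Names of sstables that cannot be ruled out using the index alone."""
    return [s.name for s in sstables if s.may_contain(key, start, end)]


# Time series row: older sstables hold older column ranges (assumed data).
old = SSTableIndex("sstable-1", {"sensor42": [(0, 99), (100, 199)]})
mid = SSTableIndex("sstable-2", {"sensor42": [(200, 299)]})
new = SSTableIndex("sstable-3", {"sensor42": [(300, 399)]})

# A slice over recently appended columns only needs the newest sstable.
recent = candidate_sstables([old, mid, new], "sensor42", 350, 400)
```

In this sketch a query for recent columns keeps only `sstable-3`, while a getSlice over the entire key would overlap every block and eliminate nothing, which is the case the comment above is concerned about.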