[
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224147#comment-13224147
]
Sylvain Lebresne commented on CASSANDRA-2319:
---------------------------------------------
I've put a version of this issue at
https://github.com/pcmanus/cassandra/commits/2319_index_promotion (against
current trunk). Contrary to the previously attached patches, this doesn't
change the file format much. It pretty literally does what the issue title says:
it promotes the column index from the data file to the index file. Note that
the patch is split into 3 commits that have some form of logical separation, but
the code only compiles with all 3 commits applied.
So this removes the column index and bloom filter from the row header in the
data file and moves them into the index file along with the (key, position) pair.
There are a number of choices/details worth mentioning:
* Only wide rows have a column index and bloom filter. So one difference from
the current implementation is that skinny rows have no column bloom filter. I
figure that it's probably not worth the space in the index file in that latter
case (but I'm fine discussing that point).
* The key cache now keeps the whole information from the index file for a given
row. This means that for wide rows, the column index and bloom filter are cached
along with the position. This is imo a good thing, but it does mean the size of a
key cache entry is not constant anymore (the estimation of the key cache memory
size will have to be modified accordingly, but the current patch doesn't do it).
* For wide rows, the index entry also ships with the row deletion times. This is
necessary since we won't seek to the beginning of the row anymore.
* In the column indexes, offsets are relative to the beginning of the row in
the data file rather than to the beginning of the index, as is the case now.
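To make the points above concrete, here is a rough sketch of what an index file
entry could carry after promotion. All class, field and method names are
hypothetical and are not taken from the patch; the bloom filter is left out for
brevity, and the sketch assumes the stored row position is the point that block
offsets are measured from.

```java
import java.util.Collections;
import java.util.List;

/** Illustrative only: an index-file entry after promotion (not the patch's actual classes). */
final class PromotedIndexEntry {
    /** One block of a wide row's column index. */
    static final class ColumnBlock {
        final String firstName;
        final String lastName;
        final long offsetFromRowStart; // relative to the row start in the data file
        final long width;

        ColumnBlock(String firstName, String lastName, long offsetFromRowStart, long width) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.offsetFromRowStart = offsetFromRowStart;
            this.width = width;
        }
    }

    final String key;
    final long rowPosition;              // the (key, position) pair that was already in the index
    final long markedForDeleteAt;        // row deletion times, only carried for wide rows
    final int localDeletionTime;
    final List<ColumnBlock> columnIndex; // empty for skinny rows (no column index, no bloom filter)

    private PromotedIndexEntry(String key, long rowPosition, long markedForDeleteAt,
                               int localDeletionTime, List<ColumnBlock> columnIndex) {
        this.key = key;
        this.rowPosition = rowPosition;
        this.markedForDeleteAt = markedForDeleteAt;
        this.localDeletionTime = localDeletionTime;
        this.columnIndex = columnIndex;
    }

    /** Skinny rows keep only the key and position, so their entries stay small. */
    static PromotedIndexEntry skinny(String key, long rowPosition) {
        return new PromotedIndexEntry(key, rowPosition, Long.MIN_VALUE, Integer.MAX_VALUE,
                                      Collections.<ColumnBlock>emptyList());
    }

    /** Wide rows also carry deletion times and the column index. */
    static PromotedIndexEntry wide(String key, long rowPosition, long markedForDeleteAt,
                                   int localDeletionTime, List<ColumnBlock> columnIndex) {
        return new PromotedIndexEntry(key, rowPosition, markedForDeleteAt,
                                      localDeletionTime, columnIndex);
    }

    boolean isWide() {
        return !columnIndex.isEmpty();
    }

    /** Turns a row-relative block offset into an absolute data-file position. */
    long absoluteOffset(ColumnBlock block) {
        return rowPosition + block.offsetFromRowStart;
    }
}
```

Since a wide entry holds a variable-length list, a key cache that stores such
entries can no longer assume a fixed per-entry size.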
Some other implementation points:
* EchoedRow is removed. It would be possible to echo rows following this patch,
but we would need to echo the column index too, which felt complicated enough
to leave to a later ticket if we consider it worth it.
* I didn't find a way to implement this patch that wasn't overly complicated or
inefficient without using seek() instead of just file marks. So in particular
MappedFileDataInput gets a seek() method, even though that method throws an
exception if we seek outside the segment (which should never happen).
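As an illustration of the seek() behaviour described above, here is a minimal
sketch of a reader over a single mapped segment. This is not the actual
MappedFileDataInput code: the class name, the fields and the choice of an
unchecked exception are all assumptions made for the example.

```java
import java.nio.ByteBuffer;

/** Illustrative only: a reader over one mapped segment of a larger file. */
final class SegmentInput {
    private final ByteBuffer buffer;  // the segment's bytes
    private final long segmentOffset; // file position of the segment's first byte

    SegmentInput(ByteBuffer buffer, long segmentOffset) {
        this.buffer = buffer;
        this.segmentOffset = segmentOffset;
    }

    /** Moves to an absolute file position; fails if it lies outside this segment. */
    void seek(long filePosition) {
        long inSegment = filePosition - segmentOffset;
        if (inSegment < 0 || inSegment > buffer.limit()) {
            throw new IndexOutOfBoundsException(
                "Seek to " + filePosition + " is outside the segment starting at " + segmentOffset);
        }
        buffer.position((int) inSegment);
    }

    /** Current absolute position in the file. */
    long getFilePointer() {
        return segmentOffset + buffer.position();
    }

    /** Reads one byte as an unsigned value, or returns -1 at the end of the segment. */
    int read() {
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }
}
```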
I did a short (and honestly not very scientific) benchmark of a time series
like workload, with a number of threads inserting time series columns in a bunch
of rows and other threads reading the tail of those rows (as expected, the
performance degrades as more sstables are added and improves with compaction).
As soon as more than 1 sstable was present, the performance with this
patch was around 30-40% better than without the patch. I'll note that the test
was very short and with everything on localhost, so again the exact benefits
may vary, but the ability to discard sstables based on index info (saving a
seek) seems to be a clear boost in that case.
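The kind of check that lets a read discard an sstable from the index alone can
be sketched as follows. This is a simplified illustration rather than the
patch's code: the names are hypothetical, column names are assumed to be
timestamps compared as longs (the real comparator is configurable), and the
bloom filter is ignored.

```java
import java.util.List;

/** Illustrative only: deciding from a promoted column index whether a slice can match. */
final class SliceCheck {
    /** The first and last column names covered by one index block. */
    static final class Block {
        final long first;
        final long last;

        Block(long first, long last) {
            this.first = first;
            this.last = last;
        }
    }

    /**
     * Returns true if some block overlaps [start, end]. If this returns false,
     * the row's data in this sstable need not be read at all.
     */
    static boolean mayContain(List<Block> columnIndex, long start, long end) {
        for (Block block : columnIndex) {
            if (block.first <= end && block.last >= start) {
                return true;
            }
        }
        return false;
    }
}
```

For a tail read of a time series row, older sstables whose blocks all end
before the requested range would fail this check, so no seek into their data
files is needed.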
I didn't see any noticeable difference (neither good nor bad) on a normal
stress run, as should be expected.
Note that this patch paves the way to removing the two-phase compaction of
LazilyCompactedRow, but that is left to a follow-up ticket.
> Promote row index
> -----------------
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Stu Hood
> Assignee: Stu Hood
> Labels: compression, index, timeseries
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt,
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row.
> For a row of appreciable size, the row index ends up directing the third seek
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains
> information that tells us whether or not the 3rd seek is necessary (the
> column range or name we are trying to slice may not exist in a given
> sstable), promoting the row index into the sstable index would allow us to
> drop the maximum number of seeks for wide rows back to 2, and, more
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data
> in wide rows, where data is appended to the beginning or end of the row. Our
> existing compaction strategy gets lucky and clusters the oldest data in the
> oldest sstables: for queries to recently appended data, we would be able to
> eliminate wide rows using only the sstable index, rather than needing to seek
> into the data file to determine that it isn't interesting. For narrow rows,
> this change would have no effect, as they will not reach the threshold for
> indexing anyway.
> A first cut design for this change would look very similar to the file format
> design proposed on #674:
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered,
> column names clustered, and offsets clustered and delta encoded.