[ 
https://issues.apache.org/jira/browse/CASSANDRA-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842290#action_12842290
 ] 

Jonathan Ellis commented on CASSANDRA-847:
------------------------------------------

Let's keep this simple.

The goal is to create an abstraction that (a) compaction code can apply to both 
old and new data formats, (b) allows memory-efficient compactions on the new 
format, (c) makes new-format indexing of subcolumns possible, and (d) ideally 
allows up to 256 levels of new-format subcolumn nesting.

In particular, the goal does not include improving efficiency of compaction of 
old format data; if that falls out naturally, fine, but it's not really our 
goal.

Nor is it yet our goal to support the new format in this patchset, although 
maybe it should be.  Compacting from old format to old format, with new data 
structures, is not part of our ultimate goal either, and making it an 
intermediate step may be making things harder than necessary.  It may be 
simpler to introduce the new format first, so we can skip to compacting from 
old -> new and new -> new, not bothering with old -> old.

Are we on the same page?

I think the simplest way to get there is to continue using IColumn.  
It generalizes just fine to multiple levels, and the existing implementation 
knows how to use abstractions like mostRecentLiveChangeAt to handle tricky 
problems like tombstones.  Throwing this away and starting over will lead us 
eventually to the same place.  [Although certainly some parts like 
getObjectCount won't be needed and can ultimately be removed.]  Also, sharing 
code between old and new formats is, within reason, a good thing.  So let's 
keep IColumn (I believe the analogue in your patch is Named?) and Column.

ColumnFamily + SuperColumn should be replaced with a more generalized structure 
supporting arbitrary nesting.  Here I think ColumnGroup is a better name than 
Slice; we use the latter term in querying, which could be confusing.  But I 
think ColumnGroup would have a lot in common with the existing 
CF/SuperColumn code.  Each ColumnGroup, like Column, only needs a byte[] name.  
No need to copy a lot of full paths around; experience with existing code shows 
that this is unnecessary.
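
To make the proposed shape concrete, here is a minimal Java sketch of a nested 
group in which every node carries only its own byte[] name.  The names 
NamedNode, LeafColumn, resolve and UNSIGNED_BYTES are hypothetical and exist 
only for this illustration; they are not the existing IColumn/Column API, and 
tombstone handling is left out entirely.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

/** Hypothetical stand-in for IColumn: a node knows only its own name. */
interface NamedNode {
    byte[] name();
}

/** Leaf node: a name-data-timestamp triple, as a plain Column would be. */
final class LeafColumn implements NamedNode {
    private final byte[] name;
    final byte[] value;
    final long timestamp;

    LeafColumn(byte[] name, byte[] value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    public byte[] name() { return name; }
}

/** Container node: children are kept sorted by name and may be groups or leaves. */
final class ColumnGroup implements NamedNode {
    /** Assumed ordering for this sketch: unsigned lexicographic byte comparison. */
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    };

    private final byte[] name;
    private final TreeMap<byte[], NamedNode> children = new TreeMap<>(UNSIGNED_BYTES);

    ColumnGroup(byte[] name) { this.name = name; }

    public byte[] name() { return name; }

    void add(NamedNode child) { children.put(child.name(), child); }

    /** Walk a path supplied by the caller; the nodes themselves store no path. */
    NamedNode resolve(byte[]... path) {
        NamedNode current = this;
        for (byte[] part : path) {
            if (!(current instanceof ColumnGroup)) return null;
            current = ((ColumnGroup) current).children.get(part);
            if (current == null) return null;
        }
        return current;
    }

    /** Convenience helper for building names in examples. */
    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
}
```

Because the path lives only in the caller's hands during a lookup, nesting 
depth adds no per-node storage beyond the node's own name.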

Mapping this to the old data format is hopefully clear, since the structure 
closely resembles it.  What about the new format?  Here we come back to my 
advocating that "all container information goes in the block header, followed 
by serialized Columns [not IColumns, just name-data-ts triples]."  This is 
where we will need something like ColumnKey to contain column boundaries -- 
i.e., not in this patchset, unless you decide that actually introducing the new 
format here is the way to go.

Thus, for compaction, our algorithm goes something like "read all the header 
information at once and build the ColumnGroup structure in memory, then iterate 
through matching sub-columngroups, merging as necessary."  Since we read the 
header all at once, and then the subcolumns in-order, all i/o within a single 
sstable remains sequential.
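
The "iterate through matching sub-columngroups, merging as necessary" step 
could look roughly like the following Java sketch.  It is an illustration 
under assumptions, not the proposed implementation: MergeNode and GroupMerger 
are hypothetical names, String names are used for readability where the real 
structure would use byte[], leaf conflicts are resolved by highest timestamp, 
tombstones are ignored, and both inputs are held fully in memory rather than 
streamed from disk after the header is read.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical node: either a leaf (value + timestamp) or a group of children. */
final class MergeNode {
    final String name;
    final String value;                        // null for groups
    final long timestamp;                      // meaningful for leaves only
    final TreeMap<String, MergeNode> children; // null for leaves

    MergeNode(String name, String value, long timestamp,
              TreeMap<String, MergeNode> children) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
        this.children = children;
    }

    static MergeNode leaf(String name, String value, long timestamp) {
        return new MergeNode(name, value, timestamp, null);
    }

    static MergeNode group(String name, MergeNode... kids) {
        TreeMap<String, MergeNode> sorted = new TreeMap<>();
        for (MergeNode k : kids) sorted.put(k.name, k);
        return new MergeNode(name, null, 0L, sorted);
    }

    boolean isGroup() { return children != null; }
}

final class GroupMerger {
    /** Merge two same-named nodes, recursing into sub-groups that match by name. */
    static MergeNode merge(MergeNode a, MergeNode b) {
        if (!a.isGroup() && !b.isGroup()) {
            // Leaf vs. leaf: keep the newer write (ties keep the first argument).
            return b.timestamp > a.timestamp ? b : a;
        }
        if (a.isGroup() != b.isGroup()) {
            throw new IllegalArgumentException("cannot merge a group with a leaf: " + a.name);
        }
        // Start from a's children, then fold in b's; matching names merge recursively.
        TreeMap<String, MergeNode> out = new TreeMap<>(a.children);
        for (Map.Entry<String, MergeNode> e : b.children.entrySet()) {
            out.merge(e.getKey(), e.getValue(), GroupMerger::merge);
        }
        return new MergeNode(a.name, null, 0L, out);
    }
}
```

The sorted maps keep children in name order, which mirrors reading subcolumns 
in order within each sstable.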

It's not clear to me how to apply the old ReducingIterator approach to 
multilevel groups when the data to merge into one Block may be spread across 
multiple Blocks in another sstable, although I find the iterator design very 
elegant and easy to verify.  So you are probably right that 
this has to change.

One other thing about header info / column key: it would be nice to come up 
with a scheme that doesn't repeat the full path in the description of each 
ColumnGroup [i.e., ColumnKey or its analogue], at least not on-disk; in a 
heavily nested structure that would be a lot of duplication of the initial path 
elements, although presumably compression would mitigate this somewhat.
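
One possible scheme, offered only as a sketch: list the groups in pre-order 
and store each as a (depth, own name) pair, so that ancestors are implied by 
position instead of being repeated.  HeaderPathCodec and Entry are 
hypothetical names, String stands in for byte[] names, and the encoder 
assumes every parent group appears in the list before its children.

```java
import java.util.ArrayList;
import java.util.List;

final class HeaderPathCodec {
    /** One header entry: nesting depth plus the group's own name, no ancestors. */
    static final class Entry {
        final int depth;
        final String name;

        Entry(int depth, String name) {
            this.depth = depth;
            this.name = name;
        }
    }

    /** Encode full paths, given in pre-order, as depth-tagged final path elements. */
    static List<Entry> encode(List<List<String>> paths) {
        List<Entry> out = new ArrayList<>();
        for (List<String> path : paths) {
            out.add(new Entry(path.size() - 1, path.get(path.size() - 1)));
        }
        return out;
    }

    /** Rebuild full paths by tracking the current ancestor chain while scanning. */
    static List<List<String>> decode(List<Entry> entries) {
        List<List<String>> out = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (Entry e : entries) {
            if (e.depth > current.size()) {
                throw new IllegalArgumentException("entry skips a nesting level: " + e.name);
            }
            // Pop back up to this entry's parent, then descend into the entry itself.
            while (current.size() > e.depth) {
                current.remove(current.size() - 1);
            }
            current.add(e.name);
            out.add(new ArrayList<>(current));
        }
        return out;
    }
}
```

With this layout each group costs its own name plus a small depth field on 
disk, however deep it sits; a single unsigned byte for the depth would cover 
256 levels.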

What do you think?

> Make the reading half of compactions memory-efficient
> -----------------------------------------------------
>
>                 Key: CASSANDRA-847
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-847
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>            Priority: Critical
>             Fix For: 0.7
>
>         Attachments: 
> 0001-Add-structures-that-were-important-to-the-SSTableSca.patch, 
> 0002-Implement-most-of-the-new-SSTableScanner-interface.patch, 
> 0003-Rename-RowIndexedReader-specific-test.patch, 
> 0004-Improve-Scanner-tests-and-separate-SuperCF-handling-.patch, 
> 0005-Add-Scanner-interface-and-a-Filtered-implementation-.patch, 
> 0006-Add-support-for-compaction-of-super-CFs-and-some-tes.patch
>
>
> This issue is the next on the road to finally fixing CASSANDRA-16. To make 
> compactions memory efficient, we have to be able to perform the compaction 
> process on the smallest possible chunks that might intersect and contend 
> with one another, meaning that we need a better abstraction for reading from 
> SSTables.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
