[ https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725449#action_12725449 ]

Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

Indexes are the real problem we're going to have to deal with here.

We can't write the indexes first if we can't merge the columns we're indexing 
in memory.  (Not without making two passes: one to scan all the column names 
while writing the indexes, and a second to do the full merge.  Two passes is 
too high a cost to pay.)

But we can't merge the columns in a streaming fashion while keeping the index 
data in memory to spit out at the end, either.  We just fixed a bug caused by 
exactly this approach in CASSANDRA-208: it limits the number of columns we can 
support to a relatively small number, probably the low millions, depending on 
your column name size and how much memory you can throw at the JVM.

I think a hybrid approach is called for.  If there are fewer than some 
threshold of columns (1000? 100000?), we merge in memory and put the index 
first, as we do now.  Otherwise, we do a streaming merge and write the index 
to a separate file, similar to how we write the key index now.  (In fact we 
could probably encapsulate this code as SSTableIndexWriter and use it in both 
places; rough sketch below.)
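
Rough sketch of the hybrid shape (all names here -- the Column class, the 
method stubs, the threshold value -- are placeholders for illustration, not 
actual code):

{code}
import java.io.DataOutput;
import java.io.IOException;
import java.util.List;

// Placeholder column representation for the sketch.
class Column
{
    final String name;
    final byte[] value;
    final long timestamp;

    Column(String name, byte[] value, long timestamp)
    {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }
}

abstract class SSTableIndexWriter
{
    // Below this many columns we merge in memory and write the index
    // inline, ahead of the data; the right value is an open question.
    static final int IN_MEMORY_THRESHOLD = 100000;

    void writeRow(String key, List<List<Column>> inputs, long estimatedColumnCount,
                  DataOutput dataFile, DataOutput separateIndexFile) throws IOException
    {
        if (estimatedColumnCount < IN_MEMORY_THRESHOLD)
        {
            // Small row: merge fully in memory and write the index
            // block first in the same file, as the code does today.
            List<Column> merged = mergeInMemory(inputs);
            writeInlineIndex(key, merged, dataFile);
            for (Column c : merged)
                writeColumn(c, dataFile);
        }
        else
        {
            // Large row: single-pass streaming merge straight into
            // the data file, appending index entries to a separate
            // file the way the key index is written now.
            streamingMerge(key, inputs, dataFile, separateIndexFile);
        }
    }

    // Stubs standing in for the real merge and serialization logic.
    abstract List<Column> mergeInMemory(List<List<Column>> inputs);
    abstract void writeInlineIndex(String key, List<Column> merged, DataOutput out) throws IOException;
    abstract void writeColumn(Column c, DataOutput out) throws IOException;
    abstract void streamingMerge(String key, List<List<Column>> inputs,
                                 DataOutput dataFile, DataOutput indexFile) throws IOException;
}
{code}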

We don't want to _always_ index in a separate file because (a) filesystems 
have limits too -- we don't want one index file per row per columnfamily -- 
and (b) we want to do streaming writes wherever possible, which means staying 
in the same file.

This approach will result in a little more seeking (between the column index 
and the sstable) than the two-pass inline approach, but merging in a single 
pass is worth the trade.  (Remember that for large rows, reading the multiple 
input sstables will not be seek-free either once buffers max out.  So we want 
to keep to a single pass for performance as well as simplicity.)
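
For reference, the streaming merge itself is just a single-pass k-way merge 
over the name-sorted column streams of the input sstables.  A sketch, again 
with placeholder names (and reusing the Column class from the sketch above):

{code}
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a single-pass k-way merge over sorted column streams.
class StreamingColumnMerge
{
    // One cursor per input sstable's name-sorted column stream.
    static class Cursor
    {
        final Iterator<Column> iter;
        Column head;

        Cursor(Iterator<Column> iter)
        {
            this.iter = iter;
            advance();
        }

        void advance()
        {
            head = iter.hasNext() ? iter.next() : null;
        }
    }

    interface ColumnSink
    {
        void append(Column c);
    }

    static void merge(List<Iterator<Column>> inputs, ColumnSink sink)
    {
        PriorityQueue<Cursor> pq =
            new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head.name));
        for (Iterator<Column> it : inputs)
        {
            Cursor c = new Cursor(it);
            if (c.head != null)
                pq.add(c);
        }
        while (!pq.isEmpty())
        {
            Cursor c = pq.poll();
            Column winner = c.head;
            // Reconcile versions of the same column from different
            // sstables: keep the one with the latest timestamp.
            while (!pq.isEmpty() && pq.peek().head.name.equals(winner.name))
            {
                Cursor dup = pq.poll();
                if (dup.head.timestamp > winner.timestamp)
                    winner = dup.head;
                dup.advance();
                if (dup.head != null)
                    pq.add(dup);
            }
            sink.append(winner);
            c.advance();
            if (c.head != null)
                pq.add(c);
        }
    }
}
{code}

Memory there is proportional to the number of input sstables, not the number 
of columns, which is the whole point.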

> Memory efficient compactions 
> -----------------------------
>
>                 Key: CASSANDRA-16
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: All
>            Reporter: Sandeep Tata
>            Priority: Critical
>             Fix For: 0.4
>
>
> The basic idea is to allow rows to get large enough that they don't have to 
> fit in memory entirely, but can easily fit on disk. The compaction algorithm 
> today deserializes the entire row in memory before writing out the compacted 
> SSTable (see ColumnFamilyStore.doCompaction() and associated methods).
> The requirement is a compaction method with a lower memory footprint, so we 
> can support rows larger than available main memory. To re-use the old FB 
> example: if we stored a user's inbox in a row, we'd want the inbox to be 
> able to grow bigger than memory so long as it fits on disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
