[ 
https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725726#action_12725726
 ] 

Jun Rao commented on CASSANDRA-16:
----------------------------------

A couple of comments.

1. While the row index has one index entry per row, the column index has one 
index entry per group of columns. So, the chance of the column index not 
fitting in memory is low. Plus, one can always increase the column group size 
to reduce the index footprint.

2. As a general solution, maybe we can put the column index after the column 
data in the same file. During compaction, we try to keep the column index in 
memory. If that is not possible, we append the column index to a temp file 
first. After we have written all columns, we copy the column index from the 
temp file to the end of the data file. So, in the worst case, we make two 
passes over the column index, but not over the column data.
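
To make the two points above concrete, here is a minimal sketch in Python 
rather than Cassandra's Java. It is not the actual SSTable format or any 
existing Cassandra code: the record layout, the trailing 8-byte index offset, 
and the names write_row, read_index, group_size and max_index_bytes_in_memory 
are all assumptions made for illustration. It writes the column data once, 
emits one index entry per group of columns, and moves the index to a temp file 
only when it outgrows its memory budget.

```python
import io
import shutil
import struct
import tempfile


def write_row(out, columns, group_size=2, max_index_bytes_in_memory=64):
    """Stream columns to `out`, then append the column index and its offset.

    Hypothetical layout: [column records][index entries][8-byte index offset].
    Returns True if the index had to be spilled to a temp file.
    """
    index_buf = io.BytesIO()  # index kept in memory while it fits
    spill = None              # temp file, created only when needed
    for i, (name, value) in enumerate(columns):
        if i % group_size == 0:
            # One index entry per group: first column name + file offset.
            entry = (struct.pack(">H", len(name)) + name
                     + struct.pack(">Q", out.tell()))
            if spill is not None:
                spill.write(entry)
            else:
                index_buf.write(entry)
                if index_buf.tell() > max_index_bytes_in_memory:
                    # Over budget: move what we have to a temp file.
                    spill = tempfile.TemporaryFile()
                    spill.write(index_buf.getvalue())
                    index_buf = None
        # Column data is written exactly once, in a single pass.
        out.write(struct.pack(">HI", len(name), len(value)) + name + value)

    index_offset = out.tell()
    if spill is None:
        out.write(index_buf.getvalue())
    else:
        # Second pass over the index only: copy it to the end of the file.
        spill.seek(0)
        shutil.copyfileobj(spill, out)
        spill.close()
    out.write(struct.pack(">Q", index_offset))
    return spill is not None


def read_index(f):
    """Read back the (first column name, offset) entries from the file end."""
    f.seek(-8, io.SEEK_END)
    index_end = f.tell()
    (index_offset,) = struct.unpack(">Q", f.read(8))
    f.seek(index_offset)
    entries = []
    while f.tell() < index_end:
        (name_len,) = struct.unpack(">H", f.read(2))
        name = f.read(name_len)
        (offset,) = struct.unpack(">Q", f.read(8))
        entries.append((name, offset))
    return entries
```

In this sketch the resulting file is byte-for-byte the same whether or not the 
index was spilled, so a reader does not need to know which path was taken. 
Raising group_size reduces the number of index entries, at the cost of 
scanning more columns within a group on lookup.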

> Memory efficient compactions 
> -----------------------------
>
>                 Key: CASSANDRA-16
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: All
>            Reporter: Sandeep Tata
>            Priority: Critical
>             Fix For: 0.4
>
>
> The basic idea is to allow rows to get large enough that they don't have to 
> fit in memory entirely, but can easily fit on a disk. The compaction 
> algorithm today de-serializes the entire row in memory before writing out the 
> compacted SSTable (see ColumnFamilyStore.doCompaction() and associated 
> methods).
> The requirement is to have a compaction method with a lower memory 
> requirement so we can support rows larger than available main memory. To 
> re-use the old FB example, if we stored a user's inbox in a row, we'd want 
> the inbox to grow bigger than memory so long as it fit on disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
