[jira] Commented: (CASSANDRA-16) Memory efficient compactions

Jonathan Ellis (JIRA) Wed, 01 Jul 2009 08:42:11 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726111#action_12726111
 ]


Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------

You're right: since not every column name is indexed, I think we can get by with 
keeping the column index in memory.  This will allow 100s of millions of columns, 
maybe only 10s of millions if you need to hold multiple large indexes in memory 
at once, but that is still adequate for any use case I can think of.  So I don't 
think we need to worry about writing indexes to a separate file for that reason.

There are two other downsides to index-at-the-end, though: one is having to do 
an extra seek (we first seek to the end of the row to read the index size, then 
have to seek back from there to read the actual index), and the other is that 
index-at-the-end code is inherently more complex than 
index-in-separate-file.
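
To make the extra seek concrete, here is a rough sketch of the read path I have 
in mind. It is illustrative only, not the actual SSTable format: the row layout 
(column data, then a sparse index, then a 4-byte index size), the function names 
write_row and read_index, and the index_every parameter are all made up for 
this example.

```python
import io
import struct


def write_row(out, columns, index_every=2):
    """Write column data, then a sparse index, then the 4-byte index size.

    Hypothetical layout: only every `index_every`-th column name is indexed.
    Returns the file offset of the end of the row.
    """
    row_start = out.tell()
    index = []
    for i, (name, value) in enumerate(columns):
        if i % index_every == 0:
            # Record the column's offset relative to the start of the row.
            index.append((name, out.tell() - row_start))
        out.write(struct.pack(">H", len(name)) + name)
        out.write(struct.pack(">I", len(value)) + value)
    index_bytes = b"".join(
        struct.pack(">H", len(name)) + name + struct.pack(">Q", offset)
        for name, offset in index
    )
    out.write(index_bytes)
    out.write(struct.pack(">I", len(index_bytes)))  # size goes last
    return out.tell()


def read_index(f, row_end):
    """Read the index of a row that ends at `row_end`, using two seeks."""
    f.seek(row_end - 4)                 # seek 1: index size at the end of the row
    (index_size,) = struct.unpack(">I", f.read(4))
    f.seek(row_end - 4 - index_size)    # seek 2: back to the start of the index
    buf = f.read(index_size)
    entries, pos = [], 0
    while pos < len(buf):
        (name_len,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        name = buf[pos:pos + name_len]
        pos += name_len
        (offset,) = struct.unpack_from(">Q", buf, pos)
        pos += 8
        entries.append((name, offset))
    return entries
```

With an in-memory buffer standing in for the data file, writing a row and then 
calling read_index(buf, row_end) returns the (name, offset) pairs for the 
indexed columns. The second seek is the one we would be hoping the OS cache 
absorbs.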

But index-in-separate-file has its own problems: an extra fopen on the 
performance side and, since we'd want to keep small indexes inline, the 
complexity of handling both inline and separate-file indexes.

On balance I think I lean towards index-at-the-end and hope we have enough RAM 
that the OS cache can make the extra seek go away. :)

> Memory efficient compactions 
> -----------------------------
>
>                 Key: CASSANDRA-16
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: All
>            Reporter: Sandeep Tata
>            Priority: Critical
>             Fix For: 0.4
>
>
> The basic idea is to allow rows to get large enough that they don't have to 
> fit in memory entirely, but can easily fit on a disk. The compaction 
> algorithm today de-serializes the entire row in memory before writing out the 
> compacted SSTable (see ColumnFamilyStore.doCompaction() and associated 
> methods).
> The requirement is to have a compaction method with a lower memory 
> requirement so we can support rows larger than available main memory. To 
> re-use the old FB example, if we stored a user's inbox in a row, we'd want 
> the inbox to grow bigger than memory so long as it fit on disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
