[
https://issues.apache.org/jira/browse/CASSANDRA-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725449#action_12725449
]
Jonathan Ellis commented on CASSANDRA-16:
-----------------------------------------
Indexes are the real problem we're going to have to deal with here.
We can't write the indexes first if we can't merge the columns we're indexing
in memory. (Not without making two passes: one to scan all the column names
while writing the indexes, and another to do the full merge. Two passes is too
high a cost to pay.)
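To make the two-pass cost concrete, it would look roughly like this (a sketch
only; the scanner and writer types and every method name here are invented for
illustration, not actual Cassandra APIs):

    import java.io.IOException;
    import java.util.List;

    // Illustrative stand-ins, not real Cassandra classes.
    interface ColumnScanner {
        boolean hasNext() throws IOException;
        String nextColumnName() throws IOException;
        void rewind() throws IOException; // forces a second full read
    }
    interface IndexWriter { void add(String columnName) throws IOException; }
    interface DataWriter { void writeMerged(List<ColumnScanner> inputs) throws IOException; }

    class TwoPassCompactionSketch {
        void compact(List<ColumnScanner> inputs, IndexWriter index, DataWriter data)
                throws IOException {
            // Pass 1: read every input sstable just to enumerate column
            // names, so the index can be written ahead of the data.
            for (ColumnScanner in : inputs)
                while (in.hasNext())
                    index.add(in.nextColumnName());
            // Pass 2: rewind and read everything again to do the real merge.
            for (ColumnScanner in : inputs)
                in.rewind();
            data.writeMerged(inputs);
        }
    }

Every byte of input gets read twice, which is the cost being rejected here.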
But we can't merge the columns in a streaming fashion while keeping the index
data in memory to spit out at the end, either. We just fixed a bug in
CASSANDRA-208 caused by taking exactly this approach: it would limit the number
of columns we support to a relatively small number, probably the low millions,
depending on your column name size and how much memory you can throw at the JVM.
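For a rough sense of where that ceiling sits, a back-of-envelope estimate (the
per-entry figures below are my own assumptions, not measurements):

    public class IndexHeapEstimate {
        public static void main(String[] args) {
            long columns = 2_000_000L; // "low millions" of columns in one row
            long nameBytes = 32;       // assumed average column name length
            long overhead = 48;        // assumed per-entry object + offset overhead
            long heapBytes = columns * (nameBytes + overhead);
            // Roughly 152 MB of heap pinned for a single row's index
            // for the entire duration of the merge.
            System.out.printf("~%d MB%n", heapBytes / (1024 * 1024));
        }
    }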
I think a hybrid approach is called for. If there are fewer than some threshold
of columns (1000? 100000?), we merge in memory and put the index first, as we do
now. Otherwise, we do a streaming merge and write the index to a separate
file, similar to how we write the key index now. (In fact, we could probably
encapsulate this code as SSTableIndexWriter and use it in both places.)
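A rough shape for that hybrid, with the index writer factored out (the
threshold value, the SSTableIndexWriter interface, and all signatures here are
guesses for illustration, not code from the tree):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    // Illustrative stand-ins, not real Cassandra types.
    interface Column { String name(); }
    interface DataFileWriter {
        long append(Column c) throws IOException; // returns the column's file offset
        void appendWithInlineIndex(List<Column> row) throws IOException;
    }
    interface SSTableIndexWriter {
        void add(String columnName, long dataOffset) throws IOException;
    }

    class HybridCompactorSketch {
        static final int COLUMN_THRESHOLD = 100_000; // 1000? 100000? -- to be tuned

        void compactRow(int estimatedColumns, Iterator<Column> merged,
                        DataFileWriter data, SSTableIndexWriter index)
                throws IOException {
            if (estimatedColumns < COLUMN_THRESHOLD) {
                // Small row: merge in memory and write the index inline,
                // ahead of the columns, as the current code does.
                List<Column> row = new ArrayList<Column>();
                while (merged.hasNext())
                    row.add(merged.next());
                data.appendWithInlineIndex(row);
            } else {
                // Large row: stream each column straight to the data file
                // and send its index entry to the separate index file.
                while (merged.hasNext()) {
                    Column c = merged.next();
                    index.add(c.name(), data.append(c));
                }
            }
        }
    }

The same SSTableIndexWriter interface could then back the key index as well,
which is the reuse suggested above.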
We don't want to _always_ index in a separate file because (a) filesystems have
limits too -- we don't want one index file per row per columnfamily -- and
(b) we want to do streaming writes wherever possible, which means staying
in the same file.
This approach will result in a little more seeking (between column index and
sstable) than the two-pass inline approach, but merging in a single pass is
worth the trade. (Remember that for large rows, reading the multiple input
sstables will not be seek-free either once buffers max out. So we want to keep
to a single pass for performance as well as simplicity.)
> Memory efficient compactions
> -----------------------------
>
> Key: CASSANDRA-16
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16
> Project: Cassandra
> Issue Type: Improvement
> Environment: All
> Reporter: Sandeep Tata
> Priority: Critical
> Fix For: 0.4
>
>
> The basic idea is to allow rows to get large enough that they don't have to
> fit in memory entirely, but can easily fit on disk. The compaction
> algorithm today deserializes the entire row in memory before writing out the
> compacted SSTable (see ColumnFamilyStore.doCompaction() and associated
> methods).
> The requirement is to have a compaction method with a lower memory
> requirement so we can support rows larger than available main memory. To
> re-use the old FB example, if we stored a user's inbox in a row, we'd want
> the inbox to grow bigger than memory so long as it fit on disk.
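The single-pass merge discussed above boils down to a k-way merge over sorted
column iterators, which needs memory proportional to the number of input
sstables rather than the row size. A minimal sketch, assuming each input yields
columns in sorted name order and that the newer timestamp wins on name
collisions (the types here are stand-ins, not Cassandra classes):

    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;
    import java.util.function.Consumer;

    // Stand-in column type; real columns also carry values, tombstones, etc.
    record Col(String name, long timestamp) {}

    class StreamingMergeSketch {
        // One queue entry per input iterator: O(k) memory for k sstables,
        // no matter how many columns the row holds.
        record Head(Col col, Iterator<Col> src) {}

        static void merge(List<Iterator<Col>> inputs, Consumer<Col> out) {
            PriorityQueue<Head> pq = new PriorityQueue<>(
                    (a, b) -> a.col().name().compareTo(b.col().name()));
            for (Iterator<Col> it : inputs)
                if (it.hasNext())
                    pq.add(new Head(it.next(), it));
            Col pending = null; // newest version seen for the current name
            while (!pq.isEmpty()) {
                Head h = pq.poll();
                if (h.src().hasNext())
                    pq.add(new Head(h.src().next(), h.src()));
                Col c = h.col();
                if (pending == null) {
                    pending = c;
                } else if (!pending.name().equals(c.name())) {
                    out.accept(pending); // name changed: emit reconciled column
                    pending = c;
                } else if (c.timestamp() > pending.timestamp()) {
                    pending = c;         // same name: keep the newer version
                }
            }
            if (pending != null)
                out.accept(pending);
        }
    }

In practice the consumer here would be the data-file append, with the index
writer fed the returned offsets, tying this back to the hybrid sketched above.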