[ https://issues.apache.org/jira/browse/CASSANDRA-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis reassigned CASSANDRA-2677:
-----------------------------------------
Assignee: (was: Sylvain Lebresne)
> Optimize streaming to be single-pass
> ------------------------------------
>
> Key: CASSANDRA-2677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2677
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 0.8.2
>
>
> Streaming is currently a two-pass operation: one pass to write the Data
> component to disk from the socket, then another to build the index and bloom
> filter from it. This means we do about 2x the I/O we would if we created the
> index and BF during the original write.
> For node movement this was not considered to be a Big Deal because the stream
> target is not a member of the ring, so we can be inefficient without hurting
> live queries. But optimizing node movement to not require un/rebootstrap
> (CASSANDRA-1427) and bulk load (CASSANDRA-1278) mean we can stream to live
> nodes too.
> The main obstacle here is we don't know how many keys will be in the new
> sstable ahead of time, which we need to size the bloom filter correctly. We
> can solve this by including that information (or a close approximation) in
> the stream setup -- the source node can calculate that without hitting disk
> from the in-memory index summary.
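
To illustrate the proposal above, here is a rough sketch (in Python for brevity, not Cassandra's actual Java code) of how a receiver could size the bloom filter from a key-count estimate sent in the stream setup, then build the index and filter in the same pass that writes the data. All names here (estimate_keys, bloom_parameters, receive_stream) are hypothetical. The sampling interval of 128 and the 1% false-positive target are assumptions for the example, and the real on-disk row format is left out.

```python
import hashlib
import math


def estimate_keys(summary_entries, index_interval=128):
    """Source side: upper-bound key estimate from the in-memory index summary.

    Assumes the summary samples one key out of every `index_interval`.
    """
    return summary_entries * index_interval


def bloom_parameters(expected_keys, false_positive_rate=0.01):
    """Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes."""
    n = max(1, expected_keys)
    bits = math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n * math.log(2)))
    return bits, hashes


class BloomFilter:
    def __init__(self, bits, hashes):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.bits for i in range(self.hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def receive_stream(rows, estimated_keys):
    """Target side: one pass over the incoming (key, value) rows.

    Each row is appended to the data buffer while its offset goes into the
    index and its key into the bloom filter, so no second read is needed.
    """
    bits, hashes = bloom_parameters(estimated_keys)
    bloom = BloomFilter(bits, hashes)
    data = bytearray()
    index = []
    for key, value in rows:
        index.append((key, len(data)))  # offset of this row in the data
        bloom.add(key)
        data += value
    return bytes(data), index, bloom
```

Because the estimate is only an upper bound, the filter may end up somewhat larger than strictly needed. That wastes a little memory but keeps the false-positive rate at or below the target.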