Tom van der Woerdt created CASSANDRA-12245:
----------------------------------------------
Summary: initial view build can be parallel
Key: CASSANDRA-12245
URL: https://issues.apache.org/jira/browse/CASSANDRA-12245
Project: Cassandra
Issue Type: Improvement
Reporter: Tom van der Woerdt
On a node with lots of data (~3TB), building a materialized view takes several
weeks, which is not ideal. The build runs in a single thread.
There are several potential ways this can be optimized:
* build the vnode token ranges in parallel, instead of going through the
entire range in one thread
* just iterate through sstables, not worrying about duplicates, and include
the timestamp of the original write in the MV mutation. Since this does not
exclude duplicates, it does increase the amount of work and could temporarily
surface ghost rows (yikes), but I guess that's why they call it eventual
consistency. Doing it this way avoids holding references to all tables on
disk, allows parallelization, and removes the need to check other sstables for
existing data. This is essentially the 'do a full repair' path.
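
To illustrate the first option, here is a rough sketch of splitting a token
range into contiguous sub-ranges and handing each one to a worker thread. It is
not Cassandra code: split_range, build_view_for_range and build_view_parallel
are hypothetical names, and the per-range work is a placeholder that only
reports how many tokens the range covers. The token bounds assume the
Murmur3Partitioner.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds of the Murmur3Partitioner (assumed partitioner for this sketch).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_range(start, end, parts):
    """Split (start, end] into `parts` contiguous, non-overlapping sub-ranges."""
    width = end - start
    bounds = [start + width * i // parts for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def build_view_for_range(token_range):
    """Hypothetical placeholder for the real per-range view build.

    A real implementation would read the base table rows in this range and
    write the matching view mutations. Here we only return the range width.
    """
    low, high = token_range
    return high - low


def build_view_parallel(start, end, parts, workers=4):
    """Run the per-range build on a thread pool, one task per sub-range."""
    ranges = split_range(start, end, parts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `ranges` in its results.
        return list(pool.map(build_view_for_range, ranges))


if __name__ == "__main__":
    covered = build_view_parallel(MIN_TOKEN, MAX_TOKEN, parts=8)
    print(len(covered), sum(covered) == MAX_TOKEN - MIN_TOKEN)
```

Because the sub-ranges are contiguous and do not overlap, each worker can make
progress independently, and a build could in principle track completion per
range rather than for the whole ring.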
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)