[ https://issues.apache.org/jira/browse/CASSANDRA-12245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261518#comment-16261518 ]
Paulo Motta commented on CASSANDRA-12245:
-----------------------------------------
bq. I have moved the base table flush from ViewBuilderTask to ViewBuilder here,
to do a single flush at the beginning of the view build. The following writes
will be written to the MV through the regular path, so it seems that they won't
need any further flushes. I think that with this we don't need to check the
table size and give special treatment to small ones, what do you think?
Much better this way, good job!
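To illustrate the flush-once approach, here is a minimal Java sketch. All names in it ({{BaseTable}}, {{ViewBuildCoordinator}}) are hypothetical and do not mirror Cassandra's actual {{ViewBuilder}}/{{ViewBuilderTask}} API: the "flush" only counts calls, and each range task is a placeholder for scanning the flushed sstables of that range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the base table: flush() only counts invocations.
class BaseTable {
    final AtomicInteger flushes = new AtomicInteger();

    void flush() {
        flushes.incrementAndGet();
    }
}

// Hypothetical coordinator: one flush for the whole build, then one task per range.
class ViewBuildCoordinator {
    private final BaseTable base;

    ViewBuildCoordinator(BaseTable base) {
        this.base = base;
    }

    // Returns the number of ranges whose task completed.
    int build(List<String> ranges, int threads) {
        // Flush once up front; writes arriving afterwards reach the view through
        // the regular write path, so the per-range tasks never flush.
        base.flush();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String range : ranges) {
                // Placeholder for scanning the flushed sstables of this range.
                futures.add(pool.submit(() -> range));
            }
            int done = 0;
            for (Future<String> f : futures) {
                try {
                    f.get(); // rethrows any task failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                } catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause());
                }
                done++;
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }
}
```

However many ranges are built, the base table is flushed exactly once per view build.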
bq. It seems that the configuration section of the doc is currently empty. I
think that writing this section (structure, introduction, etc.) is probably
out of the scope of this ticket and it might be done in a separate, dedicated
ticket. Instead, I have updated the NEWS.txt file with more detailed info and I
have added a note to the doc about CREATE MATERIALIZED VIEW statement.
Oh, my bad! I just noticed the configuration section is [dynamically
generated|https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/doc/Makefile#L20]
from cassandra.yaml. In any case, the NEWS.txt entry and the new MV doc look
great!
bq. I have updated the dtest interrupt_build_process_test to make sure that the
build is really interrupted in 3.x as well, through new byteman scripts. Without
that, the build could finish before the cluster stops.
Awesome!
bq. Here is the new version of the patch, rebased and squashed. The updated
dtests can be found here.
I had a final look at the patch and tests, and it mostly looks good, but during
the last review I found one edge case still not addressed: if the view build is
stopped via {{nodetool stop VIEW_BUILD}}, the view build will be [considered
successful|https://github.com/adelapena/cassandra/blob/e1ace2f47be71d48ab1987d0e2c7a07cc9486e97/src/java/org/apache/cassandra/db/view/ViewBuilder.java#L174],
so we should probably throw a {{CompactionInterruptedException}} when the view
build is stopped and not resubmit a new view build task in this case.
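A minimal Java sketch of the suggested stop handling, assuming the task checks a stop flag between partitions. {{ViewBuildInterruptedException}} is a hypothetical stand-in for {{CompactionInterruptedException}}, and the "resubmit on other failures" counter in {{ViewBuildTracker}} is an assumed policy, not Cassandra's. The point is that a stop surfaces as an exception, so the build is neither marked as built nor resubmitted.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for Cassandra's CompactionInterruptedException.
class ViewBuildInterruptedException extends RuntimeException {
    ViewBuildInterruptedException(String message) {
        super(message);
    }
}

// Hypothetical per-range task that checks a stop flag between partitions.
class RangeBuildTask {
    private final AtomicBoolean stopRequested;
    private final int partitions;

    RangeBuildTask(AtomicBoolean stopRequested, int partitions) {
        this.stopRequested = stopRequested;
        this.partitions = partitions;
    }

    void run() {
        for (int i = 0; i < partitions; i++) {
            if (stopRequested.get()) {
                // Throw rather than return, so a stop can't be mistaken for success.
                throw new ViewBuildInterruptedException("view build stopped at partition " + i);
            }
            // ... generate view updates for partition i ...
        }
    }
}

// Hypothetical caller deciding what happens after a task ends.
class ViewBuildTracker {
    boolean built = false;
    int resubmissions = 0;

    void runBuild(RangeBuildTask task) {
        try {
            task.run();
            built = true; // only a normal return marks the view as built
        } catch (ViewBuildInterruptedException e) {
            // Stopped by the operator: do not mark as built, do not resubmit.
        } catch (RuntimeException e) {
            // Assumed policy for other failures: schedule another attempt.
            resubmissions++;
        }
    }
}
```

Catching the specific exception before the generic {{RuntimeException}} keeps operator stops distinct from genuine failures.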
I think this last case is also present in 3.0, so it's not something introduced
by this patch, but it would be nice to address it here. Would you mind adding a
dtest for this as well, to ensure the view will not be marked as built after the
view build is stopped? Sorry for delaying this further, but once this is
addressed, the patch should be ready to go after a new clean CI round.
> initial view build can be parallel
> ----------------------------------
>
> Key: CASSANDRA-12245
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12245
> Project: Cassandra
> Issue Type: Improvement
> Components: Materialized Views
> Reporter: Tom van der Woerdt
> Assignee: Andrés de la Peña
> Fix For: 4.x
>
>
> On a node with lots of data (~3TB) building a materialized view takes several
> weeks, which is not ideal. It's doing this in a single thread.
> There are several potential ways this can be optimized:
> * do vnodes in parallel, instead of going through the entire range in one
> thread
> * just iterate through sstables, not worrying about duplicates, and include
> the timestamp of the original write in the MV mutation. Since this doesn't
> exclude duplicates it does increase the amount of work and could temporarily
> surface ghost rows (yikes), but I guess that's why they call it eventual
> consistency. Doing it this way can avoid holding references to all tables on
> disk, allows parallelization, and removes the need to check other sstables
> for existing data. This is essentially the 'do a full repair' path
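A minimal Java sketch of the first idea in the description (building subranges in parallel), assuming Murmur3-style {{long}} tokens and half-open ranges. {{TokenRange}}, {{RangeSplitter}} and {{ParallelRangeScan}} are hypothetical names; each task only counts sample tokens in its subrange, standing in for reading partitions and generating view updates. {{BigInteger}} is used so the split does not overflow when the range spans the whole {{long}} space.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical half-open token range [start, end).
final class TokenRange {
    final long start;
    final long end;

    TokenRange(long start, long end) {
        this.start = start;
        this.end = end;
    }
}

final class RangeSplitter {
    // Splits [start, end) into n contiguous, non-overlapping subranges covering it
    // exactly. Assumes start <= end and n >= 1.
    static List<TokenRange> split(long start, long end, int n) {
        BigInteger s = BigInteger.valueOf(start);
        BigInteger width = BigInteger.valueOf(end).subtract(s);
        List<TokenRange> out = new ArrayList<>();
        long prev = start;
        for (int i = 1; i <= n; i++) {
            // start + width * i / n, computed without long overflow
            long next = s.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)))
                         .longValueExact();
            out.add(new TokenRange(prev, next));
            prev = next;
        }
        return out;
    }
}

final class ParallelRangeScan {
    // Each task counts the sample tokens in its subrange; the sum equals the number
    // of tokens in [start, end) only if the subranges neither overlap nor leave gaps.
    static long countInParallel(long[] tokens, long start, long end, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (TokenRange r : RangeSplitter.split(start, end, n)) {
                Callable<Long> task = () -> {
                    long count = 0;
                    for (long t : tokens) {
                        if (t >= r.start && t < r.end) {
                            count++;
                        }
                    }
                    return count;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                try {
                    total += f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                } catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the subranges are contiguous and half-open, every token is handled by exactly one task, which is what lets the per-range work run concurrently without duplicating view updates.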
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)