[
https://issues.apache.org/jira/browse/CASSANDRA-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264106#comment-15264106
]
Sylvain Lebresne commented on CASSANDRA-11475:
----------------------------------------------
bq. We are still going to be holding all of the updates that we generate in
memory; this refactor does make the memory footprint lower, but still includes
a potential for using up a lot of memory.
You're completely right and I don't any obvious way to avoid this since we rely
on all updates to be on the same batchlog mutation. I guess my point was that
since we can already have a high footprint, all the more reason to lower it as
much as we can.
bq. I think your todo is right about promoting TableViews; I'm not sure if we
can completely get rid of ViewManager, since we need something at the Keyspace
level, but maybe that stuff can be integrated into Keyspace again
I started moving {{TableViews}} to its own class with all the methods that deal
with a single base table but I stopped there. I do think we can and should get
rid of ViewManager: the 2 only things is does now is dealing with view by names
(for addition/removal) and holding the locks. This can probably move to
Keyspace but regarding lock I'm actually wondering if it shouldn't be move to
the table level. Is there a reason I'm forgetting why it should be at the
keyspace level? It will require a bit of refactor of Keyspace.apply() though
to be clean, which is why I've stopped here and I'd rather left that as a
follow up (this is only tangentially relevant to the rest of the patch).
bq. I find the UpdateBuilder pattern not to work well here. ViewUpdateBuilder
should be building ViewUpdate}}s, instead of {{Collection<PartitionUpdate>, and
using the build method at the end seems weird.
Is that a naming problem? I don't give that much importance to "patterns" so
having it not build a {{ViewUpdate}} doesn't bother me (it still "builds"
updates to views). But if you think that's a bit confusing, I don't mind
renaming to {{ViewMutationGenerator}} or something.
bq. to me, it seems like we'd be better off with not exposing the rows
I'm not sure to follow what you mean by that.
bq. MultiViewUpdateBuilder doesn't really seem necessary; it seems like it
should be part of ViewManager
I guess the idea was that it could have seen reuse some day and didn't felt
like a weird concept per-se, but you're right that the verbosity is not that
justified right now so got rid of it.
bq. In {{TableViews}}, {{removeByName}} is empty
Indeed. Fixed.
> MV code refactor
> ----------------
>
> Key: CASSANDRA-11475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11475
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 3.0.x, 3.x
>
>
> While working on CASSANDRA-5546 I run into a problem with TTLs on MVs, which
> looking more closely is a bug of the MV code. But one thing leading to
> another I reviewed a good portion of the MV code and found the following
> correction problems:
> * If a base row is TTLed then even if an update remove that TTL the view
> entry remained TTLed and expires, leading to an inconsistency.
> * Due to calling the wrong ctor for {{LivenessInfo}}, when a TTL was set on
> the base table, the view entry was living twice as long as the TTL. Again
> leading to a temporary inconsistency.
> * When reading existing data to compute view updates, all deletion
> informations are completely ignored (the code uses a {{PartitionIterator}}
> instead of an {{UnfilteredPartitionIterator}}). This is a serious issue since
> it means some deletions could be totally ignored as far as views are
> concerned especially when messages are delivered to a replica out of order.
> I'll note that while the 2 previous points are relatively easy to fix, I
> didn't find an easy and clean way to fix this one on the current code.
> Further, I think the MV code in general has inefficiencies/code complexities
> that should be avoidable:
> * {{TemporalRow.Set}} is buffering both everything read and a pretty much
> complete copy of the updates. That's a potentially high memory requirement.
> We shouldn't have to copy the updates and we shouldn't buffer all reads but
> rather work incrementally.
> * {{TemporalRow}}/{{TemporalRow.Set}}/{{TemporalCell}} classes are somewhat
> re-inventing the wheel. They are really just storing both an update we're
> doing and the corresponding existing data, but we already have
> {{Row}}/{{Partition}}/{{Cell}} for that. In practice, those {{Temporal*}}
> class generates a lot of allocations that we could avoid.
> * The code from CASSANDRA-10060 to avoid multiple reads of the base table
> with multiple views doesn't work when the update has partition/range
> tombstones because the code uses {{TemporalRow.Set.setTombstonedExisting()}}
> to trigger reuse, but the {{TemporalRow.Set.withNewViewPrimaryKey()}} method
> is used between view and it does not preseve the {{hasTombstonedExisting}}
> flag. But that oversight, which is trivial to fix, is kind of a good thing
> since if you fix it, you're left with a correction problem.
> The read done when there is a partition deletion depends on the view itself
> (if there is clustering filters in particular) and so reusing that read for
> other views is wrong. Which makes that whole reuse code really dodgy imo: the
> read for existing data is in {{View.java}}, suggesting that it depends on the
> view (which again, it does at least for partition deletion), but it shouldn't
> if we're going to reuse the result across multiple views.
> * Even ignoring the previous point, we still potentially read the base table
> twice if the update mix both row updates and partition/range deletions,
> potentially re-reading the same values.
> * It's probably more minor but the reading code is using {{QueryPager}},
> which is probably an artifact of the initial version of the code being
> pre-8099, but it's not necessary anymore (the reads are local and locally
> we're completely iterator based), adding, especially when we do page. I'll
> note that despite using paging, the current code still buffers everything in
> {{TemporalRow.Set}} anyway .
> Overall, I suspect trying to fix the problems above (particularly the fact
> that existing deletion infos are ignored) is only going to add complexity
> with the current code and we'd still have to fix the inefficiencies. So I
> propose a refactor of that code which does 2 main things:
> # it removes all of {{TemporalRow}} and related classes. Instead, it directly
> uses the existing {{Row}} (with all its deletion infos) and the update being
> applied to it and compute the updates for the view from that. I submit that
> this is more clear/simple, but this also avoid copying every cell of both the
> existing and update data as a {{TemporalCell}}. We can also reuse codes like
> {{Rows.merge}} and {{Rows.diff}} to make the handling of deletions relatively
> painless.
> # instead of dealing with each view one at a time, re-iterating over all
> updates each time, it iterates over each individual updates once and deal
> with each view for that update. This makes it more clear that the reads has
> to care about every view involved, but more importantly allow to deal with
> the read data incrementally, never buffering it all.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)