[ 
https://issues.apache.org/jira/browse/CASSANDRA-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-11475:
-------------------------------------
    Component/s: Materialized Views

> MV code refactor
> ----------------
>
>                 Key: CASSANDRA-11475
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11475
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination, Materialized Views
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0.7, 3.7
>
>
> While working on CASSANDRA-5546 I run into a problem with TTLs on MVs, which 
> looking more closely is a bug of the MV code. But one thing leading to 
> another I reviewed a good portion of the MV code and found the following 
> correction problems:
> * If a base row is TTLed then even if an update remove that TTL the view 
> entry remained TTLed and expires, leading to an inconsistency.
> * Due to calling the wrong ctor for {{LivenessInfo}}, when a TTL was set on 
> the base table, the view entry was living twice as long as the TTL. Again 
> leading to a temporary inconsistency.
> * When reading existing data to compute view updates, all deletion 
> informations are completely ignored (the code uses a {{PartitionIterator}} 
> instead of an {{UnfilteredPartitionIterator}}). This is a serious issue since 
> it means some deletions could be totally ignored as far as views are 
> concerned especially when messages are delivered to a replica out of order. 
> I'll note that while the 2 previous points are relatively easy to fix, I 
> didn't find an easy and clean way to fix this one on the current code.
> Further, I think the MV code in general has inefficiencies/code complexities 
> that should be avoidable:
> * {{TemporalRow.Set}} is buffering both everything read and a pretty much 
> complete copy of the updates. That's a potentially high memory requirement. 
> We shouldn't have to copy the updates and we shouldn't buffer all reads but 
> rather work incrementally.
> * {{TemporalRow}}/{{TemporalRow.Set}}/{{TemporalCell}} classes are somewhat 
> re-inventing the wheel. They are really just storing both an update we're 
> doing and the corresponding existing data, but we already have 
> {{Row}}/{{Partition}}/{{Cell}} for that. In practice, those {{Temporal*}} 
> class generates a lot of allocations that we could avoid.
> * The code from CASSANDRA-10060 to avoid multiple reads of the base table 
> with multiple views doesn't work when the update has partition/range 
> tombstones because the code uses {{TemporalRow.Set.setTombstonedExisting()}} 
> to trigger reuse, but the {{TemporalRow.Set.withNewViewPrimaryKey()}} method 
> is used between view and it does not preseve the {{hasTombstonedExisting}} 
> flag.  But that oversight, which is trivial to fix, is kind of a good thing 
> since if you fix it, you're left with a correction problem.
>   The read done when there is a partition deletion depends on the view itself 
> (if there is clustering filters in particular) and so reusing that read for 
> other views is wrong. Which makes that whole reuse code really dodgy imo: the 
> read for existing data is in {{View.java}}, suggesting that it depends on the 
> view (which again, it does at least for partition deletion), but it shouldn't 
> if we're going to reuse the result across multiple views.
> * Even ignoring the previous point, we still potentially read the base table 
> twice if the update mix both row updates and partition/range deletions, 
> potentially re-reading the same values.
> * It's probably more minor but the reading code is using {{QueryPager}}, 
> which is probably an artifact of the initial version of the code being 
> pre-8099, but it's not necessary anymore (the reads are local and locally 
> we're completely iterator based), adding, especially when we do page. I'll 
> note that despite using paging, the current code still buffers everything in 
> {{TemporalRow.Set}} anyway .
> Overall, I suspect trying to fix the problems above (particularly the fact 
> that existing deletion infos are ignored) is only going to add complexity 
> with the current code and we'd still have to fix the inefficiencies. So I 
> propose a refactor of that code which does 2 main things:
> # it removes all of {{TemporalRow}} and related classes. Instead, it directly 
> uses the existing {{Row}} (with all its deletion infos) and the update being 
> applied to it and compute the updates for the view from that. I submit that 
> this is more clear/simple, but this also avoid copying every cell of both the 
> existing and update data as a {{TemporalCell}}. We can also reuse codes like 
> {{Rows.merge}} and {{Rows.diff}} to make the handling of deletions relatively 
> painless.
> # instead of dealing with each view one at a time, re-iterating over all 
> updates each time, it iterates over each individual updates once and deal 
> with each view for that update. This makes it more clear that the reads has 
> to care about every view involved, but more importantly allow to deal with 
> the read data incrementally, never buffering it all.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to