Hi Josh...

A long (and almost private) message to explain how we fix materialized
views.

Let me first explain our use case... I work for a European dating website.
Users can receive visits from other users (typically when someone looks at
a member's profile page), and we want to show them each visit received,
sorted from most recent to oldest.
But imagine that Karen visits my profile page several times... I don't
want to see all of her visits, only the last one. So we want to
deduplicate rows (show Karen only once) and order them (Julia, who
visited me 1 minute ago, then Sophia, who visited me 3 minutes ago, then
Karen, who visited me 10 minutes ago, and so on).

You cannot do that with a single Cassandra table. If you want to
deduplicate rows by pair of users, the visit timestamp cannot be in the
primary key... but if you want to order rows by the visit timestamp, that
field must be a clustering column and therefore part of the primary key.
Those two requirements are simply incompatible!
What we do is:
- a master table like this:

CREATE TABLE visits_received (
    receiver_id bigint,
    sender_id bigint,
    visit_date timestamp,
    PRIMARY KEY ((receiver_id), sender_id)
) WITH CLUSTERING ORDER BY (sender_id ASC);

- and a materialized view like this:

CREATE MATERIALIZED VIEW visits_received_by_date AS
    SELECT receiver_id, sender_id, visit_date
    FROM visits_received
    WHERE receiver_id IS NOT NULL
      AND sender_id IS NOT NULL
      AND visit_date IS NOT NULL
    PRIMARY KEY ((receiver_id), visit_date, sender_id)
    WITH CLUSTERING ORDER BY (visit_date DESC, sender_id ASC);

With this, the master table deduplicates and the MV sorts rows the way we
want.
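To make the idea concrete, here is a minimal Python sketch of the pair (an in-memory model, not real Cassandra): the dict keyed by (receiver_id, sender_id) plays the role of the master table, and the query function mimics the view's clustering order.

```python
# In-memory model (not real Cassandra) of the base table + view pair.
# The base table is keyed by (receiver_id, sender_id), so a repeat visit
# from the same sender overwrites the previous row: that is the dedup.
base = {}

def record_visit(receiver_id, sender_id, visit_date):
    base[(receiver_id, sender_id)] = visit_date

def visits_received_by_date(receiver_id):
    # Mimic the view's clustering order: visit_date DESC, sender_id ASC.
    rows = [(date, sender) for (recv, sender), date in base.items()
            if recv == receiver_id]
    return [(receiver_id, sender, date)
            for date, sender in sorted(rows, key=lambda x: (-x[0], x[1]))]

# Karen (sender 3) visits twice; only her latest visit survives.
record_visit(1, 3, 50)   # Karen's first visit
record_visit(1, 3, 90)   # Karen again, "10 minutes ago"
record_visit(1, 4, 97)   # Sophia, "3 minutes ago"
record_visit(1, 2, 99)   # Julia, "1 minute ago"
```

Querying receiver 1 then yields Julia first, Sophia second, Karen (latest visit only) last, which is exactly the feed we want.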


Most of the problems we see are rows that should not exist in the MV.
Let's say I have this row in the master table:
- 111, 222, t3
and that, because of materialized view inconsistency, I have 3 rows in the
MV:
- 111, 222, t3
- 111, 222, t2
- 111, 222, t1

Then, to remove the 2 wrong rows in the MV, we do double inserts on the
master table:
insert (111, 222, t1) + insert (111, 222, t3) -> this removes the row
"111, 222, t1" from the MV
insert (111, 222, t2) + insert (111, 222, t3) -> this removes the row
"111, 222, t2" from the MV
(re-inserting the stale value makes the MV delete its own row for it, and
re-inserting the current value restores the correct state)
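A sketch of why the double insert cleans the view, again as an in-memory model (the assumed semantics: on each base insert, the MV deletes the view row for the overwritten date and creates one for the new date):

```python
# Model: base maps (receiver, sender) -> date; view is a set of
# (receiver, date, sender) rows maintained on every base insert.
base = {}
view = set()

def insert(receiver, sender, date):
    old = base.get((receiver, sender))
    if old is not None:
        view.discard((receiver, old, sender))  # MV drops the old-date row
    base[(receiver, sender)] = date
    view.add((receiver, date, sender))

# Start from the inconsistent state described above: one base row but
# three view rows for the same (111, 222) pair.
base[(111, 222)] = 3
view.update({(111, 3, 222), (111, 2, 222), (111, 1, 222)})

# Double insert #1: re-insert t1 (the MV deletes its t1 row when t3
# overwrites it), then re-insert t3 to restore the correct state.
insert(111, 222, 1)
insert(111, 222, 3)
# Double insert #2: same trick for the stale t2 row.
insert(111, 222, 2)
insert(111, 222, 3)
```

After the two double inserts, the view is back to the single correct row (111, t3, 222) and the base row is unchanged.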

We very, very rarely see other cases (rows in the master but not in the
MV), but those are also very easy to fix by just re-inserting the master
rows.


Now, about our Spark script:
- we download the master table and the MV sequentially
- we compare them to find "potential inconsistencies" (because the two
tables are not downloaded at the same time and data can change in between,
we can find false positives)
- we loop over all the "potential inconsistencies" and force a fresh read
of both the table and the MV, a few milliseconds apart, to check whether
the inconsistency is real
- if it is a real inconsistency, we force inserts on the master table to
fix the MV as described above
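The comparison step can be sketched in plain Python (the real job runs in Spark; the function name and suspect labels here are made up for illustration). A row is suspect if it exists in the MV with a date the base disagrees with, or if a base row has no view row at all:

```python
def find_potential_inconsistencies(base_rows, view_rows):
    """base_rows: {(receiver, sender): date} read from the master table.
    view_rows:  set of (receiver, date, sender) rows read from the MV.
    Returns a list of suspects to re-check with a fresh paired read."""
    suspects = []
    # View rows whose date disagrees with the base row (or whose base
    # row is gone): the "rows that should not exist in the MV" case.
    for receiver, date, sender in view_rows:
        if base_rows.get((receiver, sender)) != date:
            suspects.append(("extra_in_view", receiver, sender, date))
    # Base rows with no matching view row at all (the rare other case).
    view_keys = {(receiver, sender) for receiver, _, sender in view_rows}
    for (receiver, sender), date in base_rows.items():
        if (receiver, sender) not in view_keys:
            suspects.append(("missing_in_view", receiver, sender, date))
    return suspects

# The example above: one base row, plus a stale extra row in the view.
suspects = find_potential_inconsistencies(
    {(111, 222): 3},
    {(111, 3, 222), (111, 2, 222)},
)
```

Each suspect is then re-read from both tables within a few milliseconds; only suspects that survive that paired read are treated as real and repaired with the double-insert trick.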


Now, about the volume of inconsistency:
- on a master table with 1.7 billion rows
- we have ~12.5 K inconsistent rows (0.0007%) after 2 years... clearly
better than what our developers would achieve by managing inserts and
deletes themselves (and acceptable for our use case)


On Mon, Aug 14, 2023 at 16:36, Josh McKenzie <jmcken...@apache.org> wrote:

> When it comes to denormalization in Cassandra today your options are to
> either do it yourself in your application layer or rely on Materialized
> Views to do it for you at the server layer. Neither are production-ready
> approaches out of the box (which is one of the biggest flaws in the
> "provide it server side as a feature" approach); both implementations will
> need you as a user to:
>
>    1. Deal with failure cases (data loss in base table, consistency
>    violations between base and view due to failures during write /
>    anti-entropy vs. gc_grace, etc) and
>    2. Manage the storage implications of a given base write and the
>    denormalized writes that it spawns. This is arguably worse with MV's as you
>    have less visibility into the fanout and they're easier to create; it was
>    common to see folks create 5-10 views on a base table when they were first
>    released and lock up tables and exhaust storage disks, not realizing the
>    implications.
>
> The current inability to clearly see and rely on the state of consistency
> between a base and a view is a significant limitation that's shared by both
> the MV implementation and a user-hand-rolled version. @regis I'd be super
> interested to hear more about:
>
> we made a spark script downloading the master table and the MV, and
> comparing them and fixing data (as said previously we have very few errors
> and we run it maybe once a year
>
> Given the inclusion of the spark bulk reader and writer in the project
> ecosystem, this could prove to be something really useful for a lot of
> users.
>
> In a post-Accord world with atomic durable multi-partition transactions,
> we should be able to create a more robust, consistent implementation of
> MV's. This doesn't solve the problem of "complete data loss on a base table
> leaves you with data in a view that's orphaned; you need to rebuild the
> view." That said, a Materialized Views feature that only has that one
> caveat of "if you lose data in the base you need to recreate the views"
> would be a significant improvement. It should also be pretty trivial to
> augment the upcoming size commands to support future MV's as well (
> CASSANDRA-12367 <https://issues.apache.org/jira/browse/CASSANDRA-12367>)
>
> So yeah. Denormalization is a Hard Problem. MV's were an attempt to take a
> burden off the user but we just didn't have sufficiently robust primitives
> to build on at that time to get it where it needed to go.
>
> I'm personally still on the fence between whether a skilled user should go
> with hand-rolled vs. MV's today, but for the general populace of C* users
> (i.e. people that don't have time to get into the weeds), they're probably
> best avoided still for now.
>
> On Thu, Aug 10, 2023, at 8:19 PM, MyWorld wrote:
>
> Hi surbhi ,
> There are 2 drawbacks associated with MV.
> 1. Inconsistent view
> 2. The lock it takes on the base table. This gets worse when you have huge
> number of clustering keys in a specific partition.
>
> It's better to design a separate table and let your API do a parallel
> write on both.
>
> Regards,
> Ashish
>
> On Fri, 11 Aug, 2023, 02:03 Surbhi Gupta, <surbhi.gupt...@gmail.com>
> wrote:
>
> Thanks everyone.
>
>
> On Wed, 9 Aug 2023 at 01:00, Regis Le Bretonnic
> <r.lebreton...@meetic-corp.com> wrote:
> >
> > Hi Surbhi
> >
> > We do use Cassandra materialized views even if they are not recommended.
> > There are known issues you have to deal with. Despite them, we still
> use MVs.
> > What we observe is :
> > * there are  inconsistency issues but few. Most of them are rows that
> should not exist in the MV...
> > * we made a spark script downloading the master table and the MV, and
> comparing them and fixing data (as said previously we have very few errors
> and we run it maybe once a year)
> >
> > * Things go very very very bad when you add or remove a node ! Limit
> this operation if possible and do it knowing what can happen (we isolate
> the ring/datacenter and fix data before putting it back to production. We
> did this only once in the last 4 years).
> >
> > PS : all proposals avoiding MV failed for our project. Basically
> managing a table like a MV (by deleting and inserting rows from code) is
> worse and more corrupted than what MV does...
> > The worse issue is adding and removing nodes. Maybe cassandra 4 improves
> this point (not tested yet).
> >
> > Have fun...
> >
> > On Tue, Aug 8, 2023 at 22:36, Surbhi Gupta <surbhi.gupt...@gmail.com>
> wrote:
> >>
> >> Hi,
> >>
> >> We get complaints about Materialized View inconsistency issues.
> >> We are on 3.11.5 and on 3.11.5 Materialized Views were not production
> ready.
> >> We are ok to upgrade.
> >>
> >> On which version of Cassandra do MVs not have inconsistency issues?
> >>
> >> Thanks
> >> Surbhi
>
>
>
