Re: Materialized View inconsistency issue

Josh McKenzie Mon, 14 Aug 2023 07:36:24 -0700

When it comes to denormalization in Cassandra today, your options are to either 
do it yourself in your application layer or rely on Materialized Views to do it 
for you at the server layer. Neither is a production-ready approach out of the 
box (which is one of the biggest flaws in the "provide it server side as a 
feature" approach); both require you as a user to:
 1. Deal with failure cases (data loss in base table, consistency violations 
between base and view due to failures during write / anti-entropy vs. gc_grace, 
etc.), and
 2. Manage the storage implications of a given base write and the denormalized 
writes that it spawns. This is arguably worse with MV's, as you have less 
visibility into the fanout and they're easier to create; when MV's were first 
released, it was common to see folks create 5-10 views on a base table, then 
lock up tables and exhaust storage disks, not realizing the implications.
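
To make the fanout in point 2 concrete, here is a rough back-of-envelope sketch. It assumes each view adds about one denormalized write per base write and that every write is stored on `replication_factor` replicas. It ignores view tombstones, the read-before-write on the base replica, and compaction overhead, so treat the result as a floor rather than a measurement. The function name and parameters are made up for illustration.

```python
def estimated_replica_writes(base_writes: int, num_views: int,
                             replication_factor: int) -> int:
    """Rough lower bound on replica-level writes for a base table with views.

    Assumption (not from Cassandra itself): one view update per view per
    base write, each stored on `replication_factor` replicas.
    """
    if base_writes < 0 or num_views < 0 or replication_factor < 1:
        raise ValueError("counts must be non-negative and RF at least 1")
    logical_writes = base_writes * (1 + num_views)  # base write + one per view
    return logical_writes * replication_factor


if __name__ == "__main__":
    # 1M base writes, 5 views, RF=3
    print(estimated_replica_writes(1_000_000, 5, 3))  # 18000000
```

With no views the same workload is 3,000,000 replica writes, so five views multiply the write and storage footprint roughly sixfold under these assumptions.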
The current inability to clearly see and rely on the state of consistency 
between a base and a view is a significant limitation that's shared by both the 
MV implementation and a user-hand-rolled version. @regis I'd be super 
interested to hear more about:
> we made a Spark script downloading the master table and the MV, and comparing 
> them and fixing data (as said previously we have very few errors and we run 
> it maybe once a year)
Given the inclusion of the Spark bulk reader and writer in the project 
ecosystem, this could prove to be something really useful for a lot of users.
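
As a minimal sketch of what such a base-vs-view comparison might look like, here is the core diff logic in plain Python. This is not Regis's script and does not use Spark; it assumes both tables have already been read into lists of dicts, and it keys rows by the base table's primary key columns (a view's primary key always includes those). It also assumes the view selects all the columns being compared and that base rows excluded by the view's WHERE clause were filtered out beforehand. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]


def diff_base_and_view(
    base_rows: Sequence[Row],
    view_rows: Sequence[Row],
    key_columns: Sequence[str],
) -> Tuple[List[Key], List[Key], List[Key]]:
    """Return (missing_in_view, orphaned_in_view, mismatched) as key lists."""

    def index(rows: Sequence[Row]) -> Dict[Key, Row]:
        # Key each row by the base table's primary key columns.
        return {tuple(r[c] for c in key_columns): r for r in rows}

    base = index(base_rows)
    view = index(view_rows)

    missing = sorted(k for k in base if k not in view)    # row never reached the view
    orphaned = sorted(k for k in view if k not in base)   # row should not exist in the view
    mismatched = sorted(k for k in base if k in view and base[k] != view[k])
    return missing, orphaned, mismatched


if __name__ == "__main__":
    base = [
        {"id": 1, "email": "a@x"},
        {"id": 2, "email": "b@x"},
        {"id": 3, "email": "c@x"},
    ]
    view = [
        {"id": 1, "email": "a@x"},
        {"id": 3, "email": "old@x"},  # stale value left in the view
        {"id": 4, "email": "d@x"},    # orphan: no matching base row
    ]
    print(diff_base_and_view(base, view, ["id"]))
    # ([(2,)], [(4,)], [(3,)])
```

The "orphaned" bucket corresponds to the case Regis describes as most common (rows that should not exist in the MV). A repair step would delete those and re-write the missing and mismatched rows from the base table.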

In a post-Accord world with atomic durable multi-partition transactions, we 
should be able to create a more robust, consistent implementation of MV's. This 
doesn't solve the problem of "complete data loss on a base table leaves you 
with data in a view that's orphaned; you need to rebuild the view." That said, 
a Materialized Views feature that only has that one caveat of "if you lose data 
in the base you need to recreate the views" would be a significant improvement. 
It should also be pretty trivial to augment the upcoming size commands to 
support future MV's as well (CASSANDRA-12367 
<https://issues.apache.org/jira/browse/CASSANDRA-12367>)

So yeah. Denormalization is a Hard Problem. MV's were an attempt to take a 
burden off the user but we just didn't have sufficiently robust primitives to 
build on at that time to get it where it needed to go.

I'm personally still on the fence about whether a skilled user should go with 
hand-rolled vs. MV's today, but for the general populace of C* users (i.e. 
people that don't have time to get into the weeds), MV's are probably still 
best avoided for now.

On Thu, Aug 10, 2023, at 8:19 PM, MyWorld wrote:
> Hi Surbhi,
> There are 2 drawbacks associated with MV.
> 1. Inconsistent view
> 2. The lock it takes on the base table. This gets worse when you have a huge 
> number of clustering keys in a specific partition.
> 
> It's better to design a separate table and let your API write to both 
> in parallel.
> 
> Regards,
> Ashish
> 
> On Fri, 11 Aug, 2023, 02:03 Surbhi Gupta, <surbhi.gupt...@gmail.com> wrote:
>> Thanks everyone.
>> 
>> 
>> On Wed, 9 Aug 2023 at 01:00, Regis Le Bretonnic
>> <r.lebreton...@meetic-corp.com> wrote:
>> >
>> > Hi Surbhi
>> >
>> > We do use Cassandra materialized views even if not recommended.
>> > There are known issues you have to deal with. Despite them, we still 
>> > use MVs.
>> > What we observe is:
>> > * there are inconsistency issues, but few. Most of them are rows that 
>> > should not exist in the MV...
>> > * we made a Spark script downloading the master table and the MV, and 
>> > comparing them and fixing data (as said previously we have very few errors 
>> > and we run it maybe once a year)
>> >
>> > * Things go very very very bad when you add or remove a node! Limit this 
>> > operation if possible and do it knowing what can happen (we isolate the 
>> > ring/datacenter and fix data before putting it back into production. We 
>> > did this only once in the last 4 years).
>> >
>> > PS: all proposals avoiding MVs failed for our project. Basically, managing 
>> > a table like an MV (by deleting and inserting rows from code) is worse and 
>> > leaves more corrupted data than what MVs do...
>> > The worst issue is adding and removing nodes. Maybe Cassandra 4 improves 
>> > this point (not tested yet).
>> >
>> > Have fun...
>> >
>> > On Tue, 8 Aug 2023 at 22:36, Surbhi Gupta <surbhi.gupt...@gmail.com> 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> We get complaints about Materialized View inconsistency issues.
>> >> We are on 3.11.5, where Materialized Views were not production-ready.
>> >> We are ok to upgrade.
>> >>
>> >> In which version of Cassandra do MVs no longer have inconsistency issues?
>> >>
>> >> Thanks
>> >> Surbhi
