I think this would be useful. Having never really used Materialized Views, I didn't know this was an issue for some users. I would say the Cassandra Analytics library (http://github.com/apache/cassandra-analytics/) could be used for much of this, with a specialized Spark job written for the purpose.

On Fri, 6 Dec 2024 at 08:26, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:

> Hi,
>
> *NOTE:* This email does not promote using Cassandra's Materialized View
> (MV) but assists those stuck with it for various reasons.
>
> The primary issue with MV is that once it goes out of sync with the base
> table, no tooling is available to remediate it. This Spark job aims to fill
> this gap by logically comparing the MV with the base table and identifying
> inconsistencies. The job primarily does the following:
>
>    - Scans the base table (A) and the MV (B), and does an {A}-{B} analysis
>    - Categorizes each record into one of four areas: a) Consistent, b)
>    Inconsistent, c) MissingInMV, d) MissingInBaseTable
>    - Provides a detailed view of mismatches, such as the primary key, all
>    the non-primary key fields, and mismatched columns
>    - Dumps the detailed information to an output folder path provided to
>    the job (one can extend the interface to dump the records to some object
>    store as well)
>    - Optionally, fixes the MV inconsistencies
>    - Offers rich configuration (throttling, actionable output, capability to
>    specify the time range for the records, etc.) to run the job at scale in a
>    production environment
>
> Design doc: link
> <https://docs.google.com/document/d/14mo_3TlKmaL3mC_Vs69k1n923CoJmVFvEFvuPAAHk4I/edit?usp=sharing>
> The Git Repository: link
> <https://github.com/jaydeepkumar1984/cassandra-mv-repair-spark-job>
>
> *Motivation*
>
>    1. This email's primary objective is to share with the community that
>    something like this is available for MV (in a private repository), which
>    may be helpful in emergencies to folks stuck with MV in production.
>    2. If we, as a community, want to officially foster tooling using
>    Spark, because it can be helpful for many things beyond the MV work
>    (counting rows, for example), then I am happy to drive the efforts.
>
> Please let me know what you think.
>
> Jaydeep
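
To make the four categories in the quoted message concrete, here is a minimal sketch of the comparison step in plain Python. It is not Spark code and is not taken from the linked repository. The `Status` enum, the `classify` function, and the in-memory dict layout are all hypothetical, and the sketch assumes both sides have already been keyed by the base table's primary key and projected down to the columns the MV selects.

```python
from enum import Enum


class Status(Enum):
    # The four areas named in the quoted message.
    CONSISTENT = "Consistent"
    INCONSISTENT = "Inconsistent"
    MISSING_IN_MV = "MissingInMV"
    MISSING_IN_BASE_TABLE = "MissingInBaseTable"


def classify(base_rows, mv_rows):
    """Compare two {primary_key: {column: value}} mappings.

    Assumption: both mappings use the same key (the base table's primary
    key) and contain only the columns that the MV selects.
    Returns {primary_key: (Status, sorted list of mismatched columns)}.
    """
    report = {}
    for key in base_rows.keys() | mv_rows.keys():
        base = base_rows.get(key)
        mv = mv_rows.get(key)
        if mv is None:
            report[key] = (Status.MISSING_IN_MV, [])
        elif base is None:
            report[key] = (Status.MISSING_IN_BASE_TABLE, [])
        else:
            # A column counts as mismatched if its value differs, or if it
            # is present on one side only.
            mismatched = sorted(
                col for col in base.keys() | mv.keys()
                if base.get(col) != mv.get(col)
            )
            status = Status.INCONSISTENT if mismatched else Status.CONSISTENT
            report[key] = (status, mismatched)
    return report
```

Calling `classify(base_rows, mv_rows)` on two small dicts returns one `(Status, mismatched_columns)` pair per primary key. In a real Spark job the same logic would more likely be expressed as a full outer join on the key, but that part is left out here.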