This is an automated email from the ASF dual-hosted git repository.

btellier pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git

commit a58709022caf0efa3983ed531a92e42c8a09b2ff
Author: Benoit Tellier <[email protected]>
AuthorDate: Fri Sep 25 13:18:15 2020 +0700

    [ADR] Applicative read repairs (POC mailbox & mailbox-counters)
---
 src/adr/0042-applicative-read-repairs.md | 101 +++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/src/adr/0042-applicative-read-repairs.md b/src/adr/0042-applicative-read-repairs.md
new file mode 100644
index 0000000..3f5bc58
--- /dev/null
+++ b/src/adr/0042-applicative-read-repairs.md
@@ -0,0 +1,101 @@
+# 42. Applicative read repairs (POC mailbox & mailbox-counters)
+
+Date: 2020-09-25
+
+## Status
+
+Adopted (lazy consensus)
+
+Completes [20. Cassandra Mailbox object consistency](0020-cassandra-mailbox-object-consistency.md),
+[23. Cassandra Mailbox Counters inconsistencies](0023-cassandra-mailbox-counters-inconsistencies.md)
+
+## Context
+
+Cassandra eventual consistency is all about "replication", but "denormalization" consistency needs
+to be handled at the applicative layer (due to the lack of transactions in a NoSQL database).
+
+In the past we set up "Solve inconsistency" tasks, which can be likened to Cassandra repairs. Once
+scheduled, such tasks ensure that the corresponding entity is correctly denormalized.
+
+However, the inconsistencies persist between runs. We experienced inconsistencies on some production platforms
+for both the mailbox entity and the mailbox counter entity (whose table structures are described in
+[these](0020-cassandra-mailbox-object-consistency.md) [ADRs](0023-cassandra-mailbox-counters-inconsistencies.md)).
+Monitoring is required to detect when to run these tasks, which is time consuming for the platform administrator.
+Given a large dataset, it could even be impossible to run such tasks in a timely fashion.
+
+Another classic eventual consistency mechanism that enables auto-healing is read-repair: consistency checks,
+synchronous or asynchronous, are randomly piggybacked upon reads. If an inconsistency is detected, a repair is performed.
+
+In order to achieve denormalization auto-healing, we thus need to implement "applicative read repairs".
+
+## Decision
+
+Provide a proof of concept for "Applicative read repairs" for the mailbox and mailbox-counters entities.
+
+This enables read path simplification (and performance enhancements) for the mailbox object.
+
+IMAP LIST should not read mailbox counters. This information is unneeded and we should avoid paying the
+price of read repairs for this operation.
+
+Provide a comprehensive documentation page regarding "Distributed James consistency model".
+
+## Consequences
+
+Inconsistencies on existing deployments are expected to **auto-heal** (at a limited configuration cost).
+This should ease operation of the Distributed James server.
+
+A configuration option will be added to the Distributed James server to control read repairs, per entity.
+
+## Alternatives
+
+Cassandra provides some alternatives by itself:
+
+ - Secondary indexes avoid the denormalization in the first place. However they are not efficient in
+ a distributed environment as each node needs to be queried, which limits the ability to scale.
+ - Materialized views enable Cassandra to maintain a projection on behalf of the application. They
+ come with an expensive write cost, require synchronisation, and are not fit for complex denormalization
+ (like the message one: the primary key of the originating table needs to appear in the materialized
+ view primary key). Most of all, the updates are performed asynchronously. This mechanism is considered experimental.
+ - Cassandra BATCH suffers from the following downsides:
+   - A batch containing conditional updates can only operate within a single partition
+   - It is advised not to update many partitions in a single batch, and to keep the cardinality low for performance reasons
+
+BATCH could be a good option to keep tables synchronized, but applies neither to mailboxes (conditional updates) nor
+to counters.
+
+We already propose several tasks to solve denormalization inconsistencies. "Applicative read repairs" should be
+seen as a complement to them.
+
+Another classical mechanism in eventually consistent systems is called hinted handoff. It consists of retrying
+(during a given period) when "replicating" data to other replicas. We already have a similar mechanism
+in James, as we retry failures several times when writing data to denormalization tables. A hard shutdown however
+defeats this strategy, which is otherwise efficient at limiting inconsistencies across denormalization tables.
+
+## References
+
+ - [Read repairs in Cassandra](https://cassandra.apache.org/doc/latest/operating/read_repair.html)
+ - [20. Cassandra Mailbox object consistency](0020-cassandra-mailbox-object-consistency.md)
+ - [23. Cassandra Mailbox Counters inconsistencies](0023-cassandra-mailbox-counters-inconsistencies.md)
+ - [Hinted handoff](https://cassandra.apache.org/doc/latest/operating/hints.html)
+ - [This link documents materialized views limitations](https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/knownLimitationsMV.html)
+ - [Materialized views considered experimental](https://www.mail-archive.com/[email protected]/msg54073.html)
+ - [CQL Batch](https://docs.datastax.com/en/cql-oss/3.x/cql/cql_reference/cqlBatch.html)
+
+Especially:
+
+```
+Materialized View Limitations:
+
+    All updates to the view happen asynchronously unless corresponding view replica is the same node.
+    We must do this to ensure availability is not compromised.  It's easy to imagine a worst case
+    scenario of 10 Materialized Views for which each update to the base table requires writing to 10
+    separate nodes. Under normal operation views will see the data quickly and there are new metrics to
+    track it (ViewWriteMetrics).
+
+    There is no read repair between the views and the base table.  Meaning a read repair on the view will
+    only correct that view's data not the base table's data.  If you are reading from the base table though,
+    read repair will send updates to the base and the view.
+
+    Mutations on a base table partition must happen sequentially per replica if the mutation touches
+    a column in a view (this will improve after ticket CASSANDRA-10307)
+```
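
As an illustration of the read-repair idea described in the Context section of the ADR above, here is a minimal Python sketch. It is not the James implementation: `MailboxTables`, `read_by_path` and `repair_chance` are hypothetical names, and two in-memory dictionaries stand in for an assumed source-of-truth table and its denormalized projection. The sketch only shows the principle of piggybacking a random consistency check on a read and repairing the projection when the check fails.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class MailboxTables:
    """Hypothetical in-memory stand-in for two denormalized tables."""
    by_id: Dict[str, str] = field(default_factory=dict)    # assumed source of truth: id -> path
    by_path: Dict[str, str] = field(default_factory=dict)  # assumed projection: path -> id


def read_by_path(tables: MailboxTables, path: str, repair_chance: float,
                 rng: Callable[[], float] = random.random) -> Optional[str]:
    """Look a mailbox id up by path, piggybacking a consistency check
    on a random fraction (repair_chance) of the reads."""
    mailbox_id = tables.by_path.get(path)
    if mailbox_id is None or rng() >= repair_chance:
        return mailbox_id  # nothing to check, or check skipped for this read

    true_path = tables.by_id.get(mailbox_id)
    if true_path == path:
        return mailbox_id  # projection agrees with the source of truth

    # Inconsistency detected: repair the projection.
    del tables.by_path[path]
    if true_path is not None:
        tables.by_path[true_path] = mailbox_id
    return None
```

With a `repair_chance` of `0.1`, roughly one read in ten would pay for the extra lookup; this is the kind of trade-off a per-entity setting could tune. Passing `rng` explicitly keeps the sketch deterministic in tests.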

