Hi,

Couple of comments on this.
What you're proposing is difficult to do at scale, and would require some
type of Paxos-style algorithm for the "update only if different" case - it
would be easier in that case to just go ahead and do the update. It also
seems like a conflation of concerns: in an event sourcing model, we save
the immutable event and represent current state in another, separate data
structure. Perhaps Cassandra would work well here - that data model might
provide what you're looking for out of the box.

Just as I don't recommend people use data stores as queuing mechanisms, I
also recommend not using a queuing mechanism as a primary datastore -
mixed semantics.

--
*Colin*
+1 612 859-6129

On Mon, Jan 5, 2015 at 4:47 AM, Daniel Schierbeck <
daniel.schierb...@gmail.com> wrote:

> I'm trying to design a system that uses Kafka as its primary data store:
> immutable events are persisted to a topic, and a secondary index in
> another data store holds the "entities". Each event pertains to some
> "entity", e.g. a user, and the entities are stored in an easily
> queryable way.
>
> Kafka seems well suited for this, but there's one thing I'm having
> trouble with: I cannot guarantee that only one process writes events
> about a given entity, which makes the design vulnerable to integrity
> issues.
>
> For example, say a user can have multiple email addresses assigned, and
> the EmailAddressRemoved event is published when the user removes one.
> There's an integrity constraint, though: every user MUST have at least
> one email address. As far as I can see, there's no way to stop two
> separate processes from looking up the same user entity, each seeing two
> email addresses assigned, and each publishing a removal event. The end
> result would violate the constraint.
>
> If I'm wrong in saying that this isn't possible, I'd love some feedback!
>
> My current thinking is that Kafka could support this kind of application
> with a small additional API. Kafka already has an abstract notion of
> entities through its key-based retention policy. If the produce API were
> extended to accept an integer OffsetConstraint, the following algorithm
> could determine whether a request should proceed:
>
> 1. For every key seen, keep track of the offset of the latest message
> referencing the key.
> 2. When an OffsetConstraint is specified in the produce API call,
> compare that value with the latest offset for the message key.
> 2.1. If they're identical, allow the operation to continue.
> 2.2. If they're not identical, fail with some OptimisticLockingFailure.
>
> Would such a feature be completely out of scope for Kafka?
>
> Best regards,
> Daniel Schierbeck
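
To make the separation Colin describes concrete, here is a minimal sketch
of the event sourcing split: the immutable event goes to Kafka, and a
separate process (not shown) folds events into a queryable state store.
The topic name "user-events", the event payload format, and the user key
are illustrative assumptions, not anything from the thread; the producer
classes are the standard ones from the kafka-clients library.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The immutable event is appended to the log and never updated
            // in place. Keying by entity id groups all of a user's events.
            producer.send(new ProducerRecord<>("user-events", "user-42",
                    "EmailAddressAdded:alice@example.com"));
        }
        // A separate consumer would project these events into a "current
        // state" store (e.g. Cassandra), keeping the event log and the
        // state representation as two distinct concerns.
    }
}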
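
The race Daniel describes can also be sketched in code. The EmailIndex
interface and RemoveEmailHandler class below are hypothetical stand-ins
for the secondary index and the writing process; the point is that the
read-check-publish sequence is not atomic.

import java.util.List;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RemoveEmailHandler {
    // Hypothetical secondary-index lookup; not a real API.
    interface EmailIndex {
        List<String> emailsFor(String userId);
    }

    private final KafkaProducer<String, String> producer;
    private final EmailIndex index;

    RemoveEmailHandler(KafkaProducer<String, String> producer, EmailIndex index) {
        this.producer = producer;
        this.index = index;
    }

    void removeEmail(String userId, String email) {
        List<String> emails = index.emailsFor(userId);  // step 1: read
        if (emails.size() > 1) {                        // step 2: check invariant
            // Step 3: publish. Two processes can both pass step 2 before
            // either one's event reaches the log, so both remove an address
            // and the "at least one email" constraint is violated.
            producer.send(new ProducerRecord<>("user-events", userId,
                    "EmailAddressRemoved:" + email));
        }
    }
}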
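
And here is one way the proposed broker-side check could behave, as a
self-contained sketch of the semantics rather than an actual Kafka change.
The class name, the in-memory log, and the -1 sentinel for "no prior
message for this key" are all assumptions; OffsetConstraint and
OptimisticLockingFailure come from Daniel's proposal.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConstrainedLog {
    // Hypothetical failure type from the proposal.
    static class OptimisticLockingFailure extends RuntimeException {}

    private final List<String> log = new ArrayList<>();           // append-only log
    private final Map<String, Long> latestOffsetByKey = new HashMap<>();

    // Append only if expectedOffset matches the offset of the latest
    // message for this key - the semantics of the proposed OffsetConstraint.
    synchronized long append(String key, String value, long expectedOffset) {
        // Step 1: the broker tracks the latest offset per key.
        long latest = latestOffsetByKey.getOrDefault(key, -1L);
        if (latest != expectedOffset) {
            // Step 2.2: another producer wrote this key since the caller read it.
            throw new OptimisticLockingFailure();
        }
        // Step 2.1: the caller saw the latest state for this key; allow the write.
        long offset = log.size();
        log.add(key + "=" + value);
        latestOffsetByKey.put(key, offset);
        return offset;
    }
}

A producer hitting OptimisticLockingFailure would re-read the entity and
retry with the new offset - the usual optimistic-concurrency loop.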