Whether to go for a 1:1 approach or a 1:n approach (or a partitioned m:n approach where m << n) really depends on the concrete use case and non-functional requirements. Your example might be a good candidate for a 1:1 approach (see also further comments inline) but there are also examples for which a 1:n or m:n approach is a better choice. Here are some general influencing factors:

- length of event history required to recover state: bank accounts need the full event history to be recovered, but order management is an example where this is often not the case. Orders (trade orders in finance, lab orders during medical treatments, ...) usually have a limited validity, so you can recover active orders from a limited event history (the last 10 days, for example), which should make migrations after code changes rather painless. BTW, having only a single persistent actor (or a few) that maintains state is comparable to the role of the "Business Logic Processor" in the LMAX architecture, which originated in the high-frequency trading domain.

- latency requirements: creating a new persistent actor has some overhead, not only in memory but also in bootstrap time, as its creation requires a roundtrip to the backend store. Re-activation of passivated actors designed around a 1:1 approach may also conflict with low-latency requirements. Good compromises can often be found by following an m:n approach in this case.

- write throughput: high write throughput can only be achieved by batching writes, and batching is currently implemented on a per-persistent-actor basis. Throughput therefore scales better with a small(er) number of actors; a large number of actors will create more but smaller batches, reducing throughput. This is, however, more a limitation of the current akka-persistence implementation than a fundamental one. A switch to batching at the journal level might be a good idea, so that a single write batch can contain events from several actors.

- ...
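The write-throughput point above can be sketched numerically. Here is a minimal, hypothetical model, assuming events are spread evenly across actors and each actor flushes only its own pending events with a per-batch size limit (none of this is actual akka-persistence code):

```python
import math

def batches_per_flush(num_events: int, num_actors: int, max_batch: int) -> int:
    """Total journal write batches per flush interval under
    per-actor batching: each actor batches only its own events."""
    per_actor = num_events // num_actors
    return num_actors * math.ceil(per_actor / max_batch)

# 10,000 events per interval with a batch limit of 200:
# a single actor writes 50 large batches, while 1,000 actors
# write 1,000 batches of only 10 events each.
```

Journal-level batching would remove the `num_actors` factor, since one batch could then mix events from several actors.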

Even if you need to replay a long event history (for example, after a code change), you can always do that in the background on a separate node until the new version of the persistent actor has caught up, and switch the application to it when done. You could even have both versions running at the same time, for A/B testing for example. With a replay rate of 100k events/sec you can replay a billion events within a few hours.

Further comments inline ...

On 26.08.14 20:34, Greg Young wrote:
OK, for bank accounts there is some amount of state needed to verify a transaction. Let's propose that for now it's the branch you opened your account at, your current balance, your address, and a risk classification, as well as a customer profitability/loyalty score (these are all reasonable things to track in terms of deciding if a transaction should be accepted or not).

When validating commands, you only need to keep the part of application state within persistent actors for which you have strict consistency requirements. In the context of bank accounts, this is for sure the case for the balance, but not necessarily for customer profitability, loyalty score or whatever. These metrics may be calculated in the background; hence, eventual read consistency should be sufficient for them. Consequently, this state can be maintained elsewhere (as part of a separate read model) and requested from persistent actors during transaction validation. If you need further metrics in the future, new read models can be added and included in the validation workflow initiated by a persistent actor.
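As a sketch of that split: the persistent actor validates against the strictly consistent state it owns (the balance) and consults eventually consistent read models for the softer metrics. All names here (RiskReadModel, validate_withdrawal, the threshold values) are hypothetical, purely to illustrate the shape of the workflow:

```python
class RiskReadModel:
    """Eventually consistent projection, updated in the background
    from the event stream; may lag behind the write side."""
    def __init__(self):
        self.loyalty_score = {}   # account_id -> score

    def score(self, account_id: str) -> int:
        return self.loyalty_score.get(account_id, 0)

def validate_withdrawal(balance: int, amount: int,
                        account_id: str, risk: RiskReadModel) -> bool:
    if amount > balance:          # strict consistency: actor-owned state
        return False
    # softer check against an eventually consistent read model
    if amount > 10_000 and risk.score(account_id) < 5:
        return False
    return True
```

Adding a new metric later means adding a new read model and one more check here, without changing the state the persistent actor has to snapshot and replay.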


I could keep millions of these inside of a single actor.

A few problems come up though:

Replaying this actor from events is very painful (millions, possibly hundreds of millions, of events, and they must be processed serially). Solution -> snapshots? Snapshots have all the same versioning issues people are used to with keeping state around. What happens when the state I am keeping changes, say I now also need to keep avg+stddev of the transaction amount, or we found a bug in how we were maintaining the loyalty score (back to #1)? This will invalidate my snapshot

See above, there's no need to keep all of that inside the persistent actor for strict read consistency. Allowing eventual consistency during command validation where possible not only makes the validation process more flexible (by just including new read models if required) but also reduces snapshot migration efforts (by simplifying the state structure inside persistent actors).

Furthermore, ensuring strict consistency for persistent actor state requires using persist() instead of persistAsync(), which reduces throughput by at least a factor of 10. That may again conflict with write throughput requirements.

To conclude, I think there are use cases where a 1:1 approach makes sense, but this shouldn't be a general recommendation IMO. Finding the best compromise really depends on the specific functional and non-functional requirements.

(requiring a full replay, or else you run into another whole series of hokey problems trying to do "from here forward" type things; imagine a new feature that relies on a 6-month moving average)






On Tue, Aug 26, 2014 at 2:15 PM, Martin Krasser <[email protected] <mailto:[email protected]>> wrote:


    On 26.08.14 20:12, Greg Young wrote:
    In particular I am interested in the associated state that's
    needed. I can see keeping it in a single actor, but this does
    not turn out well at all for most production systems, in
    particular as changes happen over time.

    I don't get your point. Please elaborate.



    On Tue, Aug 26, 2014 at 2:08 PM, Martin Krasser
    <[email protected] <mailto:[email protected]>> wrote:

        See my eventsourced example(s) that I published 1-2 years
        ago; others are closed source


        On 26.08.14 20:06, Greg Young wrote:
        Love to see an example

        On Tuesday, August 26, 2014, Martin Krasser
        <[email protected] <mailto:[email protected]>>
        wrote:


            On 26.08.14 19:56, Greg Young wrote:
            I'm curious how you would model, say, bank accounts
            with only a few hundred actors. Can you go into a bit
            of detail?

            persistent-actor : bank-account = 1:n (instead of 1:1)


            On Tuesday, August 26, 2014, Martin Krasser
            <[email protected]> wrote:


                On 26.08.14 16:44, Andrzej Dębski wrote:
                My mind must have filtered out the possibility of
                making snapshots using Views - thanks.

                About partitions: I suspected as much. The only
                thing I am wondering now is whether it is possible
                to dynamically create partitions in Kafka. AFAIK
                the number of partitions is set during topic
                creation (be it programmatically using the API or
                CLI tools), and there is a CLI tool you can use to
                modify an existing topic:
                https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-5.AddPartitionTool.
                To keep the invariant "a PersistentActor is the
                only writer to a partitioned journal topic", you
                would have to create those partitions dynamically,
                on a per-PersistentActor basis (usually you don't
                know up front how many PersistentActors your
                system will have).

                You're right. If you want to keep all data in Kafka
                without ever deleting them, you'd need to add
                partitions dynamically (which is currently possible
                with APIs that back the CLI). On the other hand,
                using Kafka this way is the wrong approach IMO. If
                you really need to keep the full event history,
                keep old events on HDFS or wherever and only the
                more recent ones in Kafka (where a full replay must
                first read from HDFS and then from Kafka) or use a
                journal plugin that is explicitly designed for
                long-term event storage.
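The combined replay described above (archived events first, then the more recent ones still retained in Kafka) can be sketched as a simple chained iteration. The helper name is hypothetical, not plugin API:

```python
from itertools import chain
from typing import Iterable, Iterator

def full_replay(archived: Iterable, recent: Iterable) -> Iterator:
    """Replay events from long-term storage (e.g. HDFS) first,
    then the tail still retained in Kafka, preserving order."""
    return chain(archived, recent)

# e.g. list(full_replay([1, 2, 3], [4, 5])) yields events 1..5 in order
```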

                The main reason why I developed the Kafka plugin
                was to integrate my Akka applications in unified
                log processing architectures as described in Jay
                Kreps' excellent article
                
<http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying>.
                Also mentioned in this article is a snapshotting
                strategy that fits typical retention times in Kafka.


                On the other hand maybe you are assuming that each
                actor is writing to different topic

                yes, and the Kafka plugin is currently implemented
                that way.

                - but I think this solution is not viable because
                the number of topics is limited by ZK and other
                factors:
                http://grokbase.com/t/kafka/users/133v60ng6v/limit-on-number-of-kafka-topic.

                A more in-depth discussion about these limitations
                is given at
                
http://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka
                with a detailed comment from Jay. I'd say that if
                you designed your application to run more than a
                few hundred persistent actors, then the Kafka
                plugin is probably the wrong choice. I tend to
                design my applications to have only a small number
                of persistent actors (which is in contrast to many
                other discussions on akka-user) which makes the
                Kafka plugin a good candidate.

                To recap, the Kafka plugin is a reasonable choice if

                - frequent snapshotting is done by persistent
                actors (every day or so)
                - you don't have more than a few hundred persistent
                actors and
                - your application is a component of a unified log
                processing architecture (backed by Kafka)

                The most interesting next Kafka plugin feature for
                me to develop is an HDFS integration for long-term
                event storage (and full event history replay). WDYT?


                On Tuesday, August 26, 2014 at 15:28:47 UTC+2,
                Martin Krasser wrote:

                    Hi Andrzej,

                    On 26.08.14 09:15, Andrzej Dębski wrote:
                    Hello

                    Lately I have been reading about the
                    possibility of using Apache Kafka as a
                    journal/snapshot store for akka-persistence.

                    I am aware of the plugin created by Martin
                    Krasser:
                    https://github.com/krasserm/akka-persistence-kafka/
                    and I also read another topic about Kafka as a
                    journal:
                    https://groups.google.com/forum/#!searchin/akka-user/kakfka/akka-user/iIHmvC6bVrI/zeZJtW0_6FwJ

                    In both sources I linked, two ideas were
                    presented:

                    1. Set log retention to 7 days, take
                    snapshots every 3 days (example values)
                    2. Set log retention to unlimited.

                    Here is the first question: in the first case,
                    wouldn't it mean that persistent views receive
                    a skewed view of the PersistentActor state
                    (only events from 7 days)? Is it really a
                    viable solution? As far as I know, a
                    PersistentView can only receive events; it
                    can't receive snapshots from the corresponding
                    PersistentActor (which is good in the general
                    case).

                    PersistentViews can create their own snapshots
                    which are isolated from the corresponding
                    PersistentActor's snapshots.


                    Second question (more directed to Martin): in
                    the thread I linked you wrote:

                         I don't go into Kafka partitioning
                        details here but it is possible to
                        implement the journal driver in a way
                        that both a single persistent actor's
                        data are partitioned *and* kept in order


                    I am very interested in this idea. AFAIK it is
                    not yet implemented in the current plugin, but
                    I was wondering if you could share the
                    high-level idea of how you would achieve that
                    (one persistent actor, multiple partitions,
                    ordering ensured)?

                    The idea is to

                    - first write events 1 to n to partition 1
                    - then write events n+1 to 2n to partition 2
                    - then write events 2n+1 to 3n to partition 3
                    - ... and so on

                    This works because a PersistentActor is the
                    only writer to a partitioned journal topic.
                    During replay, you first replay partition 1,
                    then partition 2, and so on. This should be
                    rather easy to implement in the Kafka journal;
                    I just didn't have time so far, so pull
                    requests are welcome :) Btw, the Cassandra
                    journal
                    <https://github.com/krasserm/akka-persistence-cassandra>
                    follows the very same strategy for scaling
                    with data volume (by using different partition
                    keys).
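The scheme above can be sketched as follows (illustrative Python, not the journal plugin's actual code; `n` is the per-partition chunk size, and partitions are 0-indexed here, as in Kafka):

```python
def partition_for(sequence_nr: int, n: int) -> int:
    """Events 1..n map to partition 0, n+1..2n to partition 1, ...
    Safe because the PersistentActor is the topic's only writer."""
    return (sequence_nr - 1) // n

def replay_order(partitions):
    """Replay partition 0 completely, then partition 1, and so on;
    per-partition order plus this sweep restores total order."""
    return [seq for p in sorted(partitions) for seq in partitions[p]]

# with n = 3, events 1..3 land on partition 0 and events 4..6 on partition 1
```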

                    Cheers,
                    Martin



























--
Studying for the Turing test

--
Martin Krasser

blog:    http://krasserm.blogspot.com
code:    http://github.com/krasserm
twitter: http://twitter.com/mrt1nz

--
     Read the docs: http://akka.io/docs/
     Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
     Search the archives: https://groups.google.com/group/akka-user
--- You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
