Re: [akka-user] Apache Kafka as journal - retention times/PersistentView and partitions

Martin Krasser Tue, 26 Aug 2014 11:15:46 -0700


On 26.08.14 20:12, Greg Young wrote:

In particular I am interested in the associated state that's needed. I can see keeping it in a single actor, but this does not turn out well at all for most production systems, in particular as changes happen over time.


I don't get your point. Please elaborate.

On Tue, Aug 26, 2014 at 2:08 PM, Martin Krasser wrote:


    See my eventsourced example(s), which I published 1-2 years ago;
    the others are closed source.


    On 26.08.14 20:06, Greg Young wrote:

    Love to see an example

    On Tuesday, August 26, 2014, Martin Krasser wrote:


        On 26.08.14 19:56, Greg Young wrote:

        I'm curious how you would model, say, bank accounts with only
        a few hundred actors. Can you go into a bit of detail?


        persistent-actor : bank-account = 1:n (instead of 1:1)
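A minimal sketch of the 1:n idea, in plain Python rather than Akka. Everything named here (`AccountProcessor`, `processor_for`, the CRC32-based assignment, the in-memory list standing in for a journal) is an assumption made for illustration and is not taken from the eventsourced examples or the Kafka plugin.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class AccountProcessor:
    """One event-sourced processor that owns the state of many accounts (1:n)."""
    processor_id: str
    journal: list = field(default_factory=list)   # stand-in for this processor's journal
    balances: dict = field(default_factory=dict)  # account id -> current balance

    def handle(self, account_id, amount):
        """Record a balance change as an event, then apply it to the state."""
        event = (account_id, amount)
        self.journal.append(event)
        self._apply(event)

    def replay(self, events):
        """Rebuild the state of all owned accounts from a sequence of events."""
        for event in events:
            self._apply(event)

    def _apply(self, event):
        account_id, amount = event
        self.balances[account_id] = self.balances.get(account_id, 0) + amount


def processor_for(account_id, num_processors):
    """Hypothetical stable assignment of an account to one of a fixed set of processors."""
    return zlib.crc32(account_id.encode("utf-8")) % num_processors
```

With this shape the number of journals grows with the number of processors, not with the number of accounts.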


        On Tuesday, August 26, 2014, Martin Krasser wrote:


            On 26.08.14 16:44, Andrzej Dębski wrote:

            My mind must have filtered out the possibility of
            making snapshots using Views - thanks.

            About partitions: I suspected as much. The only thing
            I am wondering now is whether it is possible to
            dynamically create partitions in Kafka. AFAIK the
            number of partitions is set during topic creation (be
            it programmatically using the API or with CLI tools), and
            there is a CLI tool you can use to modify an existing topic:
            https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-5.AddPartitionTool.
            To keep the invariant "PersistentActor is the only
            writer to a partitioned journal topic" you would have
            to create those partitions dynamically (usually you
            don't know up front how many PersistentActors your
            system will have) on a per-PersistentActor basis.


            You're right. If you want to keep all data in Kafka
            without ever deleting them, you'd need to add partitions
            dynamically (which is currently possible with APIs that
            back the CLI). On the other hand, using Kafka this way
            is the wrong approach IMO. If you really need to keep
            the full event history, keep old events on HDFS or
            wherever and only the more recent ones in Kafka (where a
            full replay must first read from HDFS and then from
            Kafka) or use a journal plugin that is explicitly
            designed for long-term event storage.
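A rough sketch of the "first read from HDFS, then from Kafka" replay, in plain Python. The function name `full_replay`, the `(sequence_nr, payload)` event shape and the overlap/gap handling are assumptions for illustration; no real HDFS or Kafka client calls are shown.

```python
from itertools import chain


def full_replay(archived_events, recent_events):
    """Yield (sequence_nr, payload) pairs: the archive first, then the live log.

    Both inputs are assumed to be ordered by sequence number. Events that
    appear in both sources are emitted once; a hole in the history is an error.
    """
    last_seq = 0
    for seq_nr, payload in chain(archived_events, recent_events):
        if seq_nr <= last_seq:
            continue  # already replayed from the archive
        if seq_nr != last_seq + 1:
            raise ValueError(f"gap in event history before sequence number {seq_nr}")
        last_seq = seq_nr
        yield seq_nr, payload
```

The gap check matters when the archiving job lags behind Kafka's retention: in that case some events exist in neither store and a full replay cannot be correct.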

            The main reason why I developed the Kafka plugin was to
            integrate my Akka applications in unified log processing
            architectures as described in Jay Kreps' excellent
            article
            
<http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying>.
            Also mentioned in this article is a snapshotting
            strategy that fits typical retention times in Kafka.


            On the other hand, maybe you are assuming that each
            actor is writing to a different topic


            yes, and the Kafka plugin is currently implemented that way.

            - but I think this solution is not viable because
            information about topics is limited by ZK and other
            factors:
            
http://grokbase.com/t/kafka/users/133v60ng6v/limit-on-number-of-kafka-topic.


            A more in-depth discussion about these limitations is
            given at
            http://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka
            with a detailed comment from Jay. I'd say that if you
            designed your application to run more than a few hundred
            persistent actors, then the Kafka plugin is probably the
            wrong choice. I tend to design my applications to have
            only a small number of persistent actors (which is in
            contrast to many other discussions on akka-user) which
            makes the Kafka plugin a good candidate.

            To recap, the Kafka plugin is a reasonable choice if

            - frequent snapshotting is done by persistent actors
            (every day or so)
            - you don't have more than a few hundred persistent
            actors and
            - your application is a component of a unified log
            processing architecture (backed by Kafka)

            The most interesting next Kafka plugin feature for me to
            develop is an HDFS integration for long-term event
            storage (and full event history replay). WDYT?


            On Tuesday, 26 August 2014 15:28:47 UTC+2,
            Martin Krasser wrote:

                Hi Andrzej,

                On 26.08.14 09:15, Andrzej Dębski wrote:

                Hello

                Lately I have been reading about the possibility of
                using Apache Kafka as a journal/snapshot store for
                akka-persistence.

                I am aware of the plugin created by Martin
                Krasser:
                https://github.com/krasserm/akka-persistence-kafka/ and
                I also read another thread about Kafka as a journal:
                https://groups.google.com/forum/#!searchin/akka-user/kakfka/akka-user/iIHmvC6bVrI/zeZJtW0_6FwJ.

                In both sources I linked two ideas were presented:

                1. Set log retention to 7 days, take snapshots
                every 3 days (example values)
                2. Set log retention to unlimited.

                Here is the first question: in the first case, wouldn't
                it mean that persistent views would receive a skewed
                view of the PersistentActor state (only events
                from the last 7 days)? Is it really a viable solution? As
                far as I know, a PersistentView can only receive
                events; it can't receive snapshots from the
                corresponding PersistentActor (which is good in
                the general case).


                PersistentViews can create their own snapshots
                which are isolated from the corresponding
                PersistentActor's snapshots.
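To illustrate why a view with its own snapshots is not skewed by a limited retention time, here is a small sketch in plain Python. The function `recover_view`, its parameters and the `(state, last_seq_nr)` snapshot shape are hypothetical and do not mirror the akka-persistence API.

```python
def recover_view(initial_state, snapshot, retained_events, apply_event):
    """Rebuild a view from its own snapshot plus the events still in the log.

    snapshot: (state, last_seq_nr) or None if the view never took one.
    retained_events: ordered (seq_nr, event) pairs not yet deleted by retention.
    Returns (state, last_seq_nr).
    """
    state, last_seq = (initial_state, 0) if snapshot is None else snapshot
    newer = [(s, e) for s, e in retained_events if s > last_seq]
    if newer and newer[0][0] != last_seq + 1:
        # Retention already removed events the view has not seen yet.
        raise ValueError("events after the last snapshot are no longer retained")
    for seq_nr, event in newer:
        state = apply_event(state, event)
        last_seq = seq_nr
    return state, last_seq
```

As long as the view snapshots more often than the log is truncated (for example every 3 days with 7 days of retention), the first event after the snapshot is still available and recovery sees the complete history.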


                Second question (more directed to Martin): in the
                thread I linked you wrote:

                     I don't go into Kafka partitioning details
                    here but it is possible to implement the
                    journal driver in a way that both a single
                    persistent actor's data are partitioned *and*
                    kept in order


                I am very interested in this idea. AFAIK it is
                not yet implemented in the current plugin, but I was
                wondering if you could share the high-level idea of how
                you would achieve that (one persistent actor,
                multiple partitions, ordering ensured).


                The idea is to

                - first write events 1 to n to partition 1
                - then write events n+1 to 2n to partition 2
                - then write events 2n+1 to 3n to partition 3
                - ... and so on

                This works because a PersistentActor is the only
                writer to a partitioned journal topic. During
                replay, you first replay partition 1, then
                partition 2 and so on. This should be rather easy
                to implement in the Kafka journal, I just haven't had
                time so far; pull requests are welcome :) Btw, the
                Cassandra journal
                <https://github.com/krasserm/akka-persistence-cassandra>
                follows the very same strategy for scaling with
                data volume (by using different partition keys).
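The scheme above can be sketched in a few lines of plain Python. This is not code from the Kafka or Cassandra journal; `partition_for`, `replay` and the parameter `events_per_partition` (the n in the list above) are names chosen for illustration, and a dict of lists stands in for the partitioned topic.

```python
def partition_for(seq_nr, events_per_partition):
    """Map a 1-based sequence number to a 1-based partition number.

    Events 1..n go to partition 1, n+1..2n to partition 2, and so on.
    """
    if seq_nr < 1 or events_per_partition < 1:
        raise ValueError("sequence number and partition size must be positive")
    return (seq_nr - 1) // events_per_partition + 1


def replay(partitions):
    """Yield events partition by partition, in ascending partition order.

    partitions: dict mapping partition number -> list of events in write order.
    """
    for partition in sorted(partitions):
        yield from partitions[partition]
```

Because a single writer fills one partition completely before moving on to the next, reading the partitions in ascending order reproduces the original write order.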

                Cheers,
                Martin

                --
                >>>>>>>>>> Read the docs: http://akka.io/docs/

                >>>>>>>>>> Check the FAQ:
                http://doc.akka.io/docs/akka/current/additional/faq.html
                >>>>>>>>>> Search the archives:
                https://groups.google.com/group/akka-user
                ---
                You received this message because you are
                subscribed to the Google Groups "Akka User List"
                group.
                To unsubscribe from this group and stop receiving
                emails from it, send an email to
                [email protected].
                To post to this group, send email to
                [email protected].
                Visit this group at
                http://groups.google.com/group/akka-user.
                For more options, visit
                https://groups.google.com/d/optout.

                --
                Martin Krasser


                blog:http://krasserm.blogspot.com
                code:http://github.com/krasserm
                twitter:http://twitter.com/mrt1nz


--
Studying for the Turing test

--
Martin Krasser

blog:    http://krasserm.blogspot.com
code:    http://github.com/krasserm
twitter: http://twitter.com/mrt1nz
