On Tuesday, August 26, 2014, Martin Krasser wrote:
On 26.08.14 16:44, Andrzej Dębski wrote:
My mind must have filtered out the possibility of
making snapshots using Views - thanks.
About partitions: I suspected as much. The only thing
that I am wondering now is whether it is possible to
dynamically create partitions in Kafka. AFAIK the
number of partitions is set during topic creation (be
it programmatically using the API or the CLI tools) and
there is a CLI tool you can use to modify an existing topic:
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-5.AddPartitionTool.
To keep the invariant "PersistentActor is the only
writer to a partitioned journal topic" you would have
to create those partitions dynamically (usually you
don't know up front how many PersistentActors your
system will have) on a per-PersistentActor basis.
You're right. If you want to keep all data in Kafka
without ever deleting them, you'd need to add partitions
dynamically (which is currently possible with APIs that
back the CLI). On the other hand, using Kafka this way
is the wrong approach IMO. If you really need to keep
the full event history, keep old events on HDFS or
wherever and only the more recent ones in Kafka (where a
full replay must first read from HDFS and then from
Kafka) or use a journal plugin that is explicitly
designed for long-term event storage.
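The HDFS-then-Kafka replay described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `full_replay`, `archived` and `recent` are hypothetical names, both sources are plain in-memory sequences ordered by sequence number, and no real HDFS or Kafka client is involved.

```python
from typing import Iterable, Iterator, Tuple

# An event is modelled as (sequence number, payload); this shape is an assumption.
Event = Tuple[int, str]


def full_replay(archived: Iterable[Event], recent: Iterable[Event]) -> Iterator[Event]:
    """Replay the long-term archive first, then the recent log.

    Both inputs are assumed to be ordered by sequence number. Events in
    `recent` that were already replayed from the archive are skipped, so an
    overlap between the two stores does not produce duplicates.
    """
    last_seq = 0
    for seq, payload in archived:
        last_seq = seq
        yield seq, payload
    for seq, payload in recent:
        if seq > last_seq:
            last_seq = seq
            yield seq, payload


if __name__ == "__main__":
    old = [(1, "a"), (2, "b"), (3, "c")]   # stand-in for events archived on HDFS
    new = [(3, "c"), (4, "d")]             # stand-in for events still retained in Kafka
    for event in full_replay(old, new):
        print(event)
```

Skipping by sequence number means the archive job and the Kafka retention window may overlap, which is usually easier to operate than making them line up exactly.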
The main reason why I developed the Kafka plugin was to
integrate my Akka applications in unified log processing
architectures as described in Jay Kreps' excellent
article
<http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying>.
Also mentioned in this article is a snapshotting
strategy that fits typical retention times in Kafka.
On the other hand, maybe you are assuming that each
actor is writing to a different topic
yes, and the Kafka plugin is currently implemented that way.
- but I think this solution is not viable because
information about topics is limited by ZK and other
factors:
http://grokbase.com/t/kafka/users/133v60ng6v/limit-on-number-of-kafka-topic.
A more in-depth discussion about these limitations is
given at
http://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka
with a detailed comment from Jay. I'd say that if you
designed your application to run more than a few hundred
persistent actors, then the Kafka plugin is probably the
wrong choice. I tend to design my applications to have
only a small number of persistent actors (which is in
contrast to many other discussions on akka-user) which
makes the Kafka plugin a good candidate.
To recap, the Kafka plugin is a reasonable choice if
- frequent snapshotting is done by persistent actors
(every day or so)
- you don't have more than a few hundred persistent
actors and
- your application is a component of a unified log
processing architecture (backed by Kafka)
The most interesting next Kafka plugin feature for me to
develop is an HDFS integration for long-term event
storage (and full event history replay). WDYT?
On Tuesday, 26 August 2014 at 15:28:47 UTC+2,
Martin Krasser wrote:
Hi Andrzej,
On 26.08.14 09:15, Andrzej Dębski wrote:
Hello
Lately I have been reading about the possibility of
using Apache Kafka as a journal/snapshot store for
akka-persistence.
I am aware of the plugin created by Martin
Krasser:
https://github.com/krasserm/akka-persistence-kafka/ and
I also read another thread about Kafka as a journal:
https://groups.google.com/forum/#!searchin/akka-user/kakfka/akka-user/iIHmvC6bVrI/zeZJtW0_6FwJ.
In both sources I linked two ideas were presented:
1. Set log retention to 7 days, take snapshots
every 3 days (example values)
2. Set log retention to unlimited.
Here is the first question: in the first case, wouldn't
it mean that persistent views would receive a skewed
view of the PersistentActor state (only events
from the last 7 days)? Is it really a viable solution? As
far as I know a PersistentView can only receive
events - it can't receive snapshots from the
corresponding PersistentActor (which is good in the
general case).
PersistentViews can create their own snapshots
which are isolated from the corresponding
PersistentActor's snapshots.
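The relationship between log retention and snapshot interval discussed above can be made concrete with a small check. This is a simplified sketch, not plugin code: `can_recover` is a hypothetical helper, and it assumes that every event younger than the retention period is still available in the log.

```python
from datetime import datetime, timedelta


def can_recover(now: datetime, last_snapshot_at: datetime, retention: timedelta) -> bool:
    """Return True if state can be rebuilt from the latest snapshot.

    Recovery needs the snapshot plus every event written after it. Those
    events are all younger than the snapshot, so they are still retained
    as long as the snapshot itself is younger than the retention period.
    """
    return now - last_snapshot_at < retention


if __name__ == "__main__":
    now = datetime(2014, 8, 26)
    retention = timedelta(days=7)
    # Snapshot taken 3 days ago with 7 days of retention: recoverable.
    print(can_recover(now, now - timedelta(days=3), retention))
    # Snapshot taken 8 days ago: events right after it may already be deleted.
    print(can_recover(now, now - timedelta(days=8), retention))
```

The same check applies to a view that keeps its own snapshots: as long as its latest snapshot is younger than the retention period, it does not need the events that have already been deleted.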
Second question (more directed to Martin): in the
thread I linked you wrote:
I don't go into Kafka partitioning details
here but it is possible to implement the
journal driver in a way that both a single
persistent actor's data are partitioned *and*
kept in order
I am very interested in this idea. AFAIK it is
not yet implemented in the current plugin, but I was
wondering if you could share the high-level idea of how
you would achieve that (one persistent actor,
multiple partitions, ordering ensured)?
The idea is to
- first write events 1 to n to partition 1
- then write events n+1 to 2n to partition 2
- then write events 2n+1 to 3n to partition 3
- ... and so on
This works because a PersistentActor is the only
writer to a partitioned journal topic. During
replay, you first replay partition 1, then
partition 2 and so on. This should be rather easy
to implement in the Kafka journal; I just haven't had
time so far. Pull requests are welcome :) Btw, the
Cassandra journal
<https://github.com/krasserm/akka-persistence-cassandra>
follows the very same strategy for scaling with
data volume (by using different partition keys).
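The rollover scheme described above (events 1 to n in partition 1, n+1 to 2n in partition 2, and so on) can be sketched as follows. This is an illustration only, not code from the Kafka or Cassandra journal: `partition_for`, `write` and `replay` are hypothetical names, and a dict of lists stands in for the partitions of a journal topic.

```python
from typing import Dict, Iterator, List


def partition_for(sequence_nr: int, events_per_partition: int) -> int:
    """Map a 1-based sequence number to a 1-based partition number."""
    if sequence_nr < 1 or events_per_partition < 1:
        raise ValueError("sequence_nr and events_per_partition must be >= 1")
    return (sequence_nr - 1) // events_per_partition + 1


def write(journal: Dict[int, List[int]], sequence_nr: int, events_per_partition: int) -> None:
    """Append an event (represented by its sequence number) to its partition."""
    partition = partition_for(sequence_nr, events_per_partition)
    journal.setdefault(partition, []).append(sequence_nr)


def replay(journal: Dict[int, List[int]]) -> Iterator[int]:
    """Replay partition 1 first, then partition 2, and so on."""
    for partition in sorted(journal):
        yield from journal[partition]


if __name__ == "__main__":
    journal: Dict[int, List[int]] = {}
    for seq in range(1, 8):
        write(journal, seq, events_per_partition=3)
    print(journal)
    print(list(replay(journal)))
```

Ordering is preserved only because a single writer appends with increasing sequence numbers, which is the invariant the scheme relies on.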
Cheers,
Martin
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ:
http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives:
https://groups.google.com/group/akka-user
---
You received this message because you are
subscribed to the Google Groups "Akka User List"
group.
To unsubscribe from this group and stop receiving
emails from it, send an email to
[email protected].
To post to this group, send email to
[email protected].
Visit this group at
http://groups.google.com/group/akka-user.
For more options, visit
https://groups.google.com/d/optout.
--
Martin Krasser
blog:http://krasserm.blogspot.com
code:http://github.com/krasserm
twitter:http://twitter.com/mrt1nz
--
Studying for the Turing test