>> *Question 1*: If each broker has one topic and one partition, and I want
to implement a partitioned producer (in PHP), I still have 8 partitions in
total, correct?
Correct.

>> *Question 2*: In future I may have multiple event tracking clusters which
I want mirrored onto a single topic in the central tracker. Is this kind of
mirroring possible with 0.7.x?

This is available in 0.7.1 onwards.

>> *Question 3*: If I want the low-level PHP producer to batch & zip 10
messages like the async Scala/Java producer does, all I have to do is to
send a message that is a message set containing all the 10 messages,
correct?

Yes, provided you conform with the format of a compressed message -
https://cwiki.apache.org/confluence/display/KAFKA/Compression

>> *Question 4*: This system is quite likely to go into production in the
next few weeks, and I prefer staying with 0.7.x because it's simpler for
non-Java clients, but would you advise me to build on 0.8.x, and why?

I recommend staying on 0.7.x since it is stable. If your requirements
include message replication, durability and guaranteed delivery, you might
want to wait until 0.8 is released. The wire protocol has changed
considerably in 0.8.

Thanks,
Neha

On Tue, Oct 16, 2012 at 10:34 AM, Michal Haris
<michal.ha...@visualdna.com> wrote:
> Hi everyone,
>
> *Our current situation (without Kafka)*
>
> - We have at the moment 8 event tracker servers that in total are capable
> of handling 8000 HTTP events / second, but a normal day's peak throughput
> is about 1250 messages / second.
> - Messages are basically HTTP events enriched by various Apache mods and
> transformations, eventually written into log files.
> - Each event is ca. 0.5 KB when packed as JSON.
> - These message logs are compressed and every 5 minutes shipped into S3,
> where they are used by Hive and other Hadoop jobs.
> - Pretty standard.
>
> *My plan is to introduce a Kafka system on top of the existing offline
> log-processing.*
>
> I have a simulated event stream and have written a Hadoop job similar to
> the ETL consumer in the trunk, except I keep the offsets in ZooKeeper and
> the output files are partitioned by date directory.
> In the first phase I am going to install a Kafka broker on each of the 8
> tracker servers, simply `tail | php producer.php` on each of them, and
> have PHP code publishing into the local broker node under a single topic,
> so in total there will be a cluster of 8 Kafka servers with a 3- or
> 5-node ZooKeeper ensemble interlaced on the same hardware. This topic is
> going to be mirrored into a central Kafka cluster where the Hadoop-loader
> job will run every 30 min or so.
>
> *Question 1*: If each broker has one topic and one partition, and I want
> to implement a partitioned producer (in PHP), I still have 8 partitions
> in total, correct?
> *Question 2*: In future I may have multiple event tracking clusters which
> I want mirrored onto a single topic in the central tracker. Is this kind
> of mirroring possible with 0.7.x?
> *Question 3*: If I want the low-level PHP producer to batch & zip 10
> messages like the async Scala/Java producer does, all I have to do is to
> send a message that is a message set containing all the 10 messages,
> correct?
> *Question 4*: This system is quite likely to go into production in the
> next few weeks, and I prefer staying with 0.7.x because it's simpler for
> non-Java clients, but would you advise me to build on 0.8.x, and why?
>
> Thanks a lot
> --
> Michal Haris
> Software Engineer
>
> VisualDNA | 7 Moor Street, London, W1D 5NB
> www.visualdna.com | t: +44 (0) 207 734 7033
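To make the answer to Question 1 concrete: with one partition per broker, the producer itself must pick which of the 8 broker/partition pairs a message goes to, since a low-level 0.7 client has no server-side partitioner. A minimal sketch in Python for brevity (the real producer would be PHP); the broker list, key hashing scheme, and function name are illustrative assumptions, not part of any Kafka client API.

```python
import zlib

# Hypothetical list of the 8 tracker brokers, each hosting exactly one
# partition of the topic.  With one partition per broker, the total
# partition count equals the broker count: 8.
BROKERS = [("tracker%d" % i, 9092) for i in range(8)]

def choose_partition(key, num_partitions=len(BROKERS)):
    """Map a message key to one of the 8 partitions by hashing.

    crc32 is used here because it is stable and easy to reproduce in PHP
    (crc32()); the mask keeps the value non-negative.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF) % num_partitions
```

A PHP port would be a one-liner over `crc32($key) % 8`; the important property is that the same key always lands on the same broker/partition.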
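For Question 3, batching and zipping amounts to: serialize the 10 messages as an uncompressed message set, gzip those bytes, and wrap the result in a single message whose attributes byte marks the gzip codec. The Python sketch below follows the 0.7 message layout (length | magic | attributes | crc32 | payload) as described on the Compression wiki page linked above; treat the exact byte offsets and CRC coverage as assumptions to verify against the spec before production use.

```python
import gzip
import io
import struct
import zlib

MAGIC_WITH_ATTRS = 1   # 0.7 message version that carries an attributes byte
CODEC_NONE = 0
CODEC_GZIP = 1

def encode_message(payload, codec=CODEC_NONE):
    """Serialize one message: LENGTH(4) | MAGIC(1) | ATTRIBUTES(1) | CRC32(4) | PAYLOAD.

    Layout per the 0.7 wire format on the Compression wiki page; verify
    byte order and CRC coverage against the spec before relying on it.
    """
    header = struct.pack(">bB", MAGIC_WITH_ATTRS, codec)
    crc = struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)
    body = header + crc + payload
    return struct.pack(">i", len(body)) + body

def encode_compressed_set(payloads):
    """Batch N payloads into one gzip-compressed wrapper message.

    The wrapper's payload is the concatenation of the individually
    serialized (uncompressed) messages, gzipped, and its attributes byte
    flags the gzip codec so the broker/consumer knows to inflate it.
    """
    message_set = b"".join(encode_message(p) for p in payloads)
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(message_set)
    return encode_message(buf.getvalue(), codec=CODEC_GZIP)
```

Sending `encode_compressed_set(batch_of_10)` over the socket is then the same produce-request path as a single uncompressed message, which is why a low-level client gets batching almost for free.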