One more aspect of the problem: the event stream could be split into multiple topics, and I have an idea of how to do that and with what partitioning. However, since mirroring neither preserves the partitioning nor supports a custom partitioner implementation, I have a dilemma. Note that there will be other topics in the system besides this event stream, but for now only this one is relevant:
- Is it fine to have a single topic and have consumers process many irrelevant messages only to find the few they are interested in?
- Or would it make more sense to have one topic for the sake of mirroring, and then a consumer and producer that republish those messages into multiple sub-topics, where a message would appear redundantly in several topics, each with a different partitioner?

Thanks for your help,
Michal

On 17 October 2012 10:39, Michal Haris <michal.ha...@visualdna.com> wrote:

> Great, thanks a lot!
>
> On 16 October 2012 18:45, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>
>> >> *Question 1*: If each broker has one topic and one partition, and I want
>> >> to implement a partitioned producer (in PHP), I still have 8 partitions
>> >> in total, correct?
>>
>> Correct
>>
>> >> *Question 2*: In future I may have multiple event tracking clusters which
>> >> I want mirrored onto a single topic in the central tracker. Is this kind
>> >> of mirroring possible with 0.7.x?
>>
>> This is available in 0.7.1 onwards
>>
>> >> *Question 3*: If I want the low-level PHP producer to batch & zip 10
>> >> messages like the async Scala/Java producer does, all I have to do is
>> >> send a message that is a message set containing all 10 messages,
>> >> correct?
>>
>> Yes, provided you conform with the format of a compressed message -
>> https://cwiki.apache.org/confluence/display/KAFKA/Compression
>>
>> >> *Question 4*: This system is quite likely to go into production in the
>> >> next few weeks, and I prefer staying with 0.7.x because it's simpler for
>> >> non-Java clients, but would you advise me to build on 0.8.x, and why?
>>
>> Recommend staying on 0.7.x since it is stable. If your requirements
>> include message replication, durability and guaranteed delivery,
>> you might want to wait until 0.8 is released. The wire protocol has
>> changed considerably in 0.8.
>>
>> Thanks,
>> Neha
>>
>> On Tue, Oct 16, 2012 at 10:34 AM, Michal Haris
>> <michal.ha...@visualdna.com> wrote:
>> > Hi everyone,
>> >
>> > *Our current situation (without Kafka)*
>> >
>> > - At the moment we have 8 event tracker servers that in total are
>> >   capable of handling 8000 HTTP events/second, but a normal day's peak
>> >   throughput is about 1250 messages/second.
>> > - Messages are basically HTTP events enriched by various Apache mods
>> >   and transformations, eventually written into log files.
>> > - Each event is ca. 0.5 KB when packed as JSON.
>> > - These message logs are compressed and shipped every 5 minutes into
>> >   S3, where they are used by Hive and other Hadoop jobs.
>> > - Pretty standard.
>> >
>> > *My plan is to introduce a Kafka system on top of the existing offline
>> > log-processing.*
>> >
>> > I have a simulated event stream and have written a Hadoop job similar to
>> > the ETL consumer in the trunk, except that I keep the offsets in
>> > ZooKeeper and the output files are partitioned by date directory.
>> > In the first phase I am going to install a Kafka broker on each of the 8
>> > tracker servers and simply run tail | php producer.php on each of them,
>> > with the PHP code publishing into the local broker node under a single
>> > topic. In total there will be a cluster of 8 Kafka servers with a 3- or
>> > 5-node ZooKeeper ensemble interlaced on the same hardware. This topic is
>> > going to be mirrored into a central Kafka cluster where the
>> > hadoop-loader job will run every 30 min or so.
>> >
>> > *Question 1*: If each broker has one topic and one partition, and I want
>> > to implement a partitioned producer (in PHP), I still have 8 partitions
>> > in total, correct?
>> > *Question 2*: In future I may have multiple event tracking clusters
>> > which I want mirrored onto a single topic in the central tracker. Is
>> > this kind of mirroring possible with 0.7.x?
>> > *Question 3*: If I want the low-level PHP producer to batch & zip 10
>> > messages like the async Scala/Java producer does, all I have to do is
>> > send a message that is a message set containing all 10 messages,
>> > correct?
>> > *Question 4*: This system is quite likely to go into production in the
>> > next few weeks, and I prefer staying with 0.7.x because it's simpler for
>> > non-Java clients, but would you advise me to build on 0.8.x, and why?
>> >
>> > Thanks a lot
>> > --
>> > Michal Haris
>> > Software Engineer
>> >
>> > VisualDNA | 7 Moor Street, London, W1D 5NB
>> > www.visualdna.com | t: +44 (0) 207 734 7033
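
P.S. To make the second option from the top of this message more concrete, here is a rough sketch of the republishing step. It is only an in-memory simulation in Python and does not use any Kafka client. The sub-topic names, the `user_id` and `site` fields, and the CRC32-based partitioner are all assumptions made up for illustration, and the 8 partitions simply mirror the 8 brokers mentioned in the thread.

```python
import json
import zlib
from collections import defaultdict

# Hypothetical sub-topics: name -> (filter predicate, partition-key function).
# Neither the topic names nor the event fields come from the real system.
ROUTES = {
    "events.by_user": (lambda e: "user_id" in e, lambda e: e["user_id"]),
    "events.by_site": (lambda e: "site" in e, lambda e: e["site"]),
}

NUM_PARTITIONS = 8  # assumption: one partition on each of the 8 brokers


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Stable hash partitioner: the same key always maps to the same partition."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def republish(raw_messages, routes=ROUTES):
    """Fan each message from the single mirrored topic out to the sub-topics.

    Returns {(topic, partition): [raw message, ...]}. A real republisher
    would hand each entry to a producer instead of collecting it in memory.
    """
    out = defaultdict(list)
    for raw in raw_messages:
        event = json.loads(raw)
        for topic, (wanted, key_of) in routes.items():
            if wanted(event):
                # The same message is written once per matching sub-topic,
                # each time partitioned by that sub-topic's own key.
                out[(topic, partition_for(key_of(event)))].append(raw)
    return dict(out)


# Example input standing in for messages read from the mirrored topic.
mirrored = [
    json.dumps({"user_id": "u1", "site": "a.example", "type": "click"}),
    json.dumps({"user_id": "u2", "site": "a.example", "type": "view"}),
    json.dumps({"user_id": "u1", "site": "b.example", "type": "view"}),
]
fanned_out = republish(mirrored)
```

The trade-off is the one described in the two options: every event is stored once per matching sub-topic, but each consumer then reads only the topic and partition keyed the way it needs instead of filtering the whole stream.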