Great, thanks a lot!

On 16 October 2012 18:45, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >> *Question 1*: If each broker has one topic and one partition, if I
> >> want to implement a partitioned producer (in php), I still have 8
> >> partitions in total, correct ?
>
> Correct
>
> >> *Question 2*: In future I may have multiple event tracking clusters
> >> which I want mirrored onto a single topic in the central tracker, is
> >> this kind of mirroring possible with 0.7.x ?
>
> This is available in 0.7.1 onwards
>
> >> *Question 3*: If I want the low-level php producer to batch & zip 10
> >> messages like the async scala/java producer does, all I have to do is
> >> to send a message that is a message set containing all the 10
> >> messages, correct ?
>
> Yes, provided you conform with the format of a compressed message -
> https://cwiki.apache.org/confluence/display/KAFKA/Compression
>
> >> *Question 4*: This system is quite likely to go into production in
> >> the next weeks, and I prefer staying with 0.7.x because it's simpler
> >> for non-java clients, but would you advise me to build on 0.8.x and
> >> why ?
>
> Recommend staying on 0.7.x since it is stable. If your requirements
> include message replication, durability and guaranteed delivery,
> you might want to wait until 0.8 is released. The wire protocol has
> changed considerably in 0.8.
>
> Thanks,
> Neha
>
> On Tue, Oct 16, 2012 at 10:34 AM, Michal Haris
> <michal.ha...@visualdna.com> wrote:
> > Hi everyone,
> >
> > *Our current situation (without kafka)*
> >
> > - we have at the moment 8 event tracker servers that in total are
> >   capable of handling 8000 http events / second, but a normal day's
> >   peak throughput is about 1250 messages / second.
> > - messages are basically http events enriched by various apache mods
> >   and transformations, eventually written into log files
> > - each event is cca 0.5kb when packed as json
> > - these message logs are compressed and every 5 minutes shipped into
> >   S3, where they are used by hive and other hadoop jobs
> > - pretty standard
> >
> > *My plan is to introduce a kafka system on top of the existing
> > offline log-processing.*
> >
> > I have a simulated event stream and have written a hadoop job similar
> > to the etl consumer in trunk, except I keep the offsets in zookeeper
> > and the output files are partitioned by date directory.
> > In the first phase I am going to install a kafka broker on each of
> > the 8 tracker servers, simply tail | php producer.php on each of
> > them, and have PHP code publishing into the local broker node under a
> > single topic, so in total there will be a cluster of 8 kafka servers
> > with a 3- or 5-node zookeeper ensemble interlaced on the same
> > hardware. This topic is going to be mirrored into a central kafka
> > cluster where the hadoop-loader job will run every 30 min or so.
> >
> > *Question 1*: If each broker has one topic and one partition, if I
> > want to implement a partitioned producer (in php), I still have 8
> > partitions in total, correct ?
> > *Question 2*: In future I may have multiple event tracking clusters
> > which I want mirrored onto a single topic in the central tracker, is
> > this kind of mirroring possible with 0.7.x ?
> > *Question 3*: If I want the low-level php producer to batch & zip 10
> > messages like the async scala/java producer does, all I have to do is
> > to send a message that is a message set containing all the 10
> > messages, correct ?
> > *Question 4*: This system is quite likely to go into production in
> > the next weeks, and I prefer staying with 0.7.x because it's simpler
> > for non-java clients, but would you advise me to build on 0.8.x and
> > why ?
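[The routing that Question 1 describes amounts to hashing each message's key onto the flat list of 8 partitions, one per broker. A minimal sketch in Python; the broker addresses and function names here are hypothetical illustrations, not part of any Kafka client library:]

```python
import zlib

# 8 brokers, one topic with one partition each -> 8 partitions in total.
# Hostnames are made up for the sketch.
BROKERS = ["tracker%d:9092" % i for i in range(1, 9)]

def choose_partition(key: bytes, num_partitions: int = len(BROKERS)) -> int:
    # A stable hash keeps all messages with the same key on the same
    # partition, which is what makes the producer "partitioned".
    return zlib.crc32(key) % num_partitions

def route(key: bytes):
    # With exactly one partition per broker, the partition index doubles
    # as the broker index.
    p = choose_partition(key)
    return BROKERS[p], p
```

[Any stable hash works; crc32 is just a convenient one that a PHP producer could reproduce.]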
> >
> > Thanks a lot
> > --
> > Michal Haris
> > Software Engineer
> >
> > VisualDNA | 7 Moor Street, London, W1D 5NB
> > www.visualdna.com | t: +44 (0) 207 734 7033

--
Michal Haris
Software Engineer

VisualDNA | 7 Moor Street, London, W1D 5NB
www.visualdna.com | t: +44 (0) 207 734 7033
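[On Question 3, a sketch of what "a message that is a message set containing all the 10 messages" means on the wire. The field layout below is my reading of the 0.7 message format behind the Compression wiki link (length prefix, magic, attributes, CRC32 of payload, payload) and should be checked against that page before a PHP producer relies on it:]

```python
import gzip
import struct
import zlib

GZIP_CODEC = 1  # low bits of the attributes byte select the codec (assumed)

def encode_message(payload: bytes, codec: int = 0) -> bytes:
    # One Kafka 0.7-style message, as I read the wiki page:
    # MAGIC(1) ATTRIBUTES(1) CRC32(4) PAYLOAD, prefixed with a 4-byte
    # length so it can sit inside a message set.
    magic = 1  # magic byte 1 is the variant that carries an attributes field
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    body = struct.pack(">BBI", magic, codec, crc) + payload
    return struct.pack(">i", len(body)) + body

def encode_compressed_set(payloads) -> bytes:
    # Concatenate the 10 plain messages into an inner message set, gzip
    # that set, and wrap the result in a single outer message whose
    # attributes byte flags the GZIP codec.
    inner = b"".join(encode_message(p) for p in payloads)
    return encode_message(gzip.compress(inner), codec=GZIP_CODEC)

batch = encode_compressed_set([b"event-%d" % i for i in range(10)])
```

[So yes: the consumer-visible unit is one compressed message whose payload is itself a message set, which is exactly what the async scala/java producer produces when batching.]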