Re: Zookeeper

Taylor Gautier Mon, 07 Nov 2011 08:30:55 -0800

Right now, we do not use ZK either - we have both producer and consumer
side sharding that takes care of sending messages to topics on the right
kafka instance.  Each kafka instance we deploy is a complete silo and has
no knowledge of other kafka instances.  Since our initial use case is
somewhat outside the envelope of what Kafka was built for, we felt this was
necessary - basically we have a very large # of topics with low throughput,
while the primary use case for Kafka as I understand it is a low # of
topics with high throughput.


Eventually we will use Kafka for both kinds of use cases.

I have probably mentioned it before on the list, but one thing we haven't
had a chance to look into carefully is whether the user-provided
partitioning scheme can implement what we are trying to do: send messages
to a given broker based on topic, so that topics are spread across the
cluster and we can increase the total # of topics we support.
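
A minimal sketch of that kind of topic-based routing, assuming a fixed list
of independent brokers and a CRC32 hash of the topic name. The broker
addresses and the name broker_for_topic are hypothetical, and this is not
Kafka's partitioner API; it only illustrates mapping each topic to exactly
one broker on the client side.

```python
import zlib

# Hypothetical broker addresses; each one is an independent Kafka instance.
BROKERS = ["kafka-1:9092", "kafka-2:9092", "kafka-3:9092"]


def broker_for_topic(topic, brokers=BROKERS):
    """Return the broker that owns a topic, using CRC32 modulo broker count."""
    # zlib.crc32 is stable across processes and machines, unlike Python's
    # built-in hash() for strings, so every client computes the same mapping.
    index = zlib.crc32(topic.encode("utf-8")) % len(brokers)
    return brokers[index]


if __name__ == "__main__":
    for name in ("user-events", "ops-metrics", "audit-log"):
        print(name, "->", broker_for_topic(name))
```

Producers and consumers that share the same broker list and hash agree on
where a topic lives without any coordination service. One caveat of plain
modulo hashing: changing the number of brokers reassigns most topics.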


On Mon, Nov 7, 2011 at 8:25 AM, Jun Rao <jun...@gmail.com> wrote:

> Hi, Tim,
>
> Thanks for sharing this. As part of the replication work (KAFKA-50),
> partitions will become logical and their physical locations are registered
> in ZK. This will make it difficult to use Kafka without ZK. Overall, I
> think that simplifies the client. However, if you have any concerns, please
> comment in the mailing list or the jira.
>
> Jun
>
> On Mon, Nov 7, 2011 at 12:27 AM, Tim Lossen <t...@lossen.de> wrote:
>
> > sure, we are not in production yet, so things might still
> > change, but our current setup is as follows:
> >
> > - no zookeeper
> > - single kafka broker
> > - second kafka broker as standby
> > - logs are rsynced to standby every 5 minutes
> > - topics not (yet) partitioned
> > - multithreaded jruby consumer
> > - each thread with separate kafka client instance
> >
> > cheers
> > tim
> >
> >
> > On 2011-11-06, at 18:05 , Mark wrote:
> >
> > > Tim,
> > >
> > > Would you mind explaining how you use Kafka? Basically the general
> > > overview of the messages/events you are capturing and how you go about
> > > processing them. We will also be using kafka-rb so I'm particularly
> > > interested in how others are using it.
> > >
> > > - M
> > >
> > > On 11/5/11 11:49 PM, Tim Lossen wrote:
> > >> we are using kafka entirely without zookeeper, and it is working
> > >> fine so far: single kafka broker, ruby consumers without coordination.
> > >>
> > >> tim
> > >>
> > >>
> > >> On 2011-11-05, at 22:03 , Mark wrote:
> > >>
> > >>> Ok, so no matter what ZooKeeper is still required when using Kafka.
> > >>> One just has the option to either loadbalance producer => broker
> > >>> connections via ZooKeeper or a Loadbalancer.
> > >>>
> > >>> Is that correct? If so, I think I finally got it :)
> > >>>
> > >>> On 11/5/11 1:29 PM, Jay Kreps wrote:
> > >>>> It is also worth mentioning that this is just for producers, consumers
> > >>>> always use zookeeper for load balancing and co-ordination. Logically this
> > >>>> makes sense--partitioning production is trivial if you don't care about
> > >>>> semantics of key=>partition assignment, but partitioning consumption is
> > >>>> more complex because you need to divide up the partitions amongst the set
> > >>>> of all consumers exactly.
> > >>>>
> > >>>> -jay
> > >>>>
> > >>>> On Sat, Nov 5, 2011 at 1:19 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > >>>>
> > >>>>> The motivation here is that literally every production process at
> > >>>>> LinkedIn sends messages to Kafka as part of either user tracking or
> > >>>>> operational monitoring or both. We are wary of adding that many zk
> > >>>>> connections and watches, so we run this first tier through a simple L2 load
> > >>>>> balancer that just randomly balances connections over brokers. The good
> > >>>>> part about this is that we can do zookeeper upgrades without redeploying
> > >>>>> all the production apps to upgrade their zk jar.
> > >>>>>
> > >>>>> As Neha says, the zk producer is used for key-based partitioning by the
> > >>>>> smaller number of producers who need that.
> > >>>>>
> > >>>>> -Jay
> > >>>>>
> > >>>>>
> > >>>>> On Sat, Nov 5, 2011 at 11:56 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Mark,
> > >>>>>>
> > >>>>>> Most publishers at LinkedIn use a hardware load balancer approach.
> > >>>>>> These are configured to do a TCP healthcheck that monitors if the
> > >>>>>> kafka port on a broker is working. If it is, then requests are
> > >>>>>> forwarded to the broker. Some publishers though are using the software
> > >>>>>> load balancer based on zookeeper. Those applications want to do some
> > >>>>>> key based partitioning of data.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Neha
> > >>>>>>
> > >>>>>> On Sat, Nov 5, 2011 at 11:49 AM, Mark <static.void....@gmail.com> wrote:
> > >>>>>>> Sorry but I'm a bit confused now. So at LinkedIn you use a loadbalancer
> > >>>>>>> instead of ZooKeeper or do you use it in conjunction with ZooKeeper?
> > >>>>>>>
> > >>>>>>> Thanks
> > >>>>>>>
> > >>>>>>> On 11/4/11 7:09 PM, Jun Rao wrote:
> > >>>>>>>> broker.list is used in the producer property file. One caveat is that the
> > >>>>>>>> broker.list approach doesn't do healthcheck. Which means that if a broker
> > >>>>>>>> goes down, the client could still try to send messages to it. At LinkedIn,
> > >>>>>>>> we rely on a load balancer to do healthcheck for us. The zk-based producer,
> > >>>>>>>> on the other hand, does health check.
> > >>>>>>>>
> > >>>>>>>> You can find out more details about our ZK design in our design page in
> > >>>>>>>> the website or the paper in
> > >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
> > >>>>>>>>
> > >>>>>>>> Jun
> > >>>>>>>>
> > >>>>>>>> On Fri, Nov 4, 2011 at 6:52 PM, Mark <static.void....@gmail.com> wrote:
> > >>>>>>>>> I just noticed that there is an option to not use Zookeeper and instead
> > >>>>>>>>> one can use a static list of brokers (#9 on
> > >>>>>>>>> http://incubator.apache.org/kafka/quickstart.html).
> > >>>>>>>>> Do i put this list in server.properties?
> > >>>>>>>>>
> > >>>>>>>>> It doesn't seem like you save much either way as you have to either
> > >>>>>>>>>  a) list out all the nodes in the zookeeper quorum in
> > >>>>>>>>>     zookeeper.properties
> > >>>>>>>>>  b) list out static brokers in server.properties.
> > >>>>>>>>>
> > >>>>>>>>> What are the benefits of using ZooKeeper over a static list? Can someone
> > >>>>>>>>> also explain how Kafka uses ZooKeeper?
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>>
> > >> --
> > >> http://tim.lossen.de
> > >>
> > >>
> > >>
> >
> > --
> > http://tim.lossen.de
> >
> >
> >
> >
>
