Thanks Jun :)

--
Felix
On Thu, Apr 26, 2012 at 3:26 PM, Jun Rao <jun...@gmail.com> wrote:
> Some comments inlined below.
>
> Thanks,
>
> Jun
>
> On Thu, Apr 26, 2012 at 10:27 AM, Felix GV <fe...@mate1inc.com> wrote:
>
> > Cool :) Thanks for those insights :) !
> >
> > I changed the subject of the thread, in order not to derail the original thread's subject...! I just want to recap to make sure I (and others) understand all of this correctly :)
> >
> > So, if I understand correctly, with acks == [0,1] Kafka should provide a latency that is similar to what we have now, but with the possibility of losing a small window of unreplicated events in the case of an unrecoverable hardware failure, and with acks > 1 (or acks == -1) there will probably be a latency penalty but we will be completely protected from (non-correlated) hardware failures, right?
> >
>
> This is mostly true. The difference is that in 0.7, the producer doesn't wait for a TCP response from the broker. In 0.8, the producer always waits for a response from the broker. How quickly the broker sends the response depends on acks. If acks is less than ideal, you may get the response faster, but have some risk of losing the data if there is a broker failure.
>
> > Also, I guess the above assumptions are correct for a batch size of 1, and that bigger batch sizes could also lead to small windows of unwritten data in cases of failures, just like now...? Although, now that I think of it, I guess the vulnerability of bigger batch sizes would, again, only come into play in scenarios of unrecoverable correlated failures, since even if a machine fails with some partially committed batch, there would be other machines that received the same data (through replication) and would have enough time to commit those batches...
> >
>
> I want to add that if the producer itself dies, it could lose a batch of events.
>
> > Finally, I guess that replication (whatever the ack parameter is) will affect the overall throughput capacity of the Kafka cluster, since every node will now be writing its own data as well as the replicated data from +/- 2 other nodes, right?
> >
> > --
> > Felix
> >
> > On Wed, Apr 25, 2012 at 6:32 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > Short answer is yes: both async (acks = 0 or 1) and sync replication (acks > 1) will both be supported.
> > >
> > > -Jay
> > >
> > > On Wed, Apr 25, 2012 at 11:22 AM, Jun Rao <jun...@gmail.com> wrote:
> > > > Felix,
> > > >
> > > > Initially, we thought we could keep the option of not sending acks from the broker to the producer. However, this seems hard since, in the new wire protocol, we need to send at least the error code to the producer (e.g., when a request is sent to the wrong broker or wrong partition).
> > > >
> > > > So, what we allow in the current design is the following. The producer can specify the # of acks in the request. By default (acks = -1), the broker will wait for the message to be written to all replicas that are still synced up with the leader before acking the producer. Otherwise (acks >= 0), the broker will ack the producer after the message is written to acks replicas. Currently, acks=0 is treated the same as acks=1.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
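To make the acks options above concrete, here is a minimal sketch of a producer configured for full acknowledgement under the 0.8 design being discussed. The class and property names (Producer, KeyedMessage, metadata.broker.list, request.required.acks) are assumptions based on the in-progress 0.8 API and may differ in the released version:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AckedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Brokers used to bootstrap topic metadata (assumed 0.8 property name).
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            // -1: ack only after the message is written to all in-sync replicas.
            //  1: ack once the leader has the message (lower latency, small loss window).
            props.put("request.required.acks", "-1");
            props.put("serializer.class", "kafka.serializer.StringEncoder");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            // In 0.8 the producer always waits for the broker's response;
            // how soon that response comes back depends on the acks setting.
            producer.send(new KeyedMessage<String, String>("test-topic", "some-key", "some message"));
            producer.close();
        }
    }

Dropping request.required.acks to 1 trades the full-replication guarantee for lower latency, which is exactly the trade-off described above.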
> > > > On Wed, Apr 25, 2012 at 10:39 AM, Felix GV <fe...@mate1inc.com> wrote:
> > > >
> > > >> Just curious, but if I remember correctly from the time I read KAFKA-50 and the related JIRA issues, you guys plan to implement sync AND async replication, right?
> > > >>
> > > >> --
> > > >> Felix
> > > >>
> > > >> On Tue, Apr 24, 2012 at 4:42 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > >>
> > > >> > Right now we do sloppy failover. That is, when a broker goes down, traffic is redirected to the remaining machines, but any unconsumed messages are stuck on that server until it comes back; if it is permanently gone, the messages are lost. This is acceptable for us in the near term since our pipeline is pretty real-time, so this window between production and consumption is pretty small. The complete solution is the intra-cluster replication in KAFKA-50, which is coming along fairly nicely now that we are working on it.
> > > >> >
> > > >> > -Jay
> > > >> >
> > > >> > On Tue, Apr 24, 2012 at 12:21 PM, Oliver Krohne <oliver.kro...@googlemail.com> wrote:
> > > >> > > Hi,
> > > >> > >
> > > >> > > indeed, I thought it could be used as a failover approach.
> > > >> > >
> > > >> > > We use RAID for local redundancy, but it does not protect us in case of a machine failure, so I am looking for a way to achieve a master/slave setup until KAFKA-50 has been implemented.
> > > >> > >
> > > >> > > I think we can solve it for now by having multiple brokers so that the application can continue sending messages if one broker goes down. My main concern is not to introduce a new single point of failure which can stop the application. However, as some consumers are not developed by us and it is not clear how they store the offset in ZooKeeper, we need to find out how we can manage the consumers in case a broker never returns after a failure. It will be acceptable to lose a couple of messages if a broker dies and the consumers have not consumed all messages at the point of failure.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Oliver
> > > >> > >
> > > >> > > On 23.04.2012 at 19:58, Jay Kreps wrote:
> > > >> > >
> > > >> > >> I think the confusion comes from the fact that we are using mirroring to handle geographic distribution, not failover. If I understand correctly, what Oliver is asking for is something to give fault tolerance, not something for distribution. I don't think that is really what the mirroring does out of the box, though technically I suppose you could just reset the offsets, point the consumer at the new cluster, and have it start from "now".
> > > >> > >>
> > > >> > >> I think it would be helpful to document our use case in the mirroring docs since this is not the first time someone has asked about this.
> > > >> > >>
> > > >> > >> -Jay
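As a rough illustration of Jay's "reset the offsets and start from now" suggestion: a high-level consumer could be pointed at the failover cluster under a fresh group (so no stale offsets apply) and told to start from the latest message. This is only a sketch; the property names (zk.connect, groupid, autooffset.reset) follow the 0.7-era consumer config as I recall them, and the hostnames and paths are placeholders:

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class FailoverConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Point at the ZooKeeper ensemble (and namespace) of the failover cluster.
            props.put("zk.connect", "zk1:2181,zk2:2181,zk3:2181/kafka-mirror");
            // Use a fresh group so no old offsets are picked up.
            props.put("groupid", "my-app-failover");
            // With no stored offset, start from the latest message ("now")
            // instead of replaying the whole log.
            props.put("autooffset.reset", "largest");

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // ... create message streams and consume as usual ...
            connector.shutdown();
        }
    }

Clearing the group's offset nodes under /consumers in ZooKeeper would have a similar effect, but switching to a new group id is usually simpler.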
> > > >> > >> On Mon, Apr 23, 2012 at 10:38 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > >> > >>
> > > >> > >>> Hi Oliver,
> > > >> > >>>
> > > >> > >>>> I was reading the mirroring guide and I wonder if it is required that the mirror runs its own ZooKeeper?
> > > >> > >>>>
> > > >> > >>>> We have a ZooKeeper cluster running which is used by different applications, so can we use that ZooKeeper cluster for the Kafka source and Kafka mirror?
> > > >> > >>>
> > > >> > >>> You could have a single ZooKeeper cluster and use different namespaces for the source/target mirror. However, I don't think it is recommended to use a remote ZooKeeper (if you have a cross-DC setup) since that would potentially mean very high ZK latencies on one of your clusters.
> > > >> > >>>
> > > >> > >>>> What is the procedure, if the Kafka source server fails, to switch the applications to use the mirrored instance?
> > > >> > >>>
> > > >> > >>> I don't quite follow this question - can you clarify? The mirror cluster is pretty much a separate instance. There is no built-in automatic fail-over if your source cluster goes down.
> > > >> > >>>
> > > >> > >>>> Are there any backup best practices if we do not use mirroring?
> > > >> > >>>
> > > >> > >>> You can use RAID arrays for (local) data redundancy. You may also be interested in the (intra-DC) replication feature (KAFKA-50) that is currently being developed. I believe some folks on this list have also used plain rsyncs as an alternative to mirroring.
> > > >> > >>>
> > > >> > >>> Thanks,
> > > >> > >>>
> > > >> > >>> Joel
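On Joel's point about sharing one ZooKeeper ensemble: the source and mirror clusters can coexist on the same ensemble by giving each its own chroot path in zk.connect. A minimal sketch of the two broker configurations, using 0.7-era property names, placeholder hostnames, and made-up namespaces (depending on the Kafka version, the chroot paths may need to be created in ZooKeeper beforehand):

    # Source cluster brokers (server.properties) -- hostnames/paths are placeholders
    brokerid=0
    zk.connect=zk1:2181,zk2:2181,zk3:2181/kafka-source

    # Mirror cluster brokers (server.properties)
    brokerid=0
    zk.connect=zk1:2181,zk2:2181,zk3:2181/kafka-mirror

Producers and consumers of each cluster would then use the matching namespace in their own zk.connect, so broker registrations and consumer offsets never collide between the two clusters.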