Thanks for quick responses.

Jun: KAFKA-50 looks pretty interesting.  I am going to go through it in
more detail again tonight.  This feature not being in yet won't block me
from getting going, but I think it will block widespread use.  Is there
any opportunity to help contribute to this?  Maybe starting with something
else (smaller) would be better to get my feet wet?  I am going to start to
cozy up to the code this weekend.... right now "Test Starting:
testProduceAndMultiFetch(kafka.javaapi.integration.PrimitiveApiTest)" keeps
hanging, but maybe it's a resource issue on my machine (I will try it on
another machine and send it to the dev list if it's still an issue).

Neha: I think, as you said, "just copy over the topic directories and start
the Kafka cluster" will be a sufficient approach for now if/when a server
dies, since it sounds like another broker (Y) would get the requests when
broker (X) dies (this is an assumption and easy enough for me to test).

For the "at least once" delivery guarantee, I am assuming you are asking
what we would do if we get an event that we may have already received
because of a failure?  We actually deal with this type of thing a lot
already (mostly from our mobile devices).  It depends on the data that is
flowing (in this paradigm I guess that would be the topic): if N errors
occurred within the last T seconds, each processor has logic for what to do
(some even a System.exit())... with Kafka I would hope we would know when
this happens, meaning we know an error has occurred and we might get this
one row again (in which case we could keep it somewhere resident to
validate against for Y time period).
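A minimal sketch of the dedup idea described above (keeping recently seen events resident for a bounded time window so a redelivered event can be detected); the event ids, window length, and class name here are hypothetical, not from Kafka itself:

```python
import time

class RecentEventFilter:
    """Remembers event ids seen within the last `window_secs` seconds,
    so events redelivered under at-least-once semantics can be dropped."""

    def __init__(self, window_secs=300):
        self.window_secs = window_secs
        self.seen = {}  # event_id -> last-seen timestamp

    def is_duplicate(self, event_id, now=None):
        now = time.time() if now is None else now
        # Evict ids older than the window so resident memory stays bounded.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t <= self.window_secs}
        dup = event_id in self.seen
        self.seen[event_id] = now
        return dup

f = RecentEventFilter(window_secs=60)
print(f.is_duplicate("evt-1", now=0))     # False: first time seen
print(f.is_duplicate("evt-1", now=10))    # True: redelivered inside window
print(f.is_duplicate("evt-1", now=1000))  # False: entry expired
```

In practice the per-topic window length (the "Y time period" above) would be tuned to how long a failure-and-retry cycle can take.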

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/

On Fri, Nov 4, 2011 at 12:41 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> >> i mean, the log segments
> will always be in a consistent state on disk, right?
>
> Yes. You can just copy over the topic directories and start the Kafka
> cluster
>
> Thanks,
> Neha
>
> On Fri, Nov 4, 2011 at 9:35 AM, Tim Lossen <t...@lossen.de> wrote:
> > ok, but apart from the possibility of data loss, rsync
> > in principle should work fine? i mean, the log segments
> > will always be in a consistent state on disk, right?
> >
> >
> > On 2011-11-04, at 17:18 , Neha Narkhede wrote:
> >
> >>>> for redundancy, we were planning to simply rsync the kafka
> >> message logs to a second machine periodically. or do you
> >> see any obvious problems with that?
> >>
> >> Rsync approach has several problems and could be a lossy solution. We
> >> moved away from that by replacing a legacy system with Kafka.
> >> We recommend you setup your redundant cluster using the mirroring
> >> approach, which is much more reliable and real-time than rsync.
> >>
> >> I think 0.6 has a stripped down version of mirroring, where you cannot
> >> control the mirroring for specific topics.
> >>
> >> Thanks,
> >> Neha
> >>
> >> On Fri, Nov 4, 2011 at 9:12 AM, Tim Lossen <t...@lossen.de> wrote:
> >>>
> >>> interesting. is this already available in 0.6?
> >>>
> >>> for redundancy, we were planning to simply rsync the kafka
> >>> message logs to a second machine periodically. or do you
> >>> see any obvious problems with that?
> >>>
> >>> cheers
> >>> tim
> >>>
> >>> On 2011-11-04, at 17:07 , Neha Narkhede wrote:
> >>>> We have a mirroring feature where you can setup 2 clusters, one to be a
> >>>> mirror of the other. At LinkedIn, we have a production Kafka cluster, and
> >>>> an analytics Kafka cluster that mirrors the production one in real time.
> >>>> We still haven't updated the documentation to describe this in detail.
> >>>
> >>> --
> >>> http://tim.lossen.de
> >>>
> >>>
> >>>
> >
> > --
> > http://tim.lossen.de
> >
> >
> >
> >
>
