Thanks for the quick responses.

Jun: KAFKA-50 looks pretty interesting. I am going to go through it in more detail again tonight. This feature not being in yet is not going to block me from getting going, but I think it will for widespread use. Is there any opportunity to help contribute to this? Maybe starting with something else (smaller) would be better to get my feet wet? I am going to start to cozy up to the code this weekend... right now "Test Starting: testProduceAndMultiFetch(kafka.javaapi.integration.PrimitiveApiTest)" keeps hanging, but maybe it's a resource issue on my machine (I will try it on another machine and, if it's still an issue, send it to the dev list).
Neha: I think, as you said, "just copy over the topic directories and start the Kafka cluster" will be a sufficient approach for now if/when a server dies, since it sounds like another broker (Y) would get the requests when broker (X) dies (this is an assumption and easy enough for me to test).

For the "at least once" delivery guarantee, I assume you are asking what we would do if we get an event we may have already received because of a failure? We actually deal with this type of thing a lot already (mostly from our mobile devices). It depends on the data that is flowing (in this paradigm I guess it would be the topic): if N errors occur within the last T seconds, each processor has logic for what to do (some even a System.exit()). With Kafka, I would hope we would know when this happens, meaning we know an error has occurred and might for this one row get it again (in which case we could keep it somewhere resident to validate against for Y time period).

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/

On Fri, Nov 4, 2011 at 12:41 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >> i mean, the log segments
> >> will always be in a consistent state on disk, right?
>
> Yes. You can just copy over the topic directories and start the Kafka
> cluster
>
> Thanks,
> Neha
>
> On Fri, Nov 4, 2011 at 9:35 AM, Tim Lossen <t...@lossen.de> wrote:
> > ok, but apart from the possibility of data loss, rsync
> > in principle should work fine? i mean, the log segments
> > will always be in a consistent state on disk, right?
> >
> > On 2011-11-04, at 17:18 , Neha Narkhede wrote:
> >
> >>> for redundancy, we were planning to simply rsync the kafka
> >>> message logs to a second machine periodically. or do you
> >>> see any obvious problems with that?
> >>
> >> Rsync approach has several problems and could be a lossy solution. We
> >> moved away from that by replacing a legacy system with Kafka.
> >> We recommend you setup your redundant cluster using the mirroring
> >> approach, which is much more reliable and real-time than rsync.
> >>
> >> I think 0.6 has a stripped down version of mirroring, where you cannot
> >> control the mirroring for specific topics.
> >>
> >> Thanks,
> >> Neha
> >>
> >> On Fri, Nov 4, 2011 at 9:12 AM, Tim Lossen <t...@lossen.de> wrote:
> >>> interesting. is this already available in 0.6?
> >>>
> >>> for redundancy, we were planning to simply rsync the kafka
> >>> message logs to a second machine periodically. or do you
> >>> see any obvious problems with that?
> >>>
> >>> cheers
> >>> tim
> >>>
> >>> On 2011-11-04, at 17:07 , Neha Narkhede wrote:
> >>>> We have a mirroring feature where you can setup 2 clusters, one to be a
> >>>> mirror of the other. At LinkedIn, we have a production Kafka cluster, and
> >>>> an analytics Kafka cluster that mirrors the production one in real time. We
> >>>> still haven't updated the documentation to describe this in detail.
> >>>
> >>> --
> >>> http://tim.lossen.de
> >
> > --
> > http://tim.lossen.de
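To make the at-least-once handling discussed above concrete, here is a minimal sketch (in Python, standing in for whatever language the consumer is written in) of the "keep it somewhere resident to validate for Y time period" idea: a cache of recently seen event ids that lets a consumer detect a redelivered event within a time window. The class and method names are hypothetical, not anything Kafka 0.6 provides; Kafka itself gives no dedup API here.

```python
import time

class RecentEventCache:
    """Hypothetical sketch: remember event ids seen in the last `window_seconds`
    so a redelivered event (at-least-once semantics) can be skipped rather
    than processed twice."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.seen = {}  # event_id -> timestamp of last delivery

    def _evict(self, now):
        # Forget ids older than the window; beyond Y seconds we assume
        # a redelivery is no longer possible (an application-level choice).
        expired = [eid for eid, t in self.seen.items() if now - t > self.window]
        for eid in expired:
            del self.seen[eid]

    def is_duplicate(self, event_id, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        duplicate = event_id in self.seen
        self.seen[event_id] = now  # refresh the timestamp either way
        return duplicate

# Usage: process each consumed message only if its id has not been seen recently.
cache = RecentEventCache(window_seconds=300)
assert cache.is_duplicate("evt-1", now=0) is False    # first delivery: process it
assert cache.is_duplicate("evt-1", now=10) is True    # redelivery after a failure: skip
assert cache.is_duplicate("evt-1", now=400) is False  # outside the window: treated as new
```

In a real consumer the window Y and the reaction to a duplicate (skip, log, or even the System.exit() mentioned above) would depend on the topic, exactly as Joe describes for the existing mobile-device pipeline.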