Felix,

We use RAID too. One potential problem with RAID is that if you replace a broken disk, RAID goes into rebuild mode. This could significantly slow down I/O and make a broker not fully functional for new requests. Adding more mirrors doesn't alleviate this problem.

Jun

On Wed, Jan 11, 2012 at 3:50 PM, Felix GV <fe...@mate1inc.com> wrote:

> We've been thinking about this stuff a lot recently, at work.
>
> We've had some HD failures in our Kafka cluster. I don't know all the
> details, but from what I heard, the HDs were mirrored in RAID but several
> of them failed in a close time interval and the array did not have time to
> fully rebuild itself, so we lost all of that data from the Kafka cluster.
> Thankfully, the data was being consumed in near real time, so we only
> really lost a small unconsumed window of data.
>
> Now, we're wondering what we could improve to prevent this scenario in the
> future. I investigated Kafka mirroring but since it relies on consuming
> data, the probability of losing the unconsumed window is still there. If we
> had consumers that were more batch oriented (like Hadoop) rather than
> real-time, the benefits of a mirrored Kafka cluster would be greater, but
> for our use cases, where data is consumed near real-time, we would still
> lose as much data as before. Am I right?
>
> KAFKA-50, with sync replication, would have solved our problem, but until
> that's done, what are our options?
>
> I came to the conclusion that simply adding more mirrored copies in our
> RAID arrays would be the most cost-effective way to give us both more
> availability and more redundancy. This doesn't deal with the scenario where
> a machine fails and becomes unavailable, in which case the data on it would
> be temporarily unavailable but not lost (although, again, there could be a
> small window of uncommitted data). However, in terms of protection against
> data loss from HD failures, it seems like the best option for now, no?
>
> It doesn't feel right to just throw more hardware at problems hehe... but I
> guess sometimes it's the only choice :) ...
>
> Please tell me if that makes sense!
>
> --
> Felix
>
> On Wed, Jan 11, 2012 at 6:16 PM, Felix GV <fe...@mate1inc.com> wrote:
>
> > As I understand it, you cannot use a mirrored Kafka cluster as a hot
> > fail-over.
> >
> > You could probably use it as a manual fail-over, but I don't know the
> > complexity involved in doing that.
> >
> > Also, if your source cluster fails while producers were putting data into
> > it, there will be an "unconsumed window" of data that is lost. This
> > corresponds to the data that the embedded consumer in the mirrored cluster
> > did not have time to consume from the source cluster.
> >
> > All in all, the mirrored cluster is akin to asynchronous replication,
> > without any hot fail-over capability. Thus, it provides data redundancy
> > (outside of the unconsumed window described above) but no extra
> > availability (unless you count manual interventions).
> >
> > KAFKA-50 <https://issues.apache.org/jira/browse/KAFKA-50>, on the other
> > hand, will provide both asynchronous AND synchronous replication (although
> > the latter will incur a latency penalty) and will be able to use the
> > replicas (data redundancy) as hot fail-overs.
> >
> > Depending on your personal definition of "highly reliable" (whether it
> > includes data redundancy and/or availability), I think that should probably
> > answer your question...?
> >
> > To all the Kafka experts: please correct me if the above explanations are
> > incorrect :) !
> >
> > --
> > Felix
> >
> > On Wed, Jan 11, 2012 at 5:53 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> >> It's just that the mirroring logic depends on ZK to be available most of
> >> the time.
> >>
> >> Jun
> >>
> >> On Wed, Jan 11, 2012 at 2:35 PM, Christian Carollo <ccaro...@gmail.com> wrote:
> >>
> >> > I see. But if I used that configuration and then did the mirroring you
> >> > suggested would that be enough, in your opinion, to be considered highly
> >> > reliable?
> >> >
> >> > Christian
> >> >
> >> > On Jan 11, 2012, at 2:32 PM, Jun Rao wrote:
> >> >
> >> > >> For example, can I have one ZK instance and one broker on one machine and
> >> > >> that is enough to define a ZK cluster and a Kafka Cluster?
> >> > >
> >> > > Yes, although you don't get the reliability of ZK now.
> >> > >
> >> > > Jun
> >> > >
> >> > > On Wed, Jan 11, 2012 at 2:06 PM, Christian Carollo <ccaro...@gmail.com> wrote:
> >> > >
> >> > >> Jun,
> >> > >>
> >> > >> I don't think I asked my question the right way.
> >> > >>
> >> > >> What I am trying to understand is what are the minimum constituent parts
> >> > >> of a Kafka cluster?
> >> > >>
> >> > >> Based on your last email, I am now wondering what are the minimum
> >> > >> constituent parts of a ZK cluster as well as a Kafka cluster?
> >> > >>
> >> > >> For example, can I have one ZK instance and one broker on one machine and
> >> > >> that is enough to define a ZK cluster and a Kafka Cluster?
> >> > >>
> >> > >> Thanks,
> >> > >> Christian
> >> > >>
> >> > >> On Jan 11, 2012, at 1:50 PM, Jun Rao <jun...@gmail.com> wrote:
> >> > >>
> >> > >>> Christian,
> >> > >>>
> >> > >>> A Kafka cluster contains a ZK cluster and a list of brokers. When a
> >> > >>> consumer subscribes to a topic in a Kafka cluster, it consumes data stored
> >> > >>> in all brokers in that cluster.
> >> > >>>
> >> > >>> Thanks,
> >> > >>>
> >> > >>> Jun
> >> > >>>
> >> > >>> On Tue, Jan 10, 2012 at 11:28 PM, Christian Carollo <ccaro...@gmail.com> wrote:
> >> > >>>
> >> > >>>> Thank you Jun, that is quite helpful. I have a question about Kafka
> >> > >>>> Clusters. What are the minimum number and types of services that must be
> >> > >>>> running to make up a Kafka Cluster?
> >> > >>>>
> >> > >>>> I ask this because the diagrams (in the Kafka Mirroring document) allude
> >> > >>>> to a multiple-broker environment. However, since each broker does not
> >> > >>>> appear to provide redundancy (as of today) to any of the other brokers in a
> >> > >>>> given ZooKeeper service, it seems like a Kafka Cluster is nothing more than
> >> > >>>> a grouping of a single ZooKeeper instance with a single Kafka broker. Is
> >> > >>>> this the correct understanding?
> >> > >>>>
> >> > >>>> Thanks,
> >> > >>>> Christian
> >> > >>>>
> >> > >>>> On Jan 10, 2012, at 8:47 AM, Jun Rao wrote:
> >> > >>>>
> >> > >>>>> With 0.7, you can set up inter-cluster replication
> >> > >>>>> (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring).
> >> > >>>>>
> >> > >>>>> For the future 0.8 release, we are working on intra-cluster replication
> >> > >>>>> support and details can be found at
> >> > >>>>> https://issues.apache.org/jira/browse/KAFKA-50
> >> > >>>>>
> >> > >>>>> Thanks,
> >> > >>>>>
> >> > >>>>> Jun
> >> > >>>>>
> >> > >>>>> On Mon, Jan 9, 2012 at 9:52 PM, Christian Carollo <ccaro...@gmail.com> wrote:
> >> > >>>>>
> >> > >>>>>> I am looking to implement Kafka in a production environment, however, I
> >> > >>>>>> haven't found documentation or examples that
> >> > >>>>>> discuss how to build a redundant implementation. Is there any
> >> > >>>>>> documentation out there (blogs, articles, etc.) that describes
> >> > >>>>>> how we can implement such a system with Kafka 0.6 or 0.7?
> >> > >>>>>>
> >> > >>>>>> Also, is there a timeframe the community is shooting for, to release 0.8 w/
> >> > >>>>>> replication?
> >> > >>>>>>
> >> > >>>>>> Thanks
> >> > >>>>>> Christian