For 1), roughly speaking, hosts in a ZK cluster replicate among themselves synchronously. So having multiple ZK hosts improves reliability. ZK tolerates k failures with 2k+1 hosts.
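As a rough illustration of the 2k+1 rule above, here is a minimal sketch of the arithmetic, assuming the ensemble stays available only while a strict majority of its hosts is up. The function names are hypothetical and are not part of ZooKeeper or Kafka.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Return how many hosts may fail while a strict majority stays up."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    # A majority of n hosts is n // 2 + 1, so at most (n - 1) // 2 may fail.
    return (ensemble_size - 1) // 2


def hosts_needed(failures: int) -> int:
    """Return the smallest ensemble size that tolerates `failures` failures."""
    if failures < 0:
        raise ValueError("failures must be non-negative")
    return 2 * failures + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} host(s): tolerates {tolerated_failures(n)} failure(s)")
```

Under this assumption a single ZK host tolerates no failures, and going from 3 to 4 hosts adds no extra tolerance, which is why odd ensemble sizes are the usual choice.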
For 2), that's exactly how our ZK-based producer works.

Thanks,

Jun

On Wed, Jan 11, 2012 at 3:35 PM, Christian Carollo <ccaro...@gmail.com> wrote:

> This leads to two more questions…
>
> 1) Maybe I am not understanding what a ZK cluster typically looks like or
> is made up of. If I have more than one ZK service/instance running on a
> single node that doesn't sound like it is more reliable when there is a
> server failure.
>
> On the other hand, if I have one ZK on one node and another on another
> node, even as a hot standby via mirroring, that seems like a more reliable
> solution. I think I must be missing something, am I?
>
> 2) Can the client producer interrogate the ZK service and determine if it
> is available and/or if one or more brokers are available? And if so get
> their connection information from ZK so that the producer can intelligently
> send messages to the right brokers? If this is possible the client
> producer could handle failure cases and either contact a different
> (hot-standby) ZK or Broker?
>
> Thanks
> Christian
>
> On Jan 11, 2012, at 3:16 PM, Felix GV wrote:
>
> > As I understand it, you cannot use a mirrored Kafka cluster as a hot
> > fail-over.
> >
> > You could probably use it as a manual fail-over, but I don't know the
> > complexity involved in doing that.
> >
> > Also, if your source cluster fails while producers were putting data into
> > it, there will be an "unconsumed window" of data that is lost. This
> > corresponds to the data that the embedded consumer in the mirrored cluster
> > did not have time to consume from the source cluster.
> >
> > All in all, the mirrored cluster is akin to asynchronous replication,
> > without any hot fail-over capability. Thus, it provides data redundancy
> > (outside of the unconsumed window described above) but no extra
> > availability (unless you count manual interventions).
> >
> > KAFKA-50 <https://issues.apache.org/jira/browse/KAFKA-50>, on the other
> > hand, will provide both asynchronous AND synchronous replication (although
> > the latter will incur a latency penalty) and will be able to use the
> > replicas (data redundancy) as hot fail-overs.
> >
> > Depending on your personal definition of "highly reliable" (whether it
> > includes data redundancy and/or availability), I think that should probably
> > answer your question...?
> >
> > To all the Kafka experts: please correct me if the above explanations are
> > incorrect :) !
> >
> > --
> > Felix
> >
> > On Wed, Jan 11, 2012 at 5:53 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> >> It's just that the mirroring logic depends on ZK to be available most of
> >> the time.
> >>
> >> Jun
> >>
> >> On Wed, Jan 11, 2012 at 2:35 PM, Christian Carollo <ccaro...@gmail.com> wrote:
> >>
> >>> I see. But if I used that configuration and then did the mirroring you
> >>> suggested would that be enough, in your opinion, to be considered highly
> >>> reliable?
> >>>
> >>> Christian
> >>>
> >>> On Jan 11, 2012, at 2:32 PM, Jun Rao wrote:
> >>>
> >>>>> For example, can I have one ZK instance and one broker on one machine and
> >>>>> that is enough to define a ZK cluster and a Kafka Cluster?
> >>>>
> >>>> Yes, although you don't get the reliability of ZK now.
> >>>>
> >>>> Jun
> >>>>
> >>>> On Wed, Jan 11, 2012 at 2:06 PM, Christian Carollo <ccaro...@gmail.com> wrote:
> >>>>
> >>>>> Jun,
> >>>>>
> >>>>> I don't think I asked my question the right way.
> >>>>>
> >>>>> What I am trying to understand is what are the minimum constituent parts
> >>>>> of a kafka cluster?
> >>>>>
> >>>>> Based on your last email, I am now wondering what are the minimum
> >>>>> constituent parts of a ZK cluster as well as a Kafka cluster?
> >>>>>
> >>>>> For example, can I have one ZK instance and one broker on one machine and
> >>>>> that is enough to define a ZK cluster and a Kafka Cluster?
> >>>>>
> >>>>> Thanks,
> >>>>> Christian
> >>>>>
> >>>>> On Jan 11, 2012, at 1:50 PM, Jun Rao <jun...@gmail.com> wrote:
> >>>>>
> >>>>>> Christian,
> >>>>>>
> >>>>>> A Kafka cluster contains a ZK cluster and a list of brokers. When a
> >>>>>> consumer subscribes to a topic in a kafka cluster, it consumes data stored
> >>>>>> in all brokers in that cluster.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Jun
> >>>>>>
> >>>>>> On Tue, Jan 10, 2012 at 11:28 PM, Christian Carollo <ccaro...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Thank you Jun that is quite helpful. I have a question about Kafka
> >>>>>>> Clusters. What are the minimum number and types of services that must be
> >>>>>>> running to make up a Kafka Cluster?
> >>>>>>>
> >>>>>>> I ask this because the diagrams (in the Kafka Mirroring document) allude
> >>>>>>> to a multiple broker environment, however, since each broker does not
> >>>>>>> appear to provide redundancy (as of today) to any of the other brokers in a
> >>>>>>> given zookeeper service, it seems like a Kafka Cluster is nothing more than
> >>>>>>> a grouping of a single zookeeper instance with a single Kafka broker, is
> >>>>>>> this the correct understanding?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Christian
> >>>>>>>
> >>>>>>> On Jan 10, 2012, at 8:47 AM, Jun Rao wrote:
> >>>>>>>
> >>>>>>>> With 0.7, you can set up inter-cluster replication (
> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring ).
> >>>>>>>>
> >>>>>>>> For the future 0.8 release, we are working on intra-cluster replication
> >>>>>>>> support and details can be found at
> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-50
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Jun
> >>>>>>>>
> >>>>>>>> On Mon, Jan 9, 2012 at 9:52 PM, Christian Carollo <ccaro...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> I am looking to implement Kafka in a production environment, however, I
> >>>>>>>>> haven't found any documentation or examples that
> >>>>>>>>> discuss how to build a redundant implementation. Is there any
> >>>>>>>>> documentation out there (blogs, articles, etc.) that describes
> >>>>>>>>> how we can implement such a system with Kafka 0.6 or 0.7?
> >>>>>>>>>
> >>>>>>>>> Also, is there a timeframe the community is shooting for, to release 0.8 w/
> >>>>>>>>> replication?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Christian