Joe: Thanks for your answer, but we're trying to push a Kafka broker into each site... ...so your answer makes me realize why we're pushing Kafka over per-producer service calls: the latter would mean a very large number of service calls from each site (our log producers gather data every 5 minutes, on average 100 items of about 128 bytes each per machine, and we're targeting from 250 to 4000 machines per "site").
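
A quick back-of-envelope with those figures (only a sketch; it assumes one service call per machine per 5-minute batch and ignores protocol overhead):

    # Per-site load if every machine made one service call per 5-minute batch.
    # Figures come from the paragraph above; HTTP/TLS overhead is ignored.
    items_per_batch = 100
    item_size_bytes = 128
    interval_s = 5 * 60

    for machines in (250, 4000):
        calls_per_second = machines / float(interval_s)
        payload_kib_per_second = machines * items_per_batch * item_size_bytes / float(interval_s) / 1024.0
        print("%4d machines: %5.1f calls/s, %6.1f KiB/s of log payload"
              % (machines, calls_per_second, payload_kib_per_second))

That is roughly 0.8 to 13 outbound calls per second and 10 to 170 KiB/s of payload per site, before any per-call overhead.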
I think that, with these numbers, we have a way to make the IT people understand that the Kafka solution will avoid flooding each site's firewall infrastructure (which actively filters outbound connections). Beyond this good point for Kafka in terms of the number of concurrent connections, I am wondering if we can find other arguments for the Kafka solution...

Jean

-----Original Message-----
From: Joe Stein [mailto:crypt...@gmail.com]
Sent: Sunday, October 21, 2012 1:26 AM
To: kafka-users@incubator.apache.org
Subject: Re: Kafka versus classic central HTTP(s) services for logs transmission

You could move the producer code to the "site" and expose that as a REST interface. You can then benefit from the scale and consumer functionality that comes with Kafka without the issues you are bringing up (a rough sketch of such a facade is included below, after the quoted message).

On Oct 20, 2012, at 4:27 PM, Jean Bic <jean.b...@gmail.com> wrote:

> Hello,
>
> We have started to build a solution that gathers logs from many machines
> located in various "sites" into a so-called "Consolidation server", whose
> role is to persist the logs and generate alerts based on criteria such as
> patterns in the logs or triggers on certain values.
>
> Our future users are challenging us to clarify why Kafka is the best
> communication solution for this need. They argue that it would be better
> to choose a more classic HTTP(S)-based solution, with producers calling
> REST services on a pool of Node.js servers behind a load balancer.
>
> One of the main issues they see with Kafka is that it requires connections
> from the Consolidation server to the Kafka brokers and ZooKeeper daemons
> located in each "site", whereas the service-based solution only needs
> connections from the log producers in each site to the Consolidation
> servers.
> Kafka is therefore seen as a burden for each site's IT team, requiring
> special firewall setup, versus no firewall setup at all with the
> service-based solution:
>
> 1. Kafka requires each site's IT team to create firewall rules accepting
> incoming connections, for a "non-standard" protocol, from the "Collector
> server" site.
>
> 2. The IT team must expose all ZooKeeper and broker machines/ports to the
> "Collector server" site.
>
> 3. Kafka has no built-in encryption for data, whereas a classic
> service-oriented solution can rely on HTTPS (reverse) proxies.
>
> 4. Kafka is not commonly known by IT people, who do not know how to scale
> it: when should they add broker machines versus ZooKeeper machines?
>
> With the service-based solution, the IT teams of each site are free of
> scalability concerns; only the "Consolidation server" site has to add
> Node.js machines to scale up.
>
> I agree that these IT concerns can't be taken lightly.
>
> I need help from the Kafka community to find rock-solid arguments for
> using Kafka over the classic service-based solution.
>
> How would you "defend" Kafka against the above "attacks"?
>
> Regards,
>
> Jean
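
For what it is worth, here is a minimal sketch of the "move the producer code to the site and expose it as a REST interface" idea from Joe's reply. The library choices (Flask and kafka-python), the topic name, the port and the broker address are illustrative assumptions, not anything specified in this thread:

    # Minimal per-site facade: machines POST log batches over HTTP(S),
    # and only this process speaks the Kafka protocol to the brokers.
    from flask import Flask, request
    from kafka import KafkaProducer

    app = Flask(__name__)
    producer = KafkaProducer(bootstrap_servers="broker.example.local:9092")  # assumed address

    @app.route("/logs", methods=["POST"])
    def ingest():
        # One POST body = one batch of log items from one machine.
        # send() is asynchronous; the client library batches writes to the broker.
        producer.send("site-logs", request.get_data())  # assumed topic name
        return "", 204

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

With something like this, the machines on a site keep making ordinary HTTP(S) calls to a local address, and only the facade's producer connection has to be routed to wherever the brokers live.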