[ https://issues.apache.org/jira/browse/KAFKA-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241143#comment-14241143 ]
Joe Stein commented on KAFKA-1207:
----------------------------------

Hey [~jayson.minard], over the last year we have gone back and forth between "build a scheduler" just for Kafka and "build an executor layer that works in Marathon/Aurora".

What we did first was give Aurora a shot, since it already has an executor (Thermos), and see about getting Kafka to run there. The script for what we did is here: https://github.com/stealthly/borealis/blob/master/scripts/kafka.aurora It relied on an undocumented feature in Aurora, which Bill Farner talked about when I spoke with him on a podcast: http://allthingshadoop.com/2014/10/26/resource-scheduling-and-task-launching-with-apache-mesos-and-apache-aurora-at-twitter/

Anyways, there were/are issues with that implementation, so we decided to give Marathon https://mesosphere.github.io/marathon/docs/ a try. We started off with this code as a pattern to use https://github.com/brndnmtthws/kafka-on-marathon and so far it is working out great. It definitely added more work on our side, but it is running and doing exactly what we expect.

We have been speaking with others about this too and think we could come up with a standalone scheduler that would work out of the box. I don't know if it makes sense for that to be a JVM process, though. We were thinking of writing it in Go.

One *VERY* important reason to have another shell launching Kafka is that you want to be able to change scripts and bounce brokers (you kind of have to do this), and if you do a rolling restart or something similar of your tasks, Mesos will schedule them wherever it wants. Some Kafka improvements are coming that mitigate that somewhat https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Command+Line+and+Related+Improvements but I don't think it would ever be 100% (Kafka is not like Storm or Spark in how it runs). On the Mesos side you can manage this with roles and constraints, but at the end of the day you are dealing with a *persistent* server.
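
To make the "another shell launching Kafka" point concrete, here is a minimal Python sketch of a wrapper that bounces its child process in place when the config changes, so the Mesos task itself never ends and nothing gets rescheduled to another host. It is an illustration only, not the script linked above: the `Supervisor` class, its method names, and the launch command are all hypothetical.

```python
import hashlib
import subprocess
from typing import List, Optional


def digest(config_text: str) -> str:
    """Return a stable fingerprint of a config file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


class Supervisor:
    """Keeps one child process running and bounces it when its config changes.

    Hypothetical sketch: in a real setup `command` would be the broker start
    script and the config text would be fetched from wherever configs live.
    """

    def __init__(self, command: List[str]) -> None:
        self.command = command
        self.child: Optional[subprocess.Popen] = None
        self.current_digest: Optional[str] = None
        self.restarts = 0

    def start(self) -> None:
        self.child = subprocess.Popen(self.command)

    def stop(self, timeout: float = 10.0) -> None:
        """Ask the child to exit, and kill it if it does not exit in time."""
        if self.child is None:
            return
        if self.child.poll() is None:
            self.child.terminate()
            try:
                self.child.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self.child.kill()
                self.child.wait()
        self.child = None

    def apply_config(self, config_text: str) -> bool:
        """Bounce the child if the config differs; return True if (re)started."""
        new_digest = digest(config_text)
        alive = self.child is not None and self.child.poll() is None
        if alive and new_digest == self.current_digest:
            return False  # nothing changed, leave the process alone
        if self.child is not None:
            self.stop()
            self.restarts += 1
        self.current_digest = new_digest
        self.start()
        return True
```

A loop around `apply_config` that polls for new config text would be the whole agent; because the restart happens inside the same long-running task, the scheduler never sees the task finish.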
The way we have gotten around this is using the shell script as an agent that can fetch the updated configs, restart the process, and so on. There is a new feature coming out in Mesos https://issues.apache.org/jira/browse/MESOS-1554 that will make this better; however, I still like the supervisor shell script strategy.

We could morph the supervisor shell script strategy into a custom scheduler/executor (framework) for Kafka (absolutely), but I am not sure whether the project would accept Go code for this feature. I would be +1 on it going in and have a few engineers available to work on it over the next 1-2 months. We could also write the whole thing in Java or Scala, though I still don't know if that would make it any easier/better to support in the community vs Go.

Would love more thoughts and discussions on this here.

> Launch Kafka from within Apache Mesos
> -------------------------------------
>
>                 Key: KAFKA-1207
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1207
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joe Stein
>              Labels: mesos
>             Fix For: 0.9.0
>
>         Attachments: KAFKA-1207.patch, KAFKA-1207_2014-01-19_00:04:58.patch, KAFKA-1207_2014-01-19_00:48:49.patch
>
>
> There are a few components to this.
> 1) The Framework: This is going to be responsible for starting up and managing the failover of brokers within the Mesos cluster. This will have to get some Kafka-focused parameters for launching new replica brokers and moving topics and partitions around based on what is happening in the grid through time.
> 2) The Scheduler: This is what is going to ask for resources for Kafka brokers (new ones, replacement ones, commissioned ones) and other operations such as stopping tasks (decommissioning brokers).
> I think this should also expose a user interface (or at least a REST API) for producers and consumers so we can have producers and consumers run inside of the Mesos cluster if folks want (just add the jar).
> 3) The Executor: This is the task launcher. It launches tasks and kills them off.
> 4) Sharing data between Scheduler and Executor: I looked at a few implementations of this. I like parts of the Storm implementation but think using the environment variable ExecutorInfo.CommandInfo.Environment.Variables[] is the best shot. We can have a command line bin/kafka-mesos-scheduler-start.sh that would build the contrib project if not already built and support conf/server.properties to start.
> The Framework and operating Scheduler would run on an administrative node. I am probably going to hook Apache Curator into it so it can do its own failover to another follower. Running more than 2 should be sufficient as long as it can bring back its state (e.g. from zk). I think we can add this in afterwards once everything is working.
> Additional detail can be found on the Wiki page https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38570672

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)