[ https://issues.apache.org/jira/browse/KAFKA-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241143#comment-14241143 ]

Joe Stein commented on KAFKA-1207:
----------------------------------

Hey [~jayson.minard], over the last year we have gone back and forth between 
"build a scheduler" just for Kafka and "build an executor layer that works in 
Marathon/Aurora". What we did first was give Aurora a shot, since it already 
has an executor (Thermos), and see about getting Kafka to run there. The script 
we used for that is here 
https://github.com/stealthly/borealis/blob/master/scripts/kafka.aurora. It 
relied on an undocumented feature in Aurora, which Bill Farner talked about 
when I spoke with him on a podcast 
http://allthingshadoop.com/2014/10/26/resource-scheduling-and-task-launching-with-apache-mesos-and-apache-aurora-at-twitter/

Anyway, there were (and still are) issues with that implementation, so we then 
decided to give Marathon https://mesosphere.github.io/marathon/docs/ a try. We 
started off using https://github.com/brndnmtthws/kafka-on-marathon as a 
pattern, and so far it is working out great. It definitely added more work on 
our side, but it is running and doing exactly what we expect.

We have been speaking with others about this too and think we could come up 
with a standalone scheduler that would work out of the box. I am not sure it 
makes sense for that to be a JVM process, though; we were thinking of writing 
it in Go.

One *VERY* important reason to have another shell launching Kafka is that you 
want to be able to change scripts and bounce brokers (you kind of have to do 
this), and if you do a rolling restart or otherwise restart your tasks, Mesos 
will schedule them wherever it wants. Some Kafka improvements are coming that 
mitigate this somewhat 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Command+Line+and+Related+Improvements
but I don't think it would ever be 100% (Kafka is not like Storm or Spark in 
how it runs). On the Mesos side you can manage this with roles and constraints, 
but at the end of the day you are dealing with a *persistent* server. The way 
we have gotten around this is by using the shell script as an agent that can 
fetch the updated configs, restart the process, etc.

There is a new feature coming in Mesos https://issues.apache.org/jira/browse/MESOS-1554 
that will make this better; however, I still like the supervisor shell script 
strategy. We could (absolutely) morph the supervisor shell script strategy into 
a custom scheduler/executor (framework) for Kafka, but I am not sure whether the 
project would accept Go code for this feature. I would be +1 on it going in and 
have a few engineers available to work on it over the next 1-2 months. We could 
also write the whole thing in Java or Scala, though I still don't know if that 
would make it any easier or better to support in the community than Go.
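
A minimal Go sketch of the supervisor-agent idea follows. It assumes a polling agent that bounces the broker only when the fetched config actually changed, using a SHA-256 checksum of the raw file for the comparison. `Supervisor`, `fetch`, and `restart` are hypothetical names. In a real agent, `fetch` would download server.properties and `restart` would stop and relaunch the broker process. Neither of those is shown here.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Supervisor is a hypothetical agent that owns one broker process and
// bounces it only when its configuration changes.
type Supervisor struct {
	fetch   func() ([]byte, error) // assumed hook: returns the latest server.properties
	restart func(cfg []byte) error // assumed hook: stops and relaunches the broker with cfg
	current [sha256.Size]byte      // checksum of the config the broker is running with
	started bool                   // whether the broker has been launched at least once
}

// Sync fetches the config once and restarts the broker if it is new or changed.
// It reports whether a restart happened.
func (s *Supervisor) Sync() (bool, error) {
	cfg, err := s.fetch()
	if err != nil {
		return false, fmt.Errorf("fetch config: %w", err)
	}
	sum := sha256.Sum256(cfg)
	if s.started && sum == s.current {
		return false, nil // unchanged: leave the running broker alone
	}
	if err := s.restart(cfg); err != nil {
		return false, fmt.Errorf("restart broker: %w", err)
	}
	s.current = sum
	s.started = true
	return true, nil
}

func main() {
	// Simulated config versions the agent sees on three successive polls.
	configs := [][]byte{
		[]byte("broker.id=1\nlog.dirs=/var/kafka\n"),
		[]byte("broker.id=1\nlog.dirs=/var/kafka\n"),
		[]byte("broker.id=1\nlog.dirs=/var/kafka\nnum.io.threads=16\n"),
	}
	next, restarts := 0, 0
	s := &Supervisor{
		fetch: func() ([]byte, error) {
			cfg := configs[next]
			next++
			return cfg, nil
		},
		restart: func(cfg []byte) error {
			restarts++
			return nil
		},
	}
	for range configs {
		restarted, err := s.Sync()
		if err != nil {
			panic(err)
		}
		fmt.Println("restarted:", restarted)
	}
	fmt.Println("total restarts:", restarts)
}
```

Keeping the "did anything change" decision in the agent, rather than in Mesos, means Mesos never needs to reschedule the task just to pick up new settings. That avoids the broker landing on a different host.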

Would love more thoughts and discussions on this here.

> Launch Kafka from within Apache Mesos
> -------------------------------------
>
>                 Key: KAFKA-1207
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1207
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Joe Stein
>              Labels: mesos
>             Fix For: 0.9.0
>
>         Attachments: KAFKA-1207.patch, KAFKA-1207_2014-01-19_00:04:58.patch, 
> KAFKA-1207_2014-01-19_00:48:49.patch
>
>
> There are a few components to this.
> 1) The Framework: This is going to be responsible for starting up and 
> managing the failover of brokers within the mesos cluster. This will have 
> to get some Kafka-focused parameters for launching new replica brokers and 
> moving topics and partitions around based on what is happening in the grid 
> through time.
> 2) The Scheduler: This is what is going to ask for resources for Kafka 
> brokers (new ones, replacement ones, commissioned ones) and perform other 
> operations such as stopping tasks (decommissioning brokers). I think this 
> should also expose a user interface (or at least a REST API) for producers 
> and consumers so we can have producers and consumers run inside the mesos 
> cluster if folks want (just add the jar).
> 3) The Executor: This is the task launcher. It launches tasks and kills 
> them off.
> 4) Sharing data between Scheduler and Executor: I looked at a few 
> implementations of this. I like parts of the Storm implementation but think 
> using the environment variables in 
> ExecutorInfo.CommandInfo.Environment.Variables[] is the best shot. We can 
> have a command line bin/kafka-mesos-scheduler-start.sh that would build the 
> contrib project if not already built and support conf/server.properties to 
> start.
> The Framework and operating Scheduler would run on an administrative node. 
> I am probably going to hook Apache Curator into it so it can do its own 
> failover to another follower. Running more than 2 should be sufficient as 
> long as it can bring back its state (e.g. from zk). I think we can add this 
> in once everything is working.
> Additional detail can be found on the Wiki page 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38570672
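
Item 4 proposes passing broker settings from the scheduler to the executor through the environment variables on the executor's CommandInfo. Below is a minimal Go sketch of one possible encoding. The `KAFKA_` prefix and the dots-to-underscores naming are hypothetical conventions that neither Mesos nor Kafka defines. Under them, `log.dirs` travels as `KAFKA_LOG_DIRS`. The mapping only round-trips property names that are lowercase and contain no underscores. Building the actual Mesos protobuf message is not shown.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// envPrefix marks variables that carry broker settings.
// Hypothetical convention, not defined by Mesos or Kafka.
const envPrefix = "KAFKA_"

// PropsToEnv runs on the scheduler side: it turns server.properties-style
// key/value pairs into NAME=value entries for the executor's environment,
// e.g. "log.dirs" -> "KAFKA_LOG_DIRS".
func PropsToEnv(props map[string]string) []string {
	env := make([]string, 0, len(props))
	for key, value := range props {
		name := envPrefix + strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
		env = append(env, name+"="+value)
	}
	sort.Strings(env) // map iteration order is random; sort for stable output
	return env
}

// EnvToProps runs on the executor side: it recovers broker properties from
// os.Environ()-style "NAME=value" entries and skips unrelated variables.
func EnvToProps(env []string) map[string]string {
	props := make(map[string]string)
	for _, entry := range env {
		// Split at the first "=" only, so values may themselves contain "=".
		name, value, ok := strings.Cut(entry, "=")
		if !ok || !strings.HasPrefix(name, envPrefix) {
			continue
		}
		key := strings.TrimPrefix(name, envPrefix)
		props[strings.ToLower(strings.ReplaceAll(key, "_", "."))] = value
	}
	return props
}

func main() {
	env := PropsToEnv(map[string]string{
		"broker.id":         "3",
		"log.dirs":          "/var/lib/kafka",
		"zookeeper.connect": "zk1:2181,zk2:2181",
	})
	for _, entry := range env {
		fmt.Println(entry)
	}

	// Simulate the executor's environment, including an unrelated variable.
	props := EnvToProps(append(env, "PATH=/usr/bin"))
	fmt.Println(props["broker.id"], props["log.dirs"], props["zookeeper.connect"])
}
```

On the executor, the input to `EnvToProps` would be `os.Environ()`. The resulting map could then be written out as server.properties before launching the broker.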



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
