Mesos Updates

Benjamin Hindman Wed, 11 May 2011 11:44:18 -0700

Hi All!

I've been rather silent over the past few months while focusing on Mesos. In 
particular, I have been working at Twitter to help get Mesos deployed and used. 
I'm thrilled to say that Twitter is invested in seeing the project succeed 
internally and in the open source community!


There has been a bunch of progress over the past few months that I'm happy to 
report. I thought I would send a quick "state of the union" report on Mesos at 
Twitter and discuss what I think is necessary to accomplish for our "first" 
Apache release.

Twitter has three different clusters running Mesos: a "test" cluster, a 
"non-production" (nonprod) cluster, and a "production" (prod) cluster. The test 
cluster is where I incubate new versions of Mesos before they get cascaded 
through nonprod and prod. The nonprod cluster is mostly used for (1) 
experimental new services that are being developed internally and (2) load 
tests. And the prod cluster is being used by numerous "streaming" services that 
perform different tasks based on data that they are ingesting (for example, 
these services get data off of the internal equivalent of the Twitter 
"firehose"). Only a few of the services running in prod and nonprod have 
daemon-style "always up" requirements, but the uptimes have been looking great 
as of late! There are some promising objectives right around the corner for 
Mesos at Twitter, and I'm even more excited to report on those once they 
happen! This includes running Hadoop on Mesos (not the primary reason Twitter 
was excited about Mesos in the first place), as well as some rather "important" 
internal Twitter services ... stay tuned! ;)

There is still lots to be done (which I'll discuss briefly below), but that 
being said I'd love to shoot for our first Apache release date of early June. 
I'm not sure of the exact protocol for this ... 

There are a few upcoming features that I wanted to hold out on for the first 
release (all of which are being worked on):
(1) Eliminating SWIG as a dependency for the webui (the biggest blocker I've 
noticed for people downloading and installing/running the system).
(2) Providing task history information.
(3) Handling slave upgrades/failures (without killing the running tasks).
(4) Launching schedulers via the master and persisting task information across 
failures.
(5) Implementing our resource hints mechanism, which has been renamed to 
"requests".

Two more things that I'd like to take care of/understand:

(*) What needs to occur when the time comes to offer some contributors roles as 
committers?

(*) It sounds like Matei has gotten our SVN stuff all set up, so we can bring 
the code in from Github. I'm still a big fan of providing access to the code 
via Github, however; I think it's a low barrier to entry that lets developers 
download, read, and play with the code very easily. I'm not sure how other 
projects do it, but I've been told that some projects share a presence on both 
Github and Apache SVN?

If you got all the way through this email, thanks! I'm excited to see Mesos 
take the next steps!

Ben.

P.S. It appears I'm not on [email protected] ... I guess I need to 
add myself?
