Looks like we can agree that it's a useful feature to add, which is great! :)
I've written up the first version of a Design Document; please feel free to take a look and comment: https://docs.google.com/document/d/1qCVK40nOKDWlLKYrYUpLDwkH654zCoatIzi4OnsW8ac/edit?usp=sharing

Thanks,
Benno

On Tue, Sep 5, 2017 at 6:24 PM, Jeff Coffler <jeff.coff...@microsoft.com.invalid> wrote:

> We're talking with the jemalloc folks.
>
> They did have a plan to move to cmake, but they wanted to move to cmake for all platforms simultaneously (a tall order). My thought is to ask them to move to cmake for Windows only (we'll even do the work here if they'll take 'em). This is manageable (although they would be using two build systems, as if Mesos devs don't know all about that), but jemalloc already does that, since they've checked in some Visual Studio solutions.
>
> In the past, I've just asked the company to buy what I need (i.e. SmartHeap), and it's worked splendidly. But that's not an option for Mesos, as you pointed out.
>
> I'll talk with JohnK this morning (he's managing this) and will see where he is in discussions with the jemalloc authors. Last week, we were talking with the guy who did a lot of the cmake porting, but we need to talk with the jemalloc author directly.
>
> Given the situation, I guess we should just move ahead with jemalloc. If worst comes to worst, we'll maintain our own cmake build process for jemalloc against a specific branch. We already maintain local changes for some other 3rd party dependencies.
>
> /Jeff
>
> -----Original Message-----
> From: Benno Evers [mailto:bev...@mesosphere.com]
> Sent: Tuesday, September 5, 2017 4:02 AM
> To: dev@mesos.apache.org
> Subject: Re: [Proposal] Use jemalloc as default memory allocator for Mesos
>
> Hi Jeff,
>
> do you have a particular alternative in mind? Certainly SmartHeap is a non-starter because it is proprietary. I actually did look for alternatives before sending out this proposal, but from what I've seen, there isn't exactly an abundance of well-tested, widely used and stable malloc implementations with heap profiling features; in fact, I'm not aware of any besides tcmalloc and jemalloc.
>
> I also don't think having a native Windows build is as important as being well-tested, because either the build just works (perfect), or it doesn't work and the feature will be disabled on Windows. In that case, people are in exactly the same position as they were before; in particular, they can still use Windows-native heap profiling solutions if they want. On the other hand, if we decide on some obscure malloc, there is a much higher chance of accidentally introducing subtle bugs for the people who enable the feature.
>
> Best regards,
> Benno
>
>
> On Thu, Aug 31, 2017 at 5:41 PM, Jeff Coffler <jeff.coff...@microsoft.com.invalid> wrote:
>
> > The fact that Firefox works with jemalloc isn't necessarily indicative. I, for one, would like to avoid dependencies on Cygwin for Mesos. We don't need it today, and we're building an awful lot.
> >
> > (Interesting that you brought up SASL-based auth - that's currently in the process of being ported - natively - to Windows.)
> >
> > There are many options for memory allocators that run on both Linux and Windows. For example, I've used SmartHeap in the past, and that works well on UNIX, Linux, Windows, and more. (That's commercial; I'm not sure whether it's free for open source products.) I'm not necessarily suggesting SmartHeap; I'm just pointing out that there are native options for both Linux and Windows that are well ported and work everywhere.
> >
> > Before we decide on using jemalloc, I'd like to see someone look at memory allocators and see if there's one we can use (i.e. free) that's natively supported on both Linux and Windows without jumping through hoops (like Cygwin for builds). If there are native choices, we should look at those much more aggressively than at an option that doesn't work well on Windows.
> >
> > /Jeff
> >
> > -----Original Message-----
> > From: Till Toenshoff [mailto:toensh...@me.com]
> > Sent: Wednesday, August 30, 2017 6:27 PM
> > To: dev@mesos.apache.org
> > Subject: Re: [Proposal] Use jemalloc as default memory allocator for Mesos
> >
> > It appears that jemalloc does support Windows (64-bit).
> > See: https://github.com/jemalloc/jemalloc/blob/dev/msvc/ReadMe.txt
> >
> > tcmalloc, on the other hand, appears to offer only a minimal variant on Windows.
> > See: https://github.com/gperftools/gperftools - grep for "COMPILING ON NON-LINUX SYSTEMS"
> > See: https://github.com/gperftools/gperftools/blob/master/INSTALL - grep for "Windows (MSVC, Cygwin, and MinGW)"
> >
> > So both options rely on Cygwin or MinGW for building, a requirement that the Mesos build itself does not have. That proves your point about stuff not really "just working", at least when it comes to the build step of those packages.
> >
> > It seems that without trying it, we won't find out whether jemalloc works as hoped on Windows for us; the Firefox project's results, however, are encouraging. On the other hand, if it doesn't work, we could simply decide to disable it on Windows, just like some other Mesos features will remain disabled on that platform unless someone decides to port them, e.g. SASL-based authn.
> >
> > > On Aug 25, 2017, at 3:28 PM, Benno Evers <bev...@mesosphere.com> wrote:
> > >
> > > Hi Jeff,
> > >
> > > from looking around on the internet, it seems like Firefox builds with jemalloc on all platforms, and I've also seen reports of people successfully using tcmalloc heap profiling on Windows. I'm afraid I don't currently have a Windows machine with a development environment set up, so I can't provide direct user experience.
> > >
> > > In the worst case, we would have to disable jemalloc support on Windows, but I hope it won't be necessary.
> > >
> > > Since you probably have more experience with memory management on Windows, is there a reason to suspect that it should or shouldn't work?
> > >
> > > Best regards,
> > > Benno
> > >
> > > On Wed, Aug 23, 2017 at 6:16 PM, Jeff Coffler <jeff.coff...@microsoft.com.invalid> wrote:
> > >
> > >> Hi Benno,
> > >>
> > >> What's the availability of both jemalloc and tcmalloc on the Windows platform? Do the products work there properly?
> > >>
> > >> There are solutions that I know work on Windows (from past work I've done). I'm unsure about either jemalloc or tcmalloc, however.
> > >>
> > >> Thanks,
> > >>
> > >> /Jeff
> > >>
> > >> -----Original Message-----
> > >> From: Benno Evers [mailto:bev...@mesosphere.com]
> > >> Sent: Tuesday, August 22, 2017 3:16 AM
> > >> To: dev@mesos.apache.org
> > >> Subject: Re: [Proposal] Use jemalloc as default memory allocator for Mesos
> > >>
> > >> Hi Alexander,
> > >>
> > >> in general, jemalloc and tcmalloc are very similar, and seem to be taking ideas from each other (in fact, the jeprof executable started as a copy of pprof, and there are still references to the pprof documentation in some comments).
> > >>
> > >> From what I've seen, the main difference is that jemalloc's profiling seems better suited to multi-threaded programs; in particular, the profile file format includes per-thread memory statistics, and the profiling features can be turned on and off individually per thread. From an API perspective, all jemalloc settings can be accessed through the mallctl() function, while tcmalloc seems to require some options to be set via environment variables (https://gperftools.github.io/gperftools/heapprofile.html). Finally, I also found jemalloc's documentation to be more thorough.
> > >>
> > >> But again, the two are very similar, so I think the main decision here isn't whether to choose jemalloc or tcmalloc but whether to switch to a custom memory allocator that has support for profiling heap memory usage.
> > >>
> > >>
> > >> On Mon, Aug 21, 2017 at 4:26 PM, Alexander Rojas <alexan...@mesosphere.io> wrote:
> > >>
> > >>> Hi Benno,
> > >>>
> > >>> This does sound like a great addition to Mesos. Can you, however, explain how jemalloc is better than tcmalloc? I think that for such an important change, we probably need some more information.
> > >>>
> > >>> Your comment in MESOS-7876 mentions that we already have tcmalloc since it is part of gperftools, so I would like to have the whole picture of the advantages and disadvantages of both options.
> > >>>
> > >>> Alexander Rojas
> > >>> alexan...@mesosphere.io
> > >>>
> > >>>
> > >>>> On 18. Aug 2017, at 12:49, Benno Evers <bev...@mesosphere.com> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>>
> > >>>> I would like to propose bundling jemalloc as a new dependency under `3rdparty/`, and linking Mesos against this new memory allocator by default.
> > >>>>
> > >>>>
> > >>>> # Motivation
> > >>>>
> > >>>> The Mesos master and agent binaries are, ideally, very long-running processes. This makes them susceptible to memory issues, because even small leaks have a chance to build up over time to the point where they become problematic.
> > >>>>
> > >>>> We have seen several such issues on our internal Mesos installations, for example https://issues.apache.org/jira/browse/MESOS-7748 or https://issues.apache.org/jira/browse/MESOS-7800.
> > >>>>
> > >>>> I imagine any organization running Mesos for an extended period of time has had its share of similar issues, so I expect this proposal to be useful for the whole community.
> > >>>>
> > >>>>
> > >>>> # Why jemalloc?
> > >>>>
> > >>>> Given that memory issues tend to be most visible after a given process has been running for a long time, it would be great to have the option to enable heap tracking and profiling at runtime, without having to restart the process. (This ability could then be connected to a Mesos endpoint, similar to how we can adjust the log level at runtime.)
> > >>>>
> > >>>> The two production-quality memory allocators that currently seem to have this ability are tcmalloc and jemalloc. Of these, jemalloc, in our experience, produces better and more detailed statistics.
> > >>>>
> > >>>>
> > >>>> # What is the impact on users who do not need this feature?
> > >>>>
> > >>>> Naturally, not every single user of Mesos will need this feature. To ensure these users would not experience serious performance regressions as a result of this change, we conducted a preliminary set of benchmarks, whose results are collected under https://issues.apache.org/jira/browse/MESOS-7876.
> > >>>>
> > >>>> It turns out that we could probably even expect a small speedup (1% - 5%) as a nice side effect of this change.
> > >>>>
> > >>>> Users who compile Mesos themselves would of course have the option to disable jemalloc at configuration time or replace it with their memory allocator of choice.
> > >>>>
> > >>>>
> > >>>>
> > >>>> I'm looking forward to hearing any thoughts and comments.
> > >>>>
> > >>>>
> > >>>> Thanks,
> > >>>> --
> > >>>> Benno Evers
> > >>>> Software Engineer, Mesosphere
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >> Benno Evers
> > >> Software Engineer, Mesosphere
> > >>
> > >
> > >
> > > --
> > > Benno Evers
> > > Software Engineer, Mesosphere
> > >
> >
> >
>
> --
> Benno Evers
> Software Engineer, Mesosphere
>

--
Benno Evers
Software Engineer, Mesosphere
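Benno's point that jemalloc exposes its settings through mallctl(), including per-thread profiling switches, and the idea of toggling heap profiling at runtime without a restart, can be illustrated with a minimal C sketch. This is not code from Mesos or from the design document, and the helper names (jemalloc_available, set_heap_profiling, set_thread_heap_profiling, dump_heap_profile) are hypothetical. It assumes Linux with GCC or Clang (for the weak-symbol extension), a jemalloc built with --enable-prof and without a symbol prefix, and a process started with profiling enabled but inactive, e.g. MALLOC_CONF=prof:true,prof_active:false. The option names prof.active, thread.prof.active and prof.dump are taken from the jemalloc(3) manual.

```c
#include <stdbool.h>
#include <stddef.h>

/* Weak declaration of jemalloc's mallctl() (signature as in jemalloc(3)).
   Assumption: GCC/Clang on an ELF platform. If no unprefixed jemalloc is
   linked in, the weak reference resolves to NULL instead of failing to link. */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/* Hypothetical helper: true if a mallctl() implementation was linked in. */
bool jemalloc_available(void) {
    return mallctl != NULL;
}

/* Hypothetical helper: activate or deactivate heap-profile sampling for the
   whole process via "prof.active". Only effective if jemalloc was built with
   --enable-prof and started with prof:true.
   Returns 0 on success, -1 if jemalloc is absent, else mallctl()'s error. */
int set_heap_profiling(bool active) {
    if (!jemalloc_available()) {
        return -1;
    }
    return mallctl("prof.active", NULL, NULL, &active, sizeof(active));
}

/* Hypothetical helper: same as above, but only for the calling thread,
   via "thread.prof.active". */
int set_thread_heap_profiling(bool active) {
    if (!jemalloc_available()) {
        return -1;
    }
    return mallctl("thread.prof.active", NULL, NULL, &active, sizeof(active));
}

/* Hypothetical helper: write a heap profile to `path` via "prof.dump".
   With path == NULL, jemalloc picks a file name based on its prefix setting. */
int dump_heap_profile(const char *path) {
    if (!jemalloc_available()) {
        return -1;
    }
    return mallctl("prof.dump", NULL, NULL, &path, sizeof(path));
}
```

Declaring mallctl() weak lets the same code build and run without jemalloc, in which case every helper simply returns -1. That mirrors the fallback discussed in the thread of leaving the feature disabled where jemalloc is unavailable (the weak-symbol trick itself does not apply to Windows). An endpoint like the runtime log-level toggle could call helpers of this kind; how Mesos would actually wire that up is a question for the design document.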