Re: Cgroups v2 + Python 3 + Upstreaming X/Twitter Patches

2024-04-03 Thread Benjamin Mahler
Just an update here, in late January we finished upstreaming our internal patches that were upstreamable. This amounted to 35 patches. The cgroups v2 work is ongoing, we're hoping to be mostly code complete by the end of this month. On Fri, Jan 12, 2024 at 6:01 PM Benjamin Mahler wrote

Re: Add s390x support for Mesos CI

2024-04-01 Thread Benjamin Mahler
You can schedule some time with me to try to add this, since you need an apache account to modify these. The ARM one is separate as Tomek mentioned, so likely there would need to be a separate one for s390x: https://ci-builds.apache.org/job/Mesos/job/Mesos-Buildbot-ARM/ It looks like this uses

Re: Add s390x support for Mesos CI

2024-03-11 Thread Benjamin Mahler
presumably you want to update the configuration matrix here? https://ci-builds.apache.org/job/Mesos/job/Mesos-Buildbot/ not sure where this configuration lives though On Mon, Mar 11, 2024 at 1:28 AM Yasir Ashfaq wrote: > Hi All, > > We recently added

Re: Cgroups v2 + Python 3 + Upstreaming X/Twitter Patches

2024-01-16 Thread Benjamin Mahler

Re: Cgroups v2 + Python 3 + Upstreaming X/Twitter Patches

2024-01-12 Thread Benjamin Mahler
+user@ On Fri, Jan 12, 2024 at 5:55 PM Benjamin Mahler wrote: > As part of upgrading to CentOS 9 at X/Twitter, Shatil / Devin (cc'ed) will > be working on: > > * Upgrading to Python 3 > * Cgroups v2 support > > We will attempt to upstream this work for the b

Cgroups v2 + Python 3 + Upstreaming X/Twitter Patches

2024-01-12 Thread Benjamin Mahler
As part of upgrading to CentOS 9 at X/Twitter, Shatil / Devin (cc'ed) will be working on: * Upgrading to Python 3 * Cgroups v2 support We will attempt to upstream this work for the benefit of other users. In addition, we have several long-standing internal patches that should have been

Re: Next steps for Mesos

2023-03-20 Thread Benjamin Mahler
Also if you are still a user of mesos, please chime in. Qian, it might be worth having a more explicit email asking users to chime in as this email was tailored more for contributors. Twitter is still using mesos heavily, we upgraded from a branch based off of 1.2.x to 1.9.x in 2021, but haven't

Re: Apache Mesos Twitter Account

2022-08-21 Thread Benjamin Mahler
Yes I still have access, I was mostly managing it during the 2016-present period but haven't done anything since late 2020. Do you have anything in mind you'd like to see tweeted or retweeted? On Thu, Aug 18, 2022 at 5:05 AM Andreas Peters wrote: > Hi, > > sorry for this non technical question.

Re: [VOTE] Move Apache Mesos to Attic

2021-04-06 Thread Benjamin Mahler
+1 (binding) Thanks to all who contributed to the project. On Mon, Apr 5, 2021 at 1:58 PM Vinod Kone wrote: > Hi folks, > > Based on the recent conversations > < > https://lists.apache.org/thread.html/raed89cc5ab78531c48f56aa1989e1e7eb05f89a6941e38e9bc8803ff%40%3Cuser.mesos.apache.org%3E > > >

Re: Problems building HEAD of mesos 1.10.x branch

2020-12-07 Thread Benjamin Mahler
Studio 15 2017" -A "x64" > -DPATCHEXE_PATH="C:\ProgramData\chocolatey\lib\patch\tools\bin" -T host=x64 > # Build mesos agent cmake --build . --target mesos-agent --config Release > > This is the version I started from: > C:\temp\mesos>git log -n 1 > commi

Re: Design document: constraints-based offer filtering

2020-07-29 Thread Benjamin Mahler
Just to add some color to this, picky scheduling has been a long standing issue with the two level scheduling architecture of mesos. Given that mesos does not have enough information from schedulers to be able to pick offers that the scheduler wants, it can take a very long time to receive a

Re: [AREA1 SUSPICIOUS] [OFFER] Remove ZooKeeper as hard-dependency, support etcd, Consul, OR ZooKeeper

2020-06-12 Thread Benjamin Mahler
Ah yes I forgot, the other piece is network membership for the replicated log, through our zookeeper::Group related code. Is that what you're referring to? We could put that behind a module interface as well. On Fri, Jun 12, 2020 at 9:10 PM Benjamin Mahler wrote: > > Apache ZooKeeper i

Re: [AREA1 SUSPICIOUS] [OFFER] Remove ZooKeeper as hard-dependency, support etcd, Consul, OR ZooKeeper

2020-06-12 Thread Benjamin Mahler
> > Samuel Marks > Charity <https://sydneyscientific.org> | consultancy <https://offscale.io> > | open-source <https://github.com/offscale> | LinkedIn > <https://linkedin.com/in/samuelmarks> > > > On Wed, Jun 10, 2020 at 1:42 AM Benjamin Mahler > wrote:

Re: [AREA1 SUSPICIOUS] [OFFER] Remove ZooKeeper as hard-dependency, support etcd, Consul, OR ZooKeeper

2020-06-09 Thread Benjamin Mahler
it “session drop” operation that also immediately erases all the >“leased” nodes. We propose to implement this in liboffkv. >- > >Check if the node being created has leased parent. Currently, liboffkv >declares this to be unspecified behavior: it may either throw (i

Re: Subject: [VOTE] Release Apache Mesos 1.10.0 (rc1)

2020-05-27 Thread Benjamin Mahler
+1 (binding) On Mon, May 18, 2020 at 4:36 PM Andrei Sekretenko wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.10.0. > > 1.10.0 includes the following major improvements: > >

Re: [VOTE] Release Apache Mesos 1.7.3 (rc1)

2020-05-07 Thread Benjamin Mahler
+1 (binding) On Mon, May 4, 2020 at 1:48 PM Greg Mann wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.7.3. > > The CHANGELOG for the release is available at: > > https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.7.3-rc1 > >

Re: [AREA1 SUSPICIOUS] [OFFER] Remove ZooKeeper as hard-dependency, support etcd, Consul, OR ZooKeeper

2020-05-01 Thread Benjamin Mahler
> Sydney Medical School | Westmead Institute for Medical Research | > https://linkedin.com/in/samuelmarks > Director | Sydney Scientific Foundation Ltd <https://sydneyscientific.org> > | Offscale.io of Sydney Scientific Pty Ltd <https://offscale.io> > > PS: Damien - not

Re: [AREA1 SUSPICIOUS] [OFFER] Remove ZooKeeper as hard-dependency, support etcd, Consul, OR ZooKeeper

2020-04-20 Thread Benjamin Mahler
ra-scheduler/aurora> as well as to unrelated > > projects (e.g., removing etcd as a hard-dependency from Kubernetes > > <https://kubernetes.io>… enabling them to choose between ZooKeeper, > etcd, > > and Consul). > > > > Thanks for your continual feedback,

Re: [AREA1 SUSPICIOUS] [OFFER] Remove ZooKeeper as hard-dependency, support etcd, Consul, OR ZooKeeper

2020-04-17 Thread Benjamin Mahler
the details (and there are a lot of them!), and we'll then get a chance to give feedback. You can look through the mailing list for past examples of design docs (in terms of which sections to include, etc). How does that sound? On Tue, Apr 14, 2020 at 8:44 PM Samuel Marks wrote: > Dear Benjamin Mah

Re: [AREA1 SUSPICIOUS] [OFFER] Remove ZooKeeper as hard-dependency, support etcd, Consul, OR ZooKeeper

2020-04-14 Thread Benjamin Mahler
Thanks for reaching out, a well maintained and well written wrapper interface to the three backends would certainly make this easier for us vs implementing such an interface ourselves. Is this the client interface?

Welcome Andrei Sekretenko as a new committer and PMC member!

2020-01-21 Thread Benjamin Mahler
Please join me in welcoming Andrei Sekretenko as the newest committer and PMC member! Andrei has been active in the project for almost a year at this point and has been a productive and collaborative member of the community. He has helped out a lot with allocator work, both with code and

Re: RFC: Improving linting in Mesos (MESOS-9630)

2020-01-14 Thread Benjamin Mahler
Benjamin figured this out for me. For posterity, I needed to: $ pyenv install 2.7.17 $ pyenv global 3.7.4 2.7.17 On Tue, Jan 14, 2020 at 2:37 PM Benjamin Mahler wrote: > Have folks been able to set this up successfully on macOS? Is my python > virtual env screwed up? > > ./s

Re: RFC: Improving linting in Mesos (MESOS-9630)

2020-01-14 Thread Benjamin Mahler
Have folks been able to set this up successfully on macOS? Is my python virtual env screwed up? ./support/setup-dev.sh [INFO] Installing environment for local. [INFO] Once installed this environment will be reused. [INFO] This may take a few minutes... An unexpected error has occurred:

Re: [MESOS-10007] random "Failed to get exit status for Command" for short-lived commands

2019-10-21 Thread Benjamin Mahler
Hi Charles, thanks for the thorough ticket and for surfacing it here for attention, it didn't get spotted amongst the JIRA noise. I replied on the ticket with a patch that should fix the issue, we can discuss further in the ticket. Ben On Sat, Oct 19, 2019 at 7:35 AM Charles-François Natali

Re: Which IDE do you recommend to import and develop Mesos and the cpusets module?

2019-09-25 Thread Benjamin Mahler
A number of us use Eclipse, some use vim / emacs or SublimeText. Eclipse's c++ indexer has been working well for me. On Fri, Sep 13, 2019 at 4:18 AM Felipe Gutierrez < felipe.o.gutier...@gmail.com> wrote: > Hi, > > I saw on the Mesos documentation [1] that cquery is recommended. I am not >

Re: [VOTE] Release Apache Mesos 1.9.0 (rc1)

2019-08-27 Thread Benjamin Mahler
> We upgraded the version of the bundled boost very late in the release cycle Did we? We still bundle boost 1.65.0, just like we did during 1.8.x. We just adjusted our special stripped bundle to include additional headers. On Tue, Aug 27, 2019 at 1:39 PM Vinod Kone wrote: > -1 > > We upgraded

[Performance / Resource Management WG] August Update

2019-08-21 Thread Benjamin Mahler
Can't make today's meeting, so sending out some notes: On the performance front: - Long Fei reported a slow master, and perf data indicates a lot of time is spent handling executor churn, this can be easily improved: https://issues.apache.org/jira/browse/MESOS-9948 On the resource management

Re: Mesos 1.9.0 release

2019-08-13 Thread Benjamin Mahler
Thanks for taking this on Qian! I seem to be unable to view the dashboard. Also, when are we aiming to make the cut? On Tue, Aug 13, 2019 at 10:58 PM Qian Zhang wrote: > Folks, > > It is time for Mesos 1.9.0 release and I am the release manager. Here is > the dashboard: >

[Performance / Resource Management WG] July Update

2019-07-17 Thread Benjamin Mahler
On the resource management front, Meng Zhu, Andrei Sekretenko, and myself have been working on quota limits and enhancing multi-role framework support: - A memory leak in the allocator was fixed: MESOS-9852 - Support for quota limits work is well underway, and at this point the major pieces are

Re: Changing behaviour of suppressOffers() to preserve suppressed state on transparent re-registration by the scheduler driver

2019-06-23 Thread Benjamin Mahler
James, yes that's correct. On Sat, Jun 22, 2019 at 12:05 AM James Peach wrote: > So this proposal would only affect schedulers using the libmesos scheduler > driver API? Schedulers using the v1 HTTP would not get any changes in > behaviour, right? > > > On Jun 21, 2019, at 9:56 PM, Andrei

[Performance / Resource management WG] Notes in lieu of tomorrow's meeting

2019-05-13 Thread Benjamin Mahler
I'm out of the country and so I'm sending out notes in lieu of tomorrow's performance / resource management meeting. Resource Management: - Work is underway for adding the UPDATE_FRAMEWORK scheduler::Call. - Some fixes and small performance improvements landed for the random sorter. - Perf data

Performance / Resource Management Update

2019-04-17 Thread Benjamin Mahler
In lieu of today's meeting, this is an email update: The 1.8 release process is underway, and it includes a few performance related changes: - Parallel reads for the v0 API have been extended to all other v0 read only endpoints (e.g. /state-summary, /roles, etc). Whereas in 1.7.0, only /state

Re: Subject: [VOTE] Release Apache Mesos 1.8.0 (rc1)

2019-04-15 Thread Benjamin Mahler
The CHANGELOG highlights seem a bit lacking? - For some reason, the task CLI command is listed in a performance section? - The parallel endpoint serving changes are in the longer list of items, seems like we highlight them in the performance section? Maybe we could be specific too about what we

Re: [External] Re: docker containerizer with nvidia-docker

2019-04-05 Thread Benjamin Mahler
docker" agent option to replace the docker command > with the nvidia-docker command, GPU support with the docker containerizer > seems trivial. Did I miss anything? > > On Thu, Apr 4, 2019 at 8:00 PM Benjamin Mahler wrote: > > > The "UCR" (aka mesos contain

Re: docker containerizer with nvidia-docker

2019-04-04 Thread Benjamin Mahler
The "UCR" (aka mesos containerizer) and "Docker containerizer" are two different containerizers that users tend to choose between. UCR is what many of our serious users rely on and so we made the investment there first. GPU support for the docker containerizer was also something that was planned,

Re: Bundled glog update from 0.3.3 to 0.4.0

2019-03-27 Thread Benjamin Mahler
Thanks Andrei! Some interesting changes for us from what I see: - Looks like there are some potential memory allocation reduction changes which is nice. ("reduce dynamic allocation from 3 to 1 per log message" in 0.3.4) - https://github.com/google/glog/pull/245 (this will change the log file

Re: [MESOS-8248] - Expose information about GPU assigned to a task

2019-03-22 Thread Benjamin Mahler
Containers can be assigned multiple GPUs, so I assume you're thinking of putting these metrics in a repeated message? (similar to DiskStatistics) It has seemed to me we should probably make this Nvidia specific (e.g. NvidiaGPUStatistics). In the past we thought generalizing this would be good,

Re: Design Doc: Metrics subset access

2019-03-15 Thread Benjamin Mahler
Thanks Benno, this has come up before, mainly in the context of reducing cost of computing / serving / processing large numbers of metrics. However, in that use case, a single prefix wasn't sufficient because the user would be interested in the subset of the metrics that they're using for graphs

[Performance / Resource Management WG] Meeting Cancelled Today

2019-02-20 Thread Benjamin Mahler
I was not able to wrangle together performance content for today's meeting. On the resource management side, the design is nearly finalized for supporting quota limits distinct from quota guarantees, in a flat role model (no hierarchy). As an early FYI, as a result of the complexity of

Re: Welcome Benno Evers as committer and PMC member!

2019-01-30 Thread Benjamin Mahler
Welcome Benno! Thanks for all the great contributions On Wed, Jan 30, 2019 at 6:21 PM Alex R wrote: > Folks, > > Please welcome Benno Evers as an Apache committer and PMC member of the > Apache Mesos! > > Benno has been active in the project for more than a year now and has made > significant

[Performance / Resource Management WG] Meeting Notes - January 16

2019-01-17 Thread Benjamin Mahler
Thanks to the folks who joined: Ilya Pronin, James Peach, Meng Zhu, Chun-Hung Hsiao, Colin Dunn, Pawel Palucki, Maciej Iwanowski (hopefully I didn't forget anyone) This month Maciej and Pawel from Intel joined to present their work on interference detection / remediation. You can see the slides

Re: [VOTE] Release Apache Mesos 1.7.1 (rc1)

2019-01-02 Thread Benjamin Mahler
+1 (binding) make check passes on macOS 10.14.2 $ clang++ --version Apple LLVM version 10.0.0 (clang-1000.10.44.4) Target: x86_64-apple-darwin18.2.0 Thread model: posix InstalledDir: /Library/Developer/CommandLineTools/usr/bin $ ./configure CC=clang CXX=clang++

Reminder: Performance / Resource Management WGs meeting today at 10am PST.

2018-12-19 Thread Benjamin Mahler
This will be a combined meeting of the two working groups, and will be the first meeting for the Resource Management working group. See you there!

"Resource Management" Working Group

2018-12-18 Thread Benjamin Mahler
Over the past few months, we've been making several improvements related to quota enforcement, using multiple roles / schedulers, as well as scalability of the allocator in the master. Going forward we'll be shifting back to push additional features, like quota limits, hierarchical quota,

Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-10 Thread Benjamin Mahler
I think we're agreed: -There are no schedulers modeling the existing per-agent time-based filters that mesos is tracking, and we shouldn't go in a direction that encourages frameworks to try to model and manage these. So, we should be very careful in considering something like CLEAR_FILTERS.

Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-05 Thread Benjamin Mahler
Thanks for bringing REQUEST_RESOURCES up for discussion, it's one of the mechanisms that we've been considering for further scaling pessimistic offers before we make the migration to optimistic offers. It's also been referred to as "demand" rather than "request", but for the sake of this

Re: [API WG] Proposals for dealing with master subscriber leaks.

2018-11-11 Thread Benjamin Mahler
>- We can add heartbeats to the SUBSCRIBE call. > This would need to be > part of a separate operator Call, because one platform (browsers) that > might subscribe to the master does not support two-way streaming. This doesn't make sense to me, the heartbeats should still be part of the same

Parallel test runner now the default for autotools / make check

2018-11-08 Thread Benjamin Mahler
During the MesosCon hackathon Benjamin Bannier and myself worked on getting the parallel test runner usable as the default for the autotools build. Now, when running 'make check', the tests will run much much faster! What we did: -added detection of 'ulimit -u' and exit if too low -fixed some
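The 'ulimit -u' detection mentioned above can be sketched roughly as follows. This is a minimal illustration, not the check that landed in the Mesos build: the function names and the threshold passed to them are hypothetical, and it assumes a platform where Python's `resource` module exposes `RLIMIT_NPROC` (Linux, macOS).

```python
import resource
import sys


def limit_is_sufficient(soft_limit, minimum):
    """Return True if the soft process limit allows at least `minimum` processes."""
    # An unlimited soft limit is always sufficient.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= minimum


def check_process_limit(minimum):
    """Exit with a message if the 'ulimit -u' soft limit is below `minimum`.

    `minimum` is a hypothetical threshold chosen by the caller.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if not limit_is_sufficient(soft, minimum):
        sys.exit(
            f"'ulimit -u' is {soft}, but at least {minimum} is needed "
            "to run the tests in parallel"
        )
```

Keeping the comparison in its own function makes the policy easy to exercise without changing the limits of the running process.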

Welcome Meng Zhu as PMC member and committer!

2018-10-31 Thread Benjamin Mahler
Please join me in welcoming Meng Zhu as a PMC member and committer! Meng has been active in the project for almost a year and has been very productive and collaborative. He is now one of the few people who understands the allocator code well, as well as the roadmap for this area of the project. He

Re: Dedup mesos agent status updates at framework

2018-10-29 Thread Benjamin Mahler
l) scheduler will remove the status > update from the queue, and in case of failure, Mesos Master will send > status update again. > > > > On Sun, Oct 28, 2018 at 10:15 PM Benjamin Mahler > wrote: > > > Which version of mesos are you running? > > > > >

Re: Dedup mesos agent status updates at framework

2018-10-28 Thread Benjamin Mahler
ff period from 10s -> 30s or > 60s, and simultaneously explore if dedup is an option. > > Thanks, > Varun > > On Sun, Oct 28, 2018 at 6:49 PM Benjamin Mahler > wrote: > > > Hi Varun, > > > > What problem are you trying to solve precisely? There seems to be

Re: Dedup mesos agent status updates at framework

2018-10-28 Thread Benjamin Mahler
Hi Varun, What problem are you trying to solve precisely? There seems to be an implication that the duplicate acknowledgements are expensive. They should be low cost, so that's rather surprising. Do you have any data related to this? You can also tune the backoff rate on the agents, if the

Re: LibProcess on windows

2018-10-19 Thread Benjamin Mahler
+andy Some folks have been working on porting it to windows, they could provide you with the latest status. On Thu, Oct 18, 2018 at 1:48 PM Vaibhav Khanduja wrote: > Has anybody used libprocess outside of mesos on windows? Thx >

Re: Proposal: Adding health check definitions to master state output

2018-10-18 Thread Benjamin Mahler
> It's worth mentioning that I believe the original intention of the 'Task' > message was to contain most information contained in 'TaskInfo', except for > those fields which could grow very large, like the 'data' field. +1 all task / executor metadata should be exposed IMO. I look at the 'data'

[Performance WG] Meeting today

2018-10-17 Thread Benjamin Mahler
Hi folks, I didn't get a chance to send out an agenda for this meeting, and it looks like only Chun-Hung joined, so let's just do this over email instead. Since the last meeting, we landed the copy-on-write Resources optimization and we landed the fixes to sorter performance. The blog post

Re: monitoring mesos master load

2018-10-12 Thread Benjamin Mahler
The following are probably what you're looking for: https://issues.apache.org/jira/browse/MESOS-9237 https://issues.apache.org/jira/browse/MESOS-9236 On Fri, Oct 12, 2018 at 12:02 PM Eric Chung wrote: > Hello devs, > > We recently had an incident where the master was overloaded by the >

Re: Mesos Flakiness Statistics

2018-10-12 Thread Benjamin Mahler
Thanks for sending this Benno! I for one would love to see more regular communication about the state of CI, especially so that I know how I can help fix tests (right now I don't know which flaky tests are in areas I am maintaining). Is there any reason the first portion of the test name is being

Re: Adding support for implicit allocation of mandatory custom resources in Mesos

2018-10-11 Thread Benjamin Mahler
Thanks for the thorough explanation. Yes, it sounds acceptable and useful for assigning disk i/o and network i/o. The error case of there not being enough resources post-injection seems unfortunate but I don't see a way around it. Can you file a ticket with this background? On Thu, Oct 11, 2018

Re: Vote now for MesosCon 2018 proposals!

2018-09-25 Thread Benjamin Mahler
Voted! Thanks Jörg and the PC! On Thu, Sep 20, 2018 at 9:51 AM Jörg Schad wrote: > Dear Mesos Community, > > Please take a few minutes over the next few days and review what members > of the community have submitted for MesosCon 2018 > (which will be held in San

Re: Differing DRF flavors over roles and frameworks

2018-09-24 Thread Benjamin Mahler
Filed https://issues.apache.org/jira/browse/MESOS-9255 to make this consistent. On Thu, Nov 30, 2017 at 12:27 PM, Benjamin Mahler wrote: > > > On Thu, Nov 30, 2017 at 2:52 PM, Benjamin Bannier < > benjamin.bann...@mesosphere.io> wrote: > >> Hi Ben, >

[Performance WG] Meeting Notes - September 19

2018-09-20 Thread Benjamin Mahler
Thanks to those who joined: Yan Xu, Chun-Hung Hsiao, Meng Zhu, Carl Dellar Notes: (1) I forgot to mention during the meeting that more progress has happened on the parallel reads of master state for the other read-only endpoints. Alex or Benno can reply to this thread to provide an update. [1]

[Performance WG] Meeting Reminder: September 19

2018-09-19 Thread Benjamin Mahler
Just a reminder that the meeting today will go ahead as planned. There are no guest presentations this time around, but as usual we'll go over the exciting work that's been happening lately. Feel free to add items to the agenda. I will send out notes afterwards as usual. Ben

[Performance WG] Meeting Notes - August 15

2018-08-15 Thread Benjamin Mahler
For folks that missed it, here are my notes. Thanks to jie for presenting! (1) Jie presented a containerization benchmark: https://reviews.apache.org/r/68266/ The motivation to add this was the mount table read issue that came up originally in MESOS-8418 [1]. We only pushed a short term fix for

[Performance WG] Meeting Reminder: August 15

2018-08-14 Thread Benjamin Mahler
Just a reminder that the meeting tomorrow will go ahead as planned. Jie graciously agreed to discuss a container launch benchmark that he's been working on; should be interesting! There are a few more topics up for discussion, and feel free to add to the agenda. I will send out notes afterwards

Re: [VOTE] Release Apache Mesos 1.4.2 (rc1)

2018-08-13 Thread Benjamin Mahler
+1 (binding) make check passes on macOS 10.13.6 with Apple LLVM version 9.1.0 (clang-902.0.39.2). Thanks Kapil! On Wed, Aug 8, 2018 at 3:06 PM, Kapil Arya wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.4.2. > > 1.4.2 is a bug fix release. The CHANGELOG

Re: [VOTE] Release Apache Mesos 1.4.2 (rc1)

2018-08-13 Thread Benjamin Mahler
This was fixed in https://github.com/apache/mesos/commit/02ad5c8cdd644ee8eec83bf887daa98bb163637d, I don't recall there being any issues due to it. On Mon, Aug 13, 2018 at 4:50 PM, Benjamin Mahler wrote: > Hm.. I ran make check on macOS saw the following: >

Re: [VOTE] Release Apache Mesos 1.4.2 (rc1)

2018-08-13 Thread Benjamin Mahler
Hm.. I ran make check on macOS and saw the following: [ RUN ] AwaitTest.AwaitSingleDiscard src/tests/collect_tests.cpp:275: Failure Value of: promise.future().hasDiscard() Actual: false Expected: true [ FAILED ] AwaitTest.AwaitSingleDiscard (0 ms) On Wed, Aug 8, 2018 at 3:06 PM, Kapil Arya

Re: Using jemalloc as default allocator

2018-08-13 Thread Benjamin Mahler
I would be interested in knowing what other projects have done around this (e.g. Rust, Redis seem to use it by default on Linux, and I see ongoing discussion in other projects e.g. Ruby). To James' point, while anyone technically can use jemalloc, I have only seen 1 user doing it. Maybe there are

Re: Backport Policy

2018-07-26 Thread Benjamin Mahler
> consistent >>> > >>> (and safe) within a release. With that as the goal of a branch in >>> > >>> maintenance mode, it makes sense to fix regressions, and make >>> > exceptions to >>> > >>> fix CVEs and other critical/bl

[Performance WG] Meeting Notes - July 18

2018-07-18 Thread Benjamin Mahler
For folks that missed it, here are my own notes. Thanks to alexr and dario for presenting! (1) I discussed a high agent cpu usage issue when hitting the /containers endpoint: https://issues.apache.org/jira/browse/MESOS-8418 This was resolved, but it didn't get attention for months until I

Reminder: Performance Working Group Meeting July 18 10AM PST

2018-07-17 Thread Benjamin Mahler
Hi folks, just a reminder that there is indeed a performance working group meeting tomorrow. We'll discuss what's been going on recently in the performance area, and there's a lot to discuss! I will send out some detailed notes to the mailing lists afterwards. Ben

Re: Backport Policy

2018-07-12 Thread Benjamin Mahler
he clarification. I'm in agreement with the points you > > made. > > > > Once we have consensus, would you mind updating the doc? > > > > On Wed, Jul 11, 2018 at 5:15 PM Benjamin Mahler > > wrote: > > > > > I realized recently that we aren't all on

Backport Policy

2018-07-11 Thread Benjamin Mahler
I realized recently that we aren't all on the same page with backporting. We currently only document the following: "Typically the fix for an issue that is affecting supported releases lands on the master branch and is then backported to the release branch(es). In rare cases, the fix might

Re: Normalization of metric keys

2018-07-06 Thread Benjamin Mahler
Do we also want: 3. Has an unambiguous decoding. Replacing '/' with '#%$' means I don't know if the user actually supplied '#%$' or '/'. But using something like percent-encoding would have property 3. On Fri, Jul 6, 2018 at 10:25 AM, Greg Mann wrote: > Thanks for the reply Ben! > > Yea I
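The "unambiguous decoding" property can be illustrated with a short sketch. It assumes metric keys are '/'-separated components and uses Python's standard `urllib.parse` percent-encoding; the helper names are hypothetical and this is not the scheme Mesos adopted, only a demonstration of why percent-encoding round-trips where a fixed substitution such as '#%$' does not.

```python
from urllib.parse import quote, unquote


def encode_component(component):
    """Percent-encode one key component, escaping '/' and '%' as well."""
    # safe="" ensures '/' is escaped rather than passed through.
    return quote(component, safe="")


def join_key(components):
    """Build a '/'-separated metric key from raw components."""
    return "/".join(encode_component(c) for c in components)


def split_key(key):
    """Recover the raw components; splitting on '/' is safe after encoding."""
    return [unquote(part) for part in key.split("/")]


# A component containing '/' and one containing the literal text '#%$'
# stay distinguishable after encoding.
key = join_key(["frameworks", "role/a", "#%$"])
assert split_key(key) == ["frameworks", "role/a", "#%$"]
```

Because '%' itself is escaped, a user-supplied '%2F' and a user-supplied '/' encode to different strings, which is what makes the decoding unambiguous.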

Re: [Proposal] Replicated log storage compaction

2018-07-06 Thread Benjamin Mahler
I was chatting with Ilya on slack and I'll re-post here: * Like Jie, I was hoping for a toggle (maybe it should start default off until we have production experience? sounds like Ilya has already experience with it running in test clusters so far) * I was asking whether this would be considered

Re: [mesos-mail] Re: [Performance WG] Notes from meeting today

2018-07-03 Thread Benjamin Mahler
I just pushed some initial documentation for this, it will show up soon next to the memory profiling link: http://mesos.apache.org/documentation/latest/#administration On Fri, May 25, 2018 at 6:13 PM, Benjamin Mahler wrote: > I'll write up some instructions with what I know so far and

Re: Normalization of metric keys

2018-07-03 Thread Benjamin Mahler
I don't think the lack of principal normalization was intentional. Why spread that further? Don't we also have some normalization today? Having slashes show up in components complicates parsing (can no longer split on '/'), no? For example, if we were to introduce the ability to query a subset of

Re: CHECK_NOTNULL(self->bev) Check failed inside LibeventSSLSocketImpl::shutdown

2018-06-27 Thread Benjamin Mahler
Can you also include the stack trace from the CHECK failure? On Tue, Jun 26, 2018 at 11:25 PM, Suteng wrote: > F0622 11:22:30.985245 16127 libevent_ssl_socket.cpp:190] Check failed: > 'self->bev' Must be non NULL > > Try LibeventSSLSocketImpl::shutdown(int how) > > CHECK_NOTNULL(self->bev)

[Performance WG] June meeting canceled

2018-05-31 Thread Benjamin Mahler
Hi folks, I will be out for most of June on vacation so I'm canceling this month's performance working group meeting. I was planning to bring up the kubernetes' kubelet benchmark: https://kubernetes.io/blog/2018/05/24/kubernetes-containerd-integration-goes-ga/ Right now we don't have any

Re: [VOTE] Release Apache Mesos 1.3.3 (rc1)

2018-05-29 Thread Benjamin Mahler
d, May 23, 2018 at 11:39 AM, Michael Park wrote: > >> Huh... Super weird. I'll look into it. >> >> Thanks for checking! >> >> MPark >> >> On Wed, May 23, 2018 at 11:34 AM Vinod Kone wrote: >> >>> It's empty for me too! >

Re: [mesos-mail] Re: [Performance WG] Notes from meeting today

2018-05-25 Thread Benjamin Mahler
> > On Wed, May 16, 2018 at 5:44 PM, Benjamin Mahler <bmah...@apache.org> > wrote: > > > +Judith > > > > There should be a recording. Judith, do you know where they get posted? > > > > Benjamin, glad to hear it's useful, I'll continue doing it! >

Re: [VOTE] Release Apache Mesos 1.5.1 (rc1)

2018-05-23 Thread Benjamin Mahler
+1 (binding) make check passes on macOS 10.13.4 with Apple LLVM version 9.1.0 (clang-902.0.39.1) On Fri, May 11, 2018 at 12:35 PM, Gilbert Song wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.5.1. > > 1.5.1 includes the following: >

Re: [VOTE] Release Apache Mesos 1.3.3 (rc1)

2018-05-23 Thread Benjamin Mahler
Thanks Michael! Looks like the tar.gz is empty, is it just me? On Tue, May 22, 2018 at 10:09 PM, Michael Park wrote: > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.3.3. > > The CHANGELOG for the release is available at: >

High Level Design Doc: Offer Starvation

2018-05-18 Thread Benjamin Mahler
Hi folks, One of the long standing issues with running many frameworks on Mesos is the presence of what is called "offer starvation". This is when some role/framework that has unsatisfied demand is not receiving offers, while mesos continually sends offers to other roles/frameworks that don't

Re: [mesos-mail] Re: [Performance WG] Notes from meeting today

2018-05-16 Thread Benjamin Mahler
great, especially when put into context with thought like here. > > > > > > Benjamin > > > > > On May 16, 2018, at 8:06 PM, Benjamin Mahler <bmah...@apache.org> > wrote: > > > > > > Hi folks, > > > > > > Here are some not

[Performance WG] Notes from meeting today

2018-05-16 Thread Benjamin Mahler
Hi folks, Here are some notes from the performance meeting today. (1) First I did a demo of flamescope, you can find it here: https://github.com/Netflix/flamescope It's a very useful tool, hopefully we can make it easier for users to generate the data that we can drop into flamescope when

Re: Add hostname or agentid in rescind offers callback

2018-05-07 Thread Benjamin Mahler
>> Benefits of adding host name and agent id in rescind offer callback. > >> - Mutex locks to synchronize both maps, leads to some performance hit. > >> - Managing second map, is more code and prone to bugs. > >> - Little overhead on heap memory and GC. > >>

Re: 答复: libprocess libevent backend

2018-05-06 Thread Benjamin Mahler
ess/src/libevent.cpp > > > > 206 // TODO(jmlvanre): Allow support for 'epoll' once SSL related > > 207 // issues are resolved. > > 208 struct event_config* config = event_config_new(); > > 209 event_config_avoid_method(config, "epoll"); > > >

Re: libprocess libevent backend

2018-05-03 Thread Benjamin Mahler
ode/mesos/3rdparty/libprocess/src/libevent.cpp > > 206 // TODO(jmlvanre): Allow support for 'epoll' once SSL related > 207 // issues are resolved. > 208 struct event_config* config = event_config_new(); > 209 event_config_avoid_method(config, "epoll"); > > >

Re: libprocess libevent backend

2018-05-03 Thread Benjamin Mahler
Which issue are you referring to? Libprocess uses libev by default, with --enable-libevent as a configure option to use libevent instead. Both of these backends should use epoll if the system has it available. Are you seeing otherwise? On Thu, May 3, 2018 at 6:15 AM, Suteng

Re: Add hostname or agentid in rescind offers callback

2018-05-02 Thread Benjamin Mahler
I'm a -1 on adding redundant information in the message. The scheduler can maintain an index of offers by offer id to address this issue: hostname -> offers offer_id -> offer On Wed, May 2, 2018 at 11:39 AM, Vinod Kone wrote: > Can I ask why you are indexing the offers
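The two-map index suggested in this reply (hostname -> offers, offer_id -> offer) could look roughly like the following. This is only a minimal sketch: `Offer` is a simplified stand-in rather than the Mesos protobuf message, and `OfferIndex`, `add`, `remove`, and `countForHost` are hypothetical names chosen for illustration, not part of any Mesos scheduler API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical, simplified stand-in for an offer (not the Mesos protobuf type).
struct Offer {
  std::string id;
  std::string hostname;
};

// Keeps both lookups in sync: offer_id -> offer and hostname -> offer ids.
class OfferIndex {
public:
  void add(const Offer& offer) {
    // Drop any stale entry first so that re-adding an id cannot leave it
    // registered under two hostnames.
    remove(offer.id);
    byId_[offer.id] = offer;
    byHost_[offer.hostname].insert(offer.id);
  }

  // Intended for a rescind callback that only carries the offer id.
  // Returns the hostname the offer belonged to, or "" if the id is unknown.
  std::string remove(const std::string& offerId) {
    auto it = byId_.find(offerId);
    if (it == byId_.end()) {
      return "";
    }
    std::string hostname = it->second.hostname;
    auto hostIt = byHost_.find(hostname);
    assert(hostIt != byHost_.end());  // Both maps are updated together.
    hostIt->second.erase(offerId);
    if (hostIt->second.empty()) {
      byHost_.erase(hostIt);
    }
    byId_.erase(it);
    return hostname;
  }

  std::size_t countForHost(const std::string& hostname) const {
    auto it = byHost_.find(hostname);
    return it == byHost_.end() ? 0 : it->second.size();
  }

private:
  std::unordered_map<std::string, Offer> byId_;
  std::unordered_map<std::string, std::unordered_set<std::string>> byHost_;
};
```

With an index like this, a rescind that carries only an offer id can still be mapped back to its host with a single lookup, at the cost of keeping the two maps consistent on every add and remove.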

Re: Question on status update retry in agent

2018-04-18 Thread Benjamin Mahler
ger.hpp#L173> > queue. Now when the ack comes, they can be in any order for the status > update but _handle section pops > <https://github.com/apache/mesos/blob/master/src/slave/task_ > status_update_manager.cpp#L888> > last update from the queue without making sure, ack was for th

Re: Performance Working Group Agenda for Tomorrow

2018-04-18 Thread Benjamin Mahler
I'll cancel this one and we can aim to meet again next month. On Tue, Apr 17, 2018 at 12:06 PM Benjamin Mahler <bmah...@apache.org> wrote: > Do folks have any agenda items they would like to discuss for tomorrow's > performance working group meeting? > > There haven't been a

Performance Working Group Agenda for Tomorrow

2018-04-17 Thread Benjamin Mahler
Do folks have any agenda items they would like to discuss for tomorrow's performance working group meeting? There hasn't been a lot of performance-related activity in the past month, so I will cancel this one unless folks chime in here. Ben

Re: Proposal: Asynchronous IO on Windows

2018-04-12 Thread Benjamin Mahler
Thanks for writing this up and exploring the different options Akash! I left some comments in the doc. It seems to me the Windows thread pool API is a mix of "event" processing (timers, I/O), as well as a work queue. Since libprocess already provides a work queue via `Process`es, there's some

Re: Question on status update retry in agent

2018-04-10 Thread Benjamin Mahler
Do you have logs? Which acknowledgements did the agent receive? Which TASK_RUNNING in the sequence was it re-sending? On Tue, Apr 10, 2018 at 6:41 PM, Benjamin Mahler <bmah...@apache.org> wrote: > > Issue is that, *old executor reference is held by slave* (assuming it >

Re: Question on status update retry in agent

2018-04-10 Thread Benjamin Mahler
ager.cpp#L318> > and acknowledge > <https://github.com/apache/mesos/blob/master/src/slave/ > task_status_update_manager.cpp#L760> > . > > > Thanks, > Varun > > > > > > > > > > > > > > > > > > > On Fri, Mar

Re: Proposal: Constrained upgrades from Mesos 1.6

2018-04-10 Thread Benjamin Mahler
-user Do you have a link to the technical details of why this needs to be done? For instance, why can't master/agent versions be used to determine which behavior is performed between the master and agent? On Tue, Apr 10, 2018 at 5:34 PM, Greg Mann wrote: > Hi all, > We are

CHECK_NOTNONE / CHECK_NOTERROR

2018-04-10 Thread Benjamin Mahler
Just an FYI about some recently added CHECKs that make some minor changes to the way we write code: (1) CHECK_NOTNONE: Much like glog's CHECK_NOTNULL, sometimes you know from invariants that an Option cannot be in the none state and you want to "de-reference" it without writing logic to handle
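To illustrate the idea behind a "check that it is not none, then de-reference" helper, here is a rough sketch. It is not the Mesos macro: `std::optional` stands in for the Option type mentioned in the message, and `checkNotNone`, `SKETCH_CHECK_NOTNONE`, and `findHostname` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <optional>
#include <string>
#include <utility>

// Aborts with a message if the optional is empty, otherwise returns the
// contained value. Mirrors the spirit of glog's CHECK_NOTNULL for pointers.
template <typename T>
T checkNotNone(std::optional<T> option, const char* expression) {
  if (!option.has_value()) {
    std::cerr << "Check failed: '" << expression << "' is none" << std::endl;
    std::abort();
  }
  return std::move(*option);
}

// Hypothetical macro name; it captures the expression text for the message.
#define SKETCH_CHECK_NOTNONE(expr) checkNotNone((expr), #expr)

// Hypothetical lookup whose result the caller knows, from invariants,
// to be present when `known` is true.
std::optional<std::string> findHostname(bool known) {
  if (known) {
    return std::string("host-a");
  }
  return std::nullopt;
}
```

A call site can then write `std::string host = SKETCH_CHECK_NOTNONE(findHostname(true));` instead of branching on the empty case; if the invariant is ever violated, the process aborts at that line rather than continuing with a missing value.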

Re: Tasks not getting killed

2018-04-10 Thread Benjamin Mahler
It's the executor's responsibility to forcefully kill a task after the task kill grace period. However, in your case it sounds like the executor is getting stuck? What is happening in the executor? If the executor is alive but doesn't implement the grace period force kill logic, the solution is to
