Design doc: Agent draining and deprecation of maintenance primitives

2019-05-29 Thread Joseph Wu
Hi all, A few years back, we added some constructs called maintenance primitives to Mesos. This feature was meant to allow operators and frameworks to cooperate in draining tasks off nodes scheduled for maintenance. As far as we've observed since, this feature never achieved enough adoption to

Re: [VOTE] Release Apache Mesos 1.8.0 (rc2)

2019-04-23 Thread Joseph Wu
-1 (binding) We found a serious bug when upgrading from 1.7.x to 1.8.x, which prevents agents from reregistering after upgrading the masters: https://issues.apache.org/jira/browse/MESOS-9740 On Tue, Apr 23, 2019 at 8:27 AM Andrei Budnik wrote: > +1 > > sudo make -j16 distcheck >

Re: full Zookeeper authentication

2019-01-08 Thread Joseph Wu
ite small. Should I do make a github pull > request or it is already "advance contribution"? > > -- > > Dmitrii Kishchukov. > Leading software developer > Submission Portal Team > > > On 1/7/19, 5:10 PM, "Joseph Wu" wrote: > > I w

Re: full Zookeeper authentication

2019-01-07 Thread Joseph Wu
I would be happy to shepherd (now that I'm back from winter holidays). On Mon, Dec 24, 2018 at 6:15 AM Alex Rukletsov wrote: > Made you a contributor and assigned the issue to you. Thanks! > > Joseph, will you shepherd this? > > On Fri, Dec 21, 2018 at 4:32 PM Kishchukov, Dmitrii (NIH/NLM/NCBI)

Re: full Zookeeper authentication

2018-12-10 Thread Joseph Wu
There are two options for contributing: 1) You can make a pull request against the GitHub mirror: https://github.com/apache/mesos . We generally only use PRs for minor changes, like typos, documentation, or uploading binaries. See

Re: full Zookeeper authentication

2018-12-07 Thread Joseph Wu
There are currently three components of Mesos that use Zookeeper: *Master Detector:* This object is used by the Mesos Master, Agent, and Scheduler to find which Master is the leader. The existing detector code will parse a "zk://" URL if given here:

[API WG] Proposals for dealing with master subscriber leaks.

2018-11-09 Thread Joseph Wu
Hi all, During some internal scale testing, we noticed that, when Mesos streaming endpoints are accessed via certain proxies (or load balancers), the proxies might not close connections after they are complete. For the Mesos master, which only has the /api/v1 SUBSCRIBE streaming endpoint, this

Re: 122 resources.cpp:1134] Check failed: !resource.has_role() cpus:8

2018-09-26 Thread Joseph Wu
I believe what you are running into is a slight change in how we represent Resources. Older frameworks expect unreserved resources to look like this: > { > role: "*", > reservation: , > reservations: > } In 1.4.0, we started representing unreserved resources like: > { > role: , >

Re: Mesos replicated log dual writes

2018-06-05 Thread Joseph Wu
The two-byte difference is most likely coming from the "learned" field: https://github.com/apache/mesos/blob/master/src/messages/log.proto#L49 The first time an entry is recorded, the entry is "unlearned", basically meaning that the entry has not been written to a quorum of masters (yet). Once a
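
A two-byte increase is consistent with how protobuf encodes a boolean field once it is set: one tag byte plus one value byte. The Python sketch below only illustrates that wire-format arithmetic. The field number 4 is a placeholder assumption, not taken from log.proto, and the one-byte tag holds only for field numbers below 16.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_bool_field(field_number: int, value: bool) -> bytes:
    """Encode a bool field as its tag (field number, wire type 0) plus value."""
    tag = (field_number << 3) | 0  # wire type 0 means varint
    return encode_varint(tag) + encode_varint(1 if value else 0)


# Hypothetical field number; check log.proto for the real one.
LEARNED_FIELD_NUMBER = 4
print(len(encode_bool_field(LEARNED_FIELD_NUMBER, True)))
```

Under those assumptions, setting the field adds exactly two bytes to the serialized entry.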

Welcome Andrew Schwartzmeyer as a new committer and PMC member!

2017-11-27 Thread Joseph Wu
Hi devs & users, I'm happy to announce that Andrew Schwartzmeyer has become a new committer and member of the PMC for the Apache Mesos project. Please join me in congratulating him! Andrew has been an active contributor to Mesos for about a year. He has been the primary contributor behind our

Re: DC/OS (Mesos) portability

2017-11-03 Thread Joseph Wu
It isn't clear to me how DC/OS would benefit from (ongoing) work to create/push Mesos packages. DC/OS downloads and builds all of its component parts from source. Also, we (Mesos devs) are hoping to get more frameworks to move away from using libmesos (including the API shims), in favor of using

[Design Doc] Standalone Container API

2017-08-07 Thread Joseph Wu
As part of work to improve storage support in Mesos [1], we will be adding the ability to launch containers via the Mesos Containerizer, without going through the traditional method (i.e. framework -> offer cycle -> launch executor/task -> status updates -> etc). Below I've linked a short design

Re: Custom isolators - External container

2017-08-07 Thread Joseph Wu
First off, the external containerizer was officially removed in Mesos 1.1.0 (it had been deprecated long before that release): https://issues.apache.org/jira/browse/MESOS-3370 --- If you want to develop/deploy a new isolation method for Mesos, you should first consider writing isolator modules

Re: The state of cmake

2017-06-21 Thread Joseph Wu
Here's the earlier email which has the feature comparison: https://lists.apache.org/thread.html/527a29b45c52a042c122c96754804983b1447b7409ffec3d635b7143@%3Cdev.mesos.apache.org%3E The list is still accurate, except that precompiled headers are no longer "upcoming". On Wed, Jun 21, 2017 at 4:42

Re: Mesos Executor Failing

2017-05-24 Thread Joseph Wu
in that why its happening? > > Regards > Sumit Chawla > > > On Fri, May 19, 2017 at 2:31 PM, Joseph Wu <jos...@mesosphere.io> wrote: > >> What version of Mesos are you using? (Just based on the word "slave" in >> that error message, I'm guessing 0.

Re: Mesos Executor Failing

2017-05-19 Thread Joseph Wu
What version of Mesos are you using? (Just based on the word "slave" in that error message, I'm guessing 0.28 or older.) The "Failed to synchronize" error is something that can occur while the agent is launching the executor. During the launch, the agent will create a pipe to the executor

CMake and (eventually) deprecating the autotools build

2017-03-14 Thread Joseph Wu
Hi Devs! The CMake build system for Mesos is now complete enough for wider consumption. The plan is to review all the differences between the CMake and Autotools build systems and eventually deprecate the Autotools build system. A few of us are already using CMake exclusively for development.

Re: [VOTE] Release Apache Mesos 1.2.0 (rc2)

2017-03-07 Thread Joseph Wu
+1 (binding) Deployed on a small-ish test cluster for about a week. Monitoring of that test cluster has not caught any problems with Mesos. Also confirmed that this SSL socket FD leak does not affect Mesos, except in tests: https://issues.apache.org/jira/browse/MESOS-6919 On Mon, Mar 6, 2017

Note about ".proto" files from Mesos 1.3.0+

2017-02-16 Thread Joseph Wu
Hi devs/contributors, The next time you check out HEAD and open a .proto file, you may notice this line at the top of the file (after the Apache license, of course): syntax = "proto2"; This has been added to all our protobufs in order to allow different versions of the protobuf compiler to

Re: Removing `support/apply-review.sh`

2017-01-13 Thread Joseph Wu
+1 for one less character to type while tab-completing: support/ap s On Fri, Jan 13, 2017 at 10:15 AM, Vinod Kone wrote: > +1 to remove > > On Fri, Jan 13, 2017 at 1:39 AM, haosdent wrote: > > > +1 for remove this. > > > > On Fri, Jan 13, 2017 at 5:37

Re: [VOTE] Release Apache Mesos 0.28.3 (rc1)

2016-11-29 Thread Joseph Wu
AlexR, Thanks for pointing out those test failures. As of 0.28, the LinuxFilesystemIsolatorTests were notoriously flaky on distributions with "large" root filesystems. The test would essentially copy the root filesystem, leading to timeouts in multiple places in the tests. CentOS 7 was known

Re: Attendance for Mesos Developer Community Meeting (Nov 17)

2016-11-16 Thread Joseph Wu
+0.9 Is there an agenda in case there are enough attendees? On Wed, Nov 16, 2016 at 3:15 PM, James Peach wrote: > > > On Nov 16, 2016, at 3:06 PM, Michael Park wrote: > > > > If you're planning to attend this meeting, please reply to this before > Nov > >

Re: Mesos V1 Operator HTTP API - Java Proto Classes

2016-11-16 Thread Joseph Wu
Added. Welcome to the contributors list :) On Wed, Nov 16, 2016 at 9:49 AM, Vijay Srinivasaraghavan < vijikar...@yahoo.com> wrote: > I have created a JIRA and will submit a patch. Could someone please add me > to the contributor list as I am not able to assign the JIRA to myself? > >

Re: Two questions about running spark on mesos

2016-11-14 Thread Joseph Wu
1) You should read through this page: https://spark.apache.org/docs/latest/running-on-mesos.html I (Mesos person) can't answer any questions that aren't already answered on that page :) 2) Your normal spark commands (whatever they are) should still work regardless of the backend. On Mon, Nov 14,

Re: 答复: Mesos Documentation Project

2016-11-09 Thread Joseph Wu
lopers, and Contributors: > > My name is James Neiman. I have been working with Benjamin Hindman, Artem > Harutyunyan, Neil Conway, and Joseph Wu on improving the Mesos > documentation. We now have a proposal for the community to critique. > > Our goal is to satisfy the needs of

Re: 0.28.3 release dashboard!

2016-11-07 Thread Joseph Wu
Thanks for the suggestions Benjamin! I've re-purposed one of the dashboard queries to track "Issues affecting 0.28.x that are resolved in versions later than 0.28". https://issues.apache.org/jira/issues/?filter=12338701 ^ That will show up on the dashboard too. There are 26 issues in that list,

0.28.3 release dashboard!

2016-11-03 Thread Joseph Wu
Hi everyone! Anand and I will be the Release Managers for 0.28.3! We are planning to cut this patch release within three workdays - that would be around Monday next week. So, if you have any patches that need to get into 0.28.3 make sure that either it is already in the 0.28.x branch or the

Re: Please add me as a contributor

2016-11-01 Thread Joseph Wu
Added! On Tue, Nov 1, 2016 at 1:25 PM, Steven Locke wrote: > Hello, > > Please add me as a Mesos contributor to enable being assigned Jira issues. > > I signed up for Reviewboard and Jira as "slocke". > > Thanks, > Steven > > -- > Steven Locke > Software Engineering Intern

Re: Need inputs on running MPI jobs on Mesos

2016-10-14 Thread Joseph Wu
Other than test frameworks or frameworks Mesos considers part of its CLI, there shouldn't be any other Frameworks that are part of the Mesos codebase. (Imagine shipping Spark or Marathon or a bunch of other humongous frameworks along with Mesos.) Same thing goes for MPI, which may or may not

Re: Maintenance API question

2016-08-31 Thread Joseph Wu
The maintenance endpoints do not reject any "machine_ids". They only reject ones that are formatted wrong or are missing fields. On Wed, Aug 31, 2016 at 8:44 AM, Olivier Sallou <olivier.sal...@irisa.fr> wrote: > > > - Mail original - > > De: "Jos

Re: Maintenance API question

2016-08-31 Thread Joseph Wu
Most likely, the hostname and IP you've put into the "machine_ids" do not *exactly match* the hostname and IP the agent is identifying itself as. If in doubt, you can check the master's /slaves endpoint. Or, you can manually set the hostname and IP when starting the agent. On Wed, Aug 31,
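
The exact-match check described above can be sketched in Python. This is a minimal illustration, not Mesos code: the function name and the dict shapes (`hostname` and `ip` keys) are hypothetical, gathering the agent list from the master is left out, and strict string equality is an assumption (Mesos's own comparison rules may differ, for example in how hostname case is handled).

```python
from typing import Iterable, List, Mapping


def find_unmatched_machine_ids(
    machine_ids: Iterable[Mapping[str, str]],
    agents: Iterable[Mapping[str, str]],
) -> List[Mapping[str, str]]:
    """Return machine IDs whose (hostname, ip) pair matches no agent exactly."""
    known = {(agent["hostname"], agent["ip"]) for agent in agents}
    return [m for m in machine_ids if (m["hostname"], m["ip"]) not in known]


# Hypothetical example data.
agents = [{"hostname": "agent1.example.com", "ip": "10.0.0.1"}]
schedule = [
    {"hostname": "agent1.example.com", "ip": "10.0.0.1"},
    {"hostname": "agent1", "ip": "10.0.0.1"},  # short name: not an exact match
]
print(find_unmatched_machine_ids(schedule, agents))
```

Any entry the function returns is one that would be accepted by the endpoint but would not line up with a running agent.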

Re: Protobuf long number JSON serialisation

2016-08-04 Thread Joseph Wu
This is not necessarily a bug, but I think we can safely extend our parsing code to handle this case. This is the method that would need to change: https://github.com/apache/mesos/blob/e859d3ae8d8ff7349327b9e6a89edd6f98d2b7a1/3rdparty/stout/include/stout/protobuf.hpp#L433-L435 On Thu, Aug 4,

Re: Metrics for custom modules

2016-07-13 Thread Joseph Wu
As long as you're using libprocess to write your modules, you can add your metrics via `process::metrics::add(...)`. Those will be exposed via the same old `/metrics/snapshot` endpoint. On Wed, Jul 13, 2016 at 5:39 PM, Zhitao Li wrote: > Hi, > > I'm not sure whether this

Re: [Replicated Log] Enable Mesos to use etcd for replicated_log

2016-07-11 Thread Joseph Wu
decides the "coordinator" of > the > > > replicated log. > > > > > > IINM master contender/detector is not related to replicated logs. The > only > > thing they have in common (when using zookeeper) is they both get the > > zookeeper servers list

Re: [Replicated Log] Enable Mesos to use etcd for replicated_log

2016-07-08 Thread Joseph Wu
Jay, (1) Looks like we missed this when we modularized the MasterDetector/Contender [1]. We need to expand on src/master/main.cpp a bit. Can you file a bug? (cc: Kapil) I can shepherd if Kapil doesn't have the cycles. (2) The bit of the replicated log which relies on ZK is a small portion

Re: [Action Required] Stale Reviews

2016-07-06 Thread Joseph Wu
On a related note, we will also be looking at the (usually neglected) GitHub PRs. We've accumulated ~50 of them over time. After making a quick scan of the list, it turns out we can close a majority of these PRs by either directly closing the non-issues, or by committing the small documentation

Re: Protobuf syntax version for Mesos

2016-06-13 Thread Joseph Wu
Looks like that is a warning in v3, see [1]. The same code in v2.6.1 is [2], and does not have that warning. [1] https://github.com/google/protobuf/blob/088c5c491e7a1c95c7b8eb55f119a8a999c81dc1/src/google/protobuf/compiler/parser.cc#L547-L550 [2]

Re: Code Quality Improvements for docker-compose-executor

2016-06-13 Thread Joseph Wu
I'm not sure what the community adoption of the docker-compose-executor [1] is, but from a Mesos perspective, the repo will eventually be superseded by "pod" support in Mesos itself [2] [3]. Also, you should try to contact the developers of docker-compose-executor itself, as they might not be

Re: [Tech-debt] Introduce regex into Mesos

2016-06-10 Thread Joseph Wu
Same here. Mesos currently requires GCC 4.8.1+. Regex support was implemented in GCC 4.9.0, see [1]. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53631 On Fri, Jun 10, 2016 at 11:39 AM, Kevin Klues wrote: > By compiler errors, I mean "internal compiler errors" > > On

Re: Round robin DNS zookeeper record

2016-05-25 Thread Joseph Wu
Mesos passes the list between "zk://" and the first "/" directly into Zookeeper's C bindings. I'm not familiar enough with the Zookeeper API to say for certain, but it looks like this *does* support your round-robin scheme. You can double check here:
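
The substring described above (everything between "zk://" and the first "/") can be sketched in Python. This is only an illustration of that description, not the Mesos parsing code: the function name is hypothetical, and the sketch does no validation beyond the prefix and does not treat credentials or other URL parts specially.

```python
def zk_server_list(url: str) -> str:
    """Return the text between "zk://" and the first following "/"."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with zk://")
    rest = url[len(prefix):]
    # partition() splits at the first "/" only; the remainder is the znode path.
    servers, _, _path = rest.partition("/")
    return servers


# Hypothetical round-robin DNS name in front of the Zookeeper ensemble.
print(zk_server_list("zk://zookeeper.example.com:2181/mesos"))
```

With a single round-robin DNS name, the string handed to Zookeeper is just that one host and port, so whether the scheme works depends on how the Zookeeper client resolves it.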

Re: 1.0 Release Candidate

2016-05-25 Thread Joseph Wu
I'm guessing you mean the "medium term" bullet point on the Roadmap ( https://cwiki.apache.org/confluence/display/MESOS/Roadmap): > >- Deprecate Docker containerizer (in favor of Unified containerizer w/ >Docker support) > > This was never meant to be done as part of the 1.0 release. I'm

Re: 答复: mesos-logrotate-logger binary package problem?

2016-05-17 Thread Joseph Wu
The particular implementation of the container logger packaged with Mesos does not need either option (though it shouldn't break with either one). "create" is not necessary because the container logger will create the log file when it's missing.

Re: [RESULT][VOTE] Release Apache Mesos 0.27.2 (rc1)

2016-03-19 Thread Joseph Wu
Cong Wang, The tags are sync'd. See: https://github.com/apache/mesos/releases You might not have done: git pull --tags On Wed, Mar 16, 2016 at 11:49 AM, Cong Wang wrote: > On Mon, Mar 7, 2016 at 8:29 PM, Michael Park wrote: > > Please find the

Re: Recent changes to MesosTest helpers

2016-03-19 Thread Joseph Wu
in the tests with ASSERT_*.) On Wed, Mar 16, 2016 at 8:27 AM, haosdent <haosd...@gmail.com> wrote: > Does it exit like segment when CHECK_xxx failed? Or exit until finish all > test cases? > On Mar 16, 2016 11:03 PM, "Joseph Wu" <jos...@mesosphere.io> wrote:

Recent changes to MesosTest helpers

2016-03-18 Thread Joseph Wu
Hello Devs & Contributors, We recently committed a refactor of the MesosTest suite and underlying "Cluster" abstraction. This affects almost every existing test and future test, so here's a summary of what has changed and what you should be aware of: - The purpose of the refactor is to make

Re: [VOTE] Release Apache Mesos 0.28.0 (rc1)

2016-03-08 Thread Joseph Wu
If we're re-cutting the release, can we also add this fix for maintenance? (still under review) https://reviews.apache.org/r/44258/ On Tue, Mar 8, 2016 at 2:43 PM, Kevin Klues wrote: > Here are the list of reviews/patches that have been called out in this > thread for

[Proposal] Unified logging for containerizers & the external containerizer

2015-12-11 Thread Joseph Wu
Hello All, As part of the work on managing the logs for executors and tasks, we're introducing a "ContainerLogger" module. This module will allow the stdout/stderr of executors and tasks to be managed or redirected. (Existing executor/task logs are written to plain files.) For example: -

[Breaking bug fix] Binary in state endpoints

2015-10-23 Thread Joseph Wu
Hello, The state endpoints, on master and agent, currently serialize two binary data fields in the ExecutorInfo and TaskInfo objects. These fields are set by frameworks; and Mesos does not inspect their values. The data fields can be found in the state JSON blobs: /master/state ->

Re: RFC: license headers interfere with doxygen documentation (MESOS-3581)

2015-10-20 Thread Joseph Wu
+/- 0 (a) wouldn't hurt, but isn't the best solution. I'd vote for adding actual comment blocks to each class. Doxygen takes the comment block immediately preceding the class and uses that as the description. This means a file like this would show up correctly on Doxygen: /** * License ...

Re: Patch for the website's Rakefile

2015-10-02 Thread Joseph Wu
Dave, Would it be possible for you to take a look at the patches in MESOS-3183 <https://issues.apache.org/jira/browse/MESOS-3183>? Ideally, we should fix the documentation before 0.25 goes out. Thanks, ~Joseph On Mon, Sep 28, 2015 at 1:59 PM, Joseph Wu <jos...@mesosphere.io> wro

Re: Patch for the website's Rakefile

2015-09-28 Thread Joseph Wu
+ Dev On Mon, Sep 28, 2015 at 1:56 PM, Dave Lester <dles...@twitter.com> wrote: > Can this be discussed on the mailing list? Thanks > > > On Monday, September 28, 2015, Joseph Wu <jos...@mesosphere.io> wrote: > >> + Niq, Joris, MPark (so that this doesn