Website update frequency

2015-11-10 Thread Neil Conway
Does anyone know how frequently the docs at mesos.apache.org are updated? I notice that some docs changes from > 1 week ago aren't reflected on the current site. Neil

Re: libprocess: ordered message delivery

2015-11-15 Thread Neil Conway
Good point -- following what Erlang and Akka provide (ordering but not reliability) is probably a reasonable starting-point. I suggested earlier that this would require doing our own retransmission logic (which would be pretty unfortunate), but on reflection that is obviously not true. We can

libprocess: ordered message delivery

2015-11-15 Thread Neil Conway
Hi folks, We should clarify $SUBJECT. My understanding of the current situation is: (1) For local messages (dispatch(), send() to a local process), ordered delivery is guaranteed. (2) For remote messages, ordered delivery is *not* guaranteed. (3) Despite #2, in many cases messages from one

Release / deprecation policy

2015-11-16 Thread Neil Conway
Folks, In the last community sync, we briefly discussed Mesos release policy. In particular, we talked about the current cadence of ~monthly releases and how that relates to (a) deprecation periods (b) support for running a "mixed version" cluster. As I understand it, the current policy is as

Re: Better error reporting in Mesos

2015-11-06 Thread Neil Conway
Hi Marco, On Fri, Nov 6, 2015 at 10:57 AM, Marco Massenzio wrote: > So, in addition to what Michael suggests, and in line with Alex's > exhortation to have "excellent core error reporting facilities," I would > also like to propose that we consider adopting a pattern I've

Re: Better error reporting in Mesos

2015-11-06 Thread Neil Conway
On Fri, Nov 6, 2015 at 12:44 PM, Marco Massenzio wrote: > I may be wrong here, but adding (filename, line_nr) to the Error() object > is performance-impact and may need to be disabled in Production > environments I don't think that will be the case: passing __FILE__ and

Re: mesos installation

2015-10-20 Thread Neil Conway
Hi Soheila, What distribution, version, and CPU architecture are you using? Thanks, Neil On Tue, Oct 20, 2015 at 9:00 AM, Soheila Dehghanzadeh wrote: > Hi All, > > I have been trying to re-install mesos using this link [1] but my make keeps > failing because of this error >

Re: mesos installation

2015-10-20 Thread Neil Conway
source code? > > -Soheila > > On Tue, Oct 20, 2015 at 5:04 PM, Neil Conway <neil.con...@gmail.com> wrote: > >> Hi Soheila, >> >> What distribution, version, and CPU architecture are you using? >> >> Thanks, >> Neil >> >> On Tue, Oct 20,

Re: Proposing a deterministic simulation tool for Mesos master and allocator debugging and testing

2015-10-05 Thread Neil Conway
On Sun, Oct 4, 2015 at 6:14 PM, Maged Michael wrote: > I'd appreciate feedback on a proposal for a simulation tool for debugging > and testing the Mesos master and allocator. Overall, this is awesome! I'd love to see Mesos improve in this area, and I'd be happy to help

Re: Proposing a deterministic simulation tool for Mesos master and allocator debugging and testing

2015-10-05 Thread Neil Conway
On Mon, Oct 5, 2015 at 3:20 PM, Maged Michael wrote: > I have in mind three options. > (1) Text translation of Mesos source code. E.g., "process::Future" > into, say, "sim::process::Future". > - Pros: Does not require any changes to any Mesos or libprocess code. > Replace

Request for shepherds

2015-08-28 Thread Neil Conway
Hi everyone, I would love code review/shepherding for the following diffs: https://reviews.apache.org/r/37903/ https://reviews.apache.org/r/37876/ https://reviews.apache.org/r/37877/ https://reviews.apache.org/r/37878/ https://reviews.apache.org/r/37824/ https://reviews.apache.org/r/37823/

Re: Do we have document on HTTP endpoints?

2015-12-01 Thread Neil Conway
Hi Klaus, This would be a great ticket to work on if you're interested in contributing to Mesos :) Best, Neil On Tue, Dec 1, 2015 at 10:56 PM, Klaus Ma wrote: > MESOS-3831 is the same requirement; waiting for it :). > > > Da (Klaus), Ma (马达) | PMP® | Advisory

Re: Speed up Mesos tests

2015-12-16 Thread Neil Conway
+1 on the speed-up-the-tests project! On Wed, Dec 16, 2015 at 10:29 AM, Greg Mann wrote: > I'd like to bring up something that both Neil and Joseph mentioned to me > recently, which could be of use when working on these slow test tickets. > Since we have the `process::Clock`

Re: Mesos build & testing environment instructions

2015-12-17 Thread Neil Conway
+1 to the general idea of including this information in the documentation. I'd probably lean towards including this information in the current "Getting Started" page, but in a separate section ("Running The Test Suite"?). Neil On Thu, Dec 17, 2015 at 12:38 PM, Greg Mann

Allocator API changes

2015-12-10 Thread Neil Conway
Hi everyone, The allocator API [1] is going to change in the forthcoming 0.26 release [2]. Custom allocators will need to implement several new API methods. Further changes to the allocator API are being contemplated for the 0.27 release [3]. If you have built a custom allocator, please speak

Re: [MESOS-1865] Redirect to the leader master when current master is not a leader.

2016-01-08 Thread Neil Conway
On Fri, Jan 8, 2016 at 12:29 PM, Benjamin Mahler wrote: > (2) It is difficult to reliably obtain cluster state through the existing > endpoints. This one is less clear to me than the first problem. Here we > have to think through how we want users to be hitting state

Re: [MESOS-1865] Redirect to the leader master when current master is not a leader.

2016-01-06 Thread Neil Conway
+1 -- I think we should make this change. The current behavior is quite dangerous. Neil On Wed, Jan 6, 2016 at 12:52 PM, Diogo Gomes wrote: > Hi, Adam and Haosdent > > > Resurrecting this issue, https://issues.apache.org/jira/browse/MESOS-1865, I > would like to make a +1

Re: No master is currently leading ...

2016-01-06 Thread Neil Conway
Hi, Can you post the full logs from all of the master instances? BTW, the @dev list is mostly intended for discussion around the development of Mesos. The @user list is a better venue for user support/configuration questions. Thanks, Neil On Wed, Jan 6, 2016 at 12:58 PM, DiGiorgio, Mr.

Re: Release / deprecation policy

2015-11-25 Thread Neil Conway
> of course, it might in the future). > > Hence, > +1 > in providing tooling to make cluster upgrades easier to automate. > > Thanks! > > -- > *Marco Massenzio* > Distributed Systems Engineer > http://codetrips.com > > On Mon, Nov 16, 2015 at 9:24 PM, Neil C

Re: Fw: Re: Dynamic vs. implicit roles

2015-11-30 Thread Neil Conway
On Mon, Nov 30, 2015 at 6:53 PM, YongQiao Wang wrote: >> 1. Choosing a role name >> 2. Configuring weights, ACLs, and quotas for the role. >> 3. Configuring applications/frameworks to register using that role. > > [Yong Qiao] If applications/frameworks do not follow your

Re: Dynamic vs. implicit roles

2015-11-30 Thread Neil Conway
Hi Klaus, Thanks for your feedback. On Mon, Nov 30, 2015 at 10:01 PM, Klaus Ma wrote: > @Neil, just want to confirm about ACL, do you mean we will load role info > from a 3rd-party application, e.g. LDAP? I mean ACLs as in the authorization subsystem in Mesos:

Re: Images missing from documentation?

2015-11-18 Thread Neil Conway
Hi Ravi, Thanks for the report! This is tracked as https://issues.apache.org/jira/browse/MESOS-3183. Neil On Wed, Nov 18, 2015 at 9:42 AM, Ravi Prakash wrote: > Hi folks! > Seems like the images are missing from the generated documentation. e.g. >

Re: [1/2] mesos git commit: Added aufs provisioning backend.

2016-06-08 Thread Neil Conway
Can you update the documentation for this change, please? Thanks, Neil On Tue, Jun 7, 2016 at 6:14 PM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master 90871a48f -> e5358ed1c > > > Added aufs provisioning backend. > > Review:

Re: Does anyone know the MESOS-4675 is back-ported to 0.25?

2016-06-08 Thread Neil Conway
Done -- https://issues.apache.org/jira/browse/MESOS-5569 On Wed, Jun 8, 2016 at 2:43 PM, Vinod Kone <vinodk...@gmail.com> wrote: > +1. Can you file a JIRA? > > @vinodkone > >> On Jun 8, 2016, at 7:58 AM, Neil Conway <neil.con...@gmail.com> wrote: >> >>

Re: Does anyone know the MESOS-4675 is back-ported to 0.25?

2016-06-08 Thread Neil Conway
It would be great to make this information more prominent on the website, especially once 1.0.0 is released. For example, we could list the supported releases on https://mesos.apache.org/downloads/, along with a link to the versioning document. Neil On Tue, Jun 7, 2016 at 6:58 PM, Vinod Kone

Re: mesos git commit: Added documentation for access_sandboxes and access_mesos_logs acls.

2016-06-06 Thread Neil Conway
FYI, this commit should have included the changes produced by re-running the `generate-endpoint.py` script. Neil On Wed, Jun 1, 2016 at 8:26 AM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master 5263a6211 -> 53b5164bb > > > Added documentation for

Re: Master configuration in the registry

2016-06-10 Thread Neil Conway
Makes sense: arguably you could say that "quota" and "weights" are part of the master's (mutable) "state", not its "configuration", which is largely immutable. Another distinction is that some configuration flags control behavior that doesn't need to be consistent between master replicas (e.g.,

Blog posts for 0.28.1, 0.28.2 releases?

2016-06-10 Thread Neil Conway
Folks, It seems like https://mesos.apache.org/blog/ doesn't have blog posts for the Mesos 0.28.1 or 0.28.2 releases. We generally try to have a blog post for each release, right? Neil

Re: WebUI authentication in 1.0.0-rc1

2016-06-08 Thread Neil Conway
On Wed, Jun 8, 2016 at 4:27 PM, Alexander Rojas wrote: > I think we should also think more thoroughly about the expected behaviour > when we introduce new authorizable actions (and we most certainly will). > Since things may break particularly if users set the

Re: source code compile failure mesos-0.28.0

2016-06-21 Thread Neil Conway
Can you post the content of "config.log"? Thanks, Neil On Tue, Jun 21, 2016 at 3:17 PM, Ali Aktar wrote: > Hi; > > All dependencies as per doc were installed. I’m using Centos 7: > Linux ip-172-31-46-249.eu-west-1.compute.internal 3.10.0-327.10.1.el7.x86_64 > #1 SMP

Improving support for partitioned tasks

2016-06-20 Thread Neil Conway
Currently, Mesos implements a hardcoded policy for handling partitioned agents and tasks: * agents are deemed to be partitioned when they fail health checks (~75 seconds by default) * partitioned agents are removed from the cluster. Frameworks receive TASK_LOST for all tasks running on the

Re: Links in documentation

2016-01-14 Thread Neil Conway
On Thu, Jan 14, 2016 at 11:39 AM, Joris Van Remoortere wrote: >> *In fact it seems that all links ending with .md are interpreted as >> relative links on the webpage, i.e. [label](https://test.com/foo.md) is >> rendered into <a href="https://test.com/foo/">label</a>. > > I think this

Re: [2/2] mesos git commit: Added documentation for labeled reserved resources.

2016-02-12 Thread Neil Conway
Hi Ben, On Fri, Feb 12, 2016 at 2:34 AM, Benjamin Mahler wrote: > Any plans to support labels for static reservations? > > Are we intentionally not supporting ReservationInfo for static > reservations? Or is this just outside of the initial scope? Labels for static

Re: Shepherd for MESOS-3486

2016-02-12 Thread Neil Conway
Hi Michael, Thanks for taking this on! Joris Van Remoortere will shepherd, but please also include me in the review request. Thanks, Neil On Thu, Feb 11, 2016 at 5:21 PM, Michael Browning wrote: > Hello, > > I've picked a small issue off of the newbie issues stack to

Re: Questions about release process

2016-02-04 Thread Neil Conway
freebsd.hpp is missing from the 0.27 release tarball, presumably because 3rdparty/libprocess/3rdparty/stout/include/Makefile.am was not updated to account for it. I'll send an RR shortly. Neil On Thu, Feb 4, 2016 at 10:16 AM, haosdent wrote: > Hi, David. I could see you

Precision of scalar resources

2016-02-12 Thread Neil Conway
tl;dr: If you use resource values with more than three decimal digits of precision (e.g., you are launching a task that uses 2.5001 CPUs), please speak up! Mesos uses floating point to represent scalar resource values, such as the number of CPUs in a resource offer or dynamic reservation.
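The precision concern above can be illustrated with a minimal sketch. It assumes scalar values are rounded to three decimal digits by converting to a fixed-point integer count of thousandths before arithmetic; the helper names (toFixed, toFloating, addScalars) are hypothetical and are not taken from the Mesos source.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration only; not the Mesos implementation.

// Convert a scalar resource value to an integer number of thousandths,
// rounding to the nearest thousandth (three decimal digits of precision).
std::int64_t toFixed(double value) {
  return std::llround(value * 1000.0);
}

// Convert a count of thousandths back to a floating point value.
double toFloating(std::int64_t fixed) {
  return static_cast<double>(fixed) / 1000.0;
}

// Add two scalar values in fixed point, so that rounding error from
// binary floating point does not accumulate across operations.
double addScalars(double a, double b) {
  return toFloating(toFixed(a) + toFixed(b));
}
```

Under this sketch, a value such as 2.5001 CPUs would be rounded to 2.5, and addScalars(0.1, 0.2) compares equal to 0.3 even though the plain double expression 0.1 + 0.2 does not.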

Re: Version numbers in docs

2016-02-02 Thread Neil Conway
I agree we should remove this text after some period of time has passed since that version of Mesos was released; it is quite distracting. The proper long-term fix is probably to have version-specific docs. So all the documentation at /documentation/v0.27/... would implicitly discuss the

Re: [VOTE] Release Apache Mesos 0.27.2 (rc1)

2016-02-29 Thread Neil Conway
As described (briefly) in the release emails, 0.27.2, 0.26.1, 0.25.1, and 0.24.2 contain a new feature: "reliable floating point for scalar resources" (MESOS-4687). To elaborate on that slightly, Mesos now only supports scalar resource values with three decimal digits of precision (e.g.,

Re: mesos git commit: Added recommendations for programming with persistent volumes.

2016-01-19 Thread Neil Conway
os/asf/mesos/commit/e2963966 >> Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/e2963966 >> Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/e2963966 >> >> Branch: refs/heads/master >> Commit: e2963966acc5c2263849ef183c9ee57251102d0e >> Paren

Re: Follow up on the proposal for simulation tools for master and allocator

2016-01-21 Thread Neil Conway
Hi Zhitao, There's a JIRA here: https://issues.apache.org/jira/browse/MESOS-3855 A few people who are interested in simulation of Mesos have been meeting periodically, although due to the holidays we haven't had a meeting in a little bit. I'll make sure you're included in the next meeting when

Enable compiler optimization by default?

2016-02-17 Thread Neil Conway
Hi folks, At present, Mesos defaults to compiling with "-O0"; to enable compiler optimizations, the user needs to specify "--enable-optimize". I'd like to propose we change the default, for a few reasons: (1) The autoconf default for CFLAGS/CXXFLAGS is "-O2 -g". Anecdotally, I think most

Re: Enable compiler optimization by default?

2016-02-17 Thread Neil Conway
On Wed, Feb 17, 2016 at 5:07 PM, Zameer Manji wrote: > Can't this problem also be solved by distributing packages that have > optimized binaries? The individuals/organizations that build packaged versions of Mesos should ensure that compiler optimizations are enabled -- but I

Re: Enable compiler optimization by default?

2016-02-18 Thread Neil Conway
>> >> So our CI will also be updated to use optimisation flags, right? We need to >> highlight this in the upgrade document for our users; I have run into some strange >> behaviour after changing the -O level. >> >> On Thu, Feb 18, 2016 at 8:51 AM James DeFelice <james.defel...@gmail

Re: mesos git commit: Add 'name' field into NetworkInfo.

2016-03-10 Thread Neil Conway
Should we also update docs/networking-for-mesos-managed-containers.md? It contains a version of the NetworkInfo message definition. Neil On Thu, Mar 10, 2016 at 11:05 AM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master 57a574fc9 -> 2a436e02f > > > Add

Re: [1/6] mesos git commit: Fixed a memory leak in the scheduler driver.

2016-03-30 Thread Neil Conway
>> Refer to this doc for the detail of deleting null: >> http://www.cplusplus.com/reference/new/operator%20delete/ < >> http://www.cplusplus.com/reference/new/operator%20delete/> >> >> Thanks >> Klaus >> >> > On Mar 30, 2016, at 07:24

Re: [1/6] mesos git commit: Fixed a memory leak in the scheduler driver.

2016-03-30 Thread Neil Conway
On Wed, Mar 30, 2016 at 4:57 PM, Benjamin Mahler wrote: > Yikes! (3) being not true to me means that I needed non-local reasoning to > determine the optionality. Sorry: to clarify, I didn't mean "there is not always a latch" in the code in question. I meant: "writing 'delete

Re: Design doc: ordered delivery in libprocess

2016-04-11 Thread Neil Conway
ections from a given libprocess instance is the "newest". Neil On Fri, Mar 25, 2016 at 12:50 PM, Neil Conway <neil.con...@gmail.com> wrote: > A few months ago, there was a dev list thread on whether libprocess > should provide ordered delivery [1]. The consensus then was that

Re: Looking for Shepherd for MESOS-5002

2016-03-22 Thread Neil Conway
Sure, I'd be happy to review the change. Neil On Tue, Mar 22, 2016 at 9:01 AM, Jie Yu wrote: > + Neil > > Neil is driving the documentation improvement in Mesos. Neil, do you have > time for that? I can help commit the patch if you give a shipit. > > - Jie > > On Tue, Mar

Design doc: ordered delivery in libprocess

2016-03-25 Thread Neil Conway
A few months ago, there was a dev list thread on whether libprocess should provide ordered delivery [1]. The consensus then was that libprocess doesn't provide ordered delivery in a few corner cases, but that we should fix that behavior to guarantee ordered (but unreliable) message delivery.

Re: Making 'curl' a prerequisite for installing Mesos

2016-03-03 Thread Neil Conway
No objection to the additional dependency, but using 'curl' instead of 'libcurl' seems unfortunate. Can you share some more detailed information about the problems that have been encountered using libcurl? e.g., was using the curl_multi_xxx() APIs explored? Neil On Thu, Mar 3, 2016 at 9:10

Re: Need CHANGELOG updates

2016-03-03 Thread Neil Conway
I sent https://reviews.apache.org/r/44348/ for the floating point math changes; if you'd prefer a different format or more/less details, just let me know. Thanks, Neil On Thu, Mar 3, 2016 at 10:57 AM, Vinod Kone wrote: > Hi guys, > > The 0.28.0 release is currently blocked

Re: Discussion about upgrading 3rdparty libraries

2016-03-01 Thread Neil Conway
The prospect of downloading dependencies from "rando" locations is concerning to me :) Mesos can easily come to depend on implementation details of a dependency that might change in a minor release. For example, a recent change [1] depends on the connection retry logic in the Zk client library in

Re: [1/6] mesos git commit: Fixed a memory leak in the scheduler driver.

2016-03-29 Thread Neil Conway
On Tue, Mar 29, 2016 at 7:19 PM, wrote: > --- a/src/sched/sched.cpp > +++ b/src/sched/sched.cpp > @@ -1808,6 +1808,10 @@ MesosSchedulerDriver::~MesosSchedulerDriver() > delete process; >} > > + if (credential != NULL) { > +delete credential; > + } `delete`

Re: mesos git commit: Removed deprecated annotation for values in a protobuf enum.

2016-05-19 Thread Neil Conway
Do we need to be source-compatible with protobuf 2.5? If so, why? Neil On Wed, May 18, 2016 at 11:15 PM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master b7e50fe8b -> 4248b3c3a > > > Removed deprecated annotation for values in a protobuf enum. > > Support

Re: mesos git commit: Fixed a head-of-line blocking bug in libevent SSL socket.

2016-05-12 Thread Neil Conway
Would it be possible to write a unit test that reproduces the original problem? It should be pretty easy to repro, right? Neil On Thu, May 12, 2016 at 1:50 AM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master 95e670cd4 -> 28c085fca > > > Fixed a

Re: mesos website workgroup

2016-05-17 Thread Neil Conway
Count me in. Thanks, Neil On Tue, May 17, 2016 at 7:54 AM, Tomek Janiszewski wrote: > Count me in. > > Tomek > > On Tue, 17.05.2016, 07:49, Abhishek Dasgupta < > a10gu...@linux.vnet.ibm.com> wrote: > >> I would be very much interested. I have some front-end

Re: mesos git commit: Updated quota endpoint help.

2016-05-18 Thread Neil Conway
When modifying the endpoint help text, we should remember to update the generated help files (via support/generate-endpoint-help.py) -- the changes to both the input text and generated output files should be included as part of the same commit. Neil On Wed, May 18, 2016 at 10:58 AM,

Re: Design doc for TASK_GONE

2016-05-12 Thread Neil Conway
/document/d/1D2mJnwuC1qlT_SJGspfj4MdAQXflESCqKANY0Pj4644 Neil On Mon, May 9, 2016 at 2:37 PM, Neil Conway <neil.con...@gmail.com> wrote: > Hi folks, > > To address some shortcomings and ambiguities in the TASK_LOST task > state, I'd like to propose that we introduce a new task

Re: mesos git commit: Replaced CHECK with CHECK_READY.

2016-05-10 Thread Neil Conway
>> Also removes some unused header includes. >>> >>> Review: https://reviews.apache.org/r/46827/ >>> >>> >>> Project: http://git-wip-us.apache.org/repos/asf/mesos/repo >>> Commit: http://git-wip-u

Design doc for TASK_GONE

2016-05-09 Thread Neil Conway
Hi folks, To address some shortcomings and ambiguities in the TASK_LOST task state, I'd like to propose that we introduce a new task state, TASK_GONE. For more information, see the design doc: https://issues.apache.org/jira/browse/MESOS-5345 Comments welcome! Neil

Re: getting added to contributors

2016-07-12 Thread Neil Conway
Do we really want everyone who wants to be assigned a JIRA to also add themselves to the YAML file? To me, this adds another step to a contribution process that probably has too many steps already. Neil On Mon, Jul 11, 2016 at 7:31 PM, Vinod Kone wrote: > Welcome to the

Re: Rate-limiting agent removal w/ PARTITION_AWARE

2016-07-30 Thread Neil Conway
Hi Ben, Thanks for the feedback! Seems like we're on the same page overall. On Thu, Jul 28, 2016 at 8:42 AM, Benjamin Mahler wrote: > It seems to me that these particular flags are not applicable for > PARTITION_AWARE frameworks, since there is no removal occurring. FWIW,

Rate-limiting agent removal w/ PARTITION_AWARE

2016-07-27 Thread Neil Conway
Hi folks, There are two "safety limits" in place that control the master's agent removal behavior: (1) "--agent_removal_rate_limit" controls the rate at which agents can be removed from the cluster when they fail health checks. (2) "--recovery_agent_removal_limit" controls the fraction of

Re: getting added to contributors

2016-07-13 Thread Neil Conway
rs group in JIRA (no changes here), or they can > instead submit a PR to contributors.yaml file (just specifying the email > and JIRA handle should be sufficient) which will result in the same thing. > > We will update contribution guidelines to make this explicit. > > Artem. > >

Disabling the --registry_strict flag in 1.0

2016-07-12 Thread Neil Conway
Hi folks, I'd like to propose that we disable the --registry_strict flag for 1.0. You can find the rationale for this change here: https://issues.apache.org/jira/browse/MESOS-5833 Please let me know if you have any thoughts on whether we should make this change. Thanks, Neil

Re: Registering and framework failover

2016-07-13 Thread Neil Conway
as, is it possible to do this? > > (also, we actually use a failover timeout of 1 week, but it doesn't > really change the problem and I mistakenly assumed that an example with > smaller values would be more intuitive) > > On 13.07.2016 14:50, Neil Conway wrote: >> On Wed

Re: Registering and framework failover

2016-07-13 Thread Neil Conway
On Wed, Jul 13, 2016 at 2:44 PM, Evers Benno wrote: > imagine the following situation: I am a framework with failover timeout > of 1 hour, and 59 minutes and 55 seconds after shutting down I want to > register with the master again. > > If my registration attempt arrives at

Re: Disabling the --registry_strict flag in 1.0

2016-07-13 Thread Neil Conway
Hi Jie, On Wed, Jul 13, 2016 at 12:11 AM, Jie Yu wrote: > Does this mean that we'll have to cut another 1.0 RC just for that? I'd think so. > If we were to cut another RC (e.g., due to bugs, which is likely), I would > be happy to include the patch that disables the flag

Re: [3/4] mesos git commit: Added filtering for orphaned tasks in /state endpoint.

2016-07-06 Thread Neil Conway
On Wed, Jul 6, 2016 at 12:06 AM, wrote: > diff --git a/src/master/http.cpp b/src/master/http.cpp > index 6b4f85b..debedd4 100644 > --- a/src/master/http.cpp > +++ b/src/master/http.cpp > @@ -2498,11 +2498,8 @@ Future Master::Http::state( > }); > > //

RFC: partitioned tasks and the strict registry

2016-07-11 Thread Neil Conway
Folks, We're working on some Mesos features that will allow frameworks to control how partitioned tasks are handled [1]. As part of designing how this will work, I'd love to hear from users and framework developers about how they handle partitioned tasks/agents. Specifically: (a) Have you enabled

Overloading and function names

2016-07-01 Thread Neil Conway
Consider the following function signatures from master.cpp: Nothing Master::removeSlave(const Registry::Slave& slave); void Master::removeSlave(Slave* slave, const string& message, Option reason); or these from sorter/drf/sorter.hpp: void update(const SlaveID& slaveId, const Resources&

Re: Overloading and function names

2016-07-04 Thread Neil Conway
On Sun, Jul 3, 2016 at 9:10 PM, Benjamin Mahler wrote: > To clarify, are you ok with the removeSlave example? It seems to fit your > criteria. I think `removeSlave` is poorly named, for similar reasons -- I just talked about `update` in my email for brevity. > Usually with

Re: Tracking deprecated features

2017-02-07 Thread Neil Conway
Strongly agree that this can and should be improved! Two questions/suggestions: (1) Should we use JIRA, the website/docs, or both? If we only use JIRA, it might not be obvious to users that, e.g., the "--roles" master flag is deprecated. An alternative would be a table in the docs, listing (a)

Re: Build failed in Jenkins: Mesos-Buildbot » autotools,gcc,--verbose,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2) #3220

2017-02-06 Thread Neil Conway
I haven't seen this test fail elsewhere, but there's at least one other instance of it failing on ASF CI. Unfortunately I couldn't fetch the logs in either case (any chance we change the ASF Jenkins configuration to keep logs for failing jobs for longer?). I'll keep an eye out to see if I can get

Re: Welcome Neil Conway as Mesos Committer and PMC member!

2017-01-22 Thread Neil Conway
017 at 11:03 PM, Vinod Kone <vinodk...@apache.org> wrote: > >> Hi folks, >> >> Please welcome Neil Conway as the newest committer and PMC member of the >> Apache Mesos project. >> >> Neil has been an active contributor to Mesos for more than a year now. As

Re: Disallowing pre-1.0 Mesos agents

2017-01-23 Thread Neil Conway
> >> +1 >> >> Technically 0.28.0 was only supposed to be compatible with 0.27.0 and 1.0. >> >> >> On Fri, Jan 20, 2017 at 8:02 PM, Zameer Manji <zma...@apache.org> wrote: >> >> > +1 >> > >> > >> > >> > On Fri

Re: Proposal for Mesos Build Improvements

2017-02-14 Thread Neil Conway
I'm curious to hear more about how using PCH compares with making stout a non-header-only library. Is PCH easier to implement, or is it expected to offer a more dramatic improvement in compile times? Would making both changes eventually make sense? Neil On Tue, Feb 14, 2017 at 11:28 AM, Jeff

Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Neil Conway
On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler wrote: > For efficiency purposes, if a header file is included by 50% or more of the > source files, it should be included in the precompiled header. If a header is > included in fewer than 50% of the source

Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Neil Conway
On Wed, Feb 15, 2017 at 1:59 PM, Jeff Coffler wrote: > 3. Maintaining the correct includes is nice, but not at the cost of compiler > speed. Personally, I would invert these statements -- but until we know the cost of the redundant includes, probably not

Disallowing pre-1.0 Mesos agents

2017-01-20 Thread Neil Conway
I'd like to propose that Mesos 1.3.0 should not allow pre-1.0 Mesos agents to register. Motivation: (1) We can simplify the master code in a few places. For example, we can assume that we always have a FrameworkInfo for any task running on a registered agent. Needing to handle running tasks

Re: Welcome Kevin Klues as a Mesos Committer and PMC member!

2017-03-01 Thread Neil Conway
Congratulations Kevin! Very well-deserved. Neil On Wed, Mar 1, 2017 at 2:05 PM, Benjamin Mahler wrote: > Hi all, > > Please welcome Kevin Klues as the newest committer and PMC member of the > Apache Mesos project. > > Kevin has been an active contributor in the project for

Fwd: mesos git commit: Fixed a bug in getRootContainerId due to protobuf copying issue.

2016-09-19 Thread Neil Conway
Hi Jie, Do you have more details on what exactly the problem is here? If protobuf is unable to copy/merge nested messages in general, that seems like something that might crop up elsewhere. Perhaps we can (a) file a JIRA (ideally with a self-contained test-case), and/or (b) report the problem to

Re: mesos git commit: Added `DEFAULT_ROLE` constant to persistent volume tests.

2016-09-22 Thread Neil Conway
I'm not sure this is a good idea: the "default role" is actually "*". That is also the default value for the "role" fields in the protobufs. Perhaps we should name this new constant something like DEFAULT_TEST_ROLE? I wonder also if we should keep the definition local to

Re: mesos git commit: Added MESOS-6497 to CHANGELOG.

2016-10-28 Thread Neil Conway
This commit should also appear in the master branch, not just 1.1.x Neil On Fri, Oct 28, 2016 at 4:06 PM, wrote: > Repository: mesos > Updated Branches: > refs/heads/1.1.x bc7ecb8cf -> 7fce1b33f > > > Added MESOS-6497 to CHANGELOG. > > > Project:

Non-checkpointing frameworks

2016-10-14 Thread Neil Conway
Hi folks, I'd like input from individuals who currently use frameworks but do not enable checkpointing. Background: "checkpointing" is a parameter that can be enabled in FrameworkInfo; if enabled, the agent will write the framework pid, executor PIDs, and status updates to disk for any tasks

Re: Non-checkpointing frameworks

2016-10-18 Thread Neil Conway
Hi folks, Thanks for the feedback! On Mon, Oct 17, 2016 at 12:44 PM, Zhitao Li wrote: > +1 to both A and B. > > Do we plan to eventually drop non-checkpointed framework support (possibly > in v2) and declare that all frameworks have to operate under this assumption? I think

Re: Build failed in Jenkins: Mesos » autotools,gcc,--verbose --enable-libevent --enable-ssl,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-6)&&(!ubuntu-eu2) #2933

2016-11-16 Thread Neil Conway
Has there been any response from the ASF Infra folks on addressing the VM/hardware issues? Seems like it will be difficult to get good signal from the ASF CI in the absence of some improvements on the infrastructure side. Neil On Wed, Nov 16, 2016 at 10:45 AM, Alex R wrote: >

Re: Build failed in Jenkins: Mesos » autotools,gcc,--verbose --enable-libevent --enable-ssl,GLOG_v=1 MESOS_VERBOSE=1,centos:7,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-6) #2852

2016-10-31 Thread Neil Conway
I spent a little while looking into this. The "PersistentVolumeEndpointsTest.OfferCreateThenEndpointRemove" test fails on the following expectations: https://github.com/apache/mesos/blob/1e57459b7d3f571bdf18fec29b070e78ce719319/src/tests/persistent_volume_endpoints_tests.cpp#L1562

Duplicate task IDs

2016-12-09 Thread Neil Conway
Folks, The master stores a cache of metadata about recently completed tasks; for example, this information can be accessed via the "/tasks" HTTP endpoint or the "GET_TASKS" call in the new Operator API. The master currently stores this metadata using a list; this means that duplicate task IDs

Re: Duplicate task IDs

2016-12-12 Thread Neil Conway
erent reason (e.g. performance) for using a hashmap? > > I'm wondering why a multi-hashmap is not sufficient. This would be clear if > you were explicitly *trying* to get rid of duplicates of course :-) > > Thanks, > Joris > > — > *Joris Van Remoortere* > Mesosphere > >

Re: Duplicate task IDs

2016-12-12 Thread Neil Conway
On Mon, Dec 12, 2016 at 1:32 PM, Joris Van Remoortere wrote: > It sounds like using a multi_hashmap for now allows you to clean up the > code and avoid some bugs, without changing the existing behavior. Because we want cache-like behavior (bounded size + LRU replacement),

Re: Building on OS X 10.12

2016-12-12 Thread Neil Conway
I think we should look into adopting "-fvisibility=hidden" and explicitly annotating the symbols that we want to export: https://issues.apache.org/jira/browse/MESOS-6734 I agree this isn't a trivial change and it would be good to have some tool support here, but there are lots of

Re: Map support in proto2

2016-12-18 Thread Neil Conway
I believe `oneof` is supported in protobuf 2.6.1 [1], so we wouldn't need to upgrade to make use of it. But I agree that upgrading to protobuf 3 (while continuing to use the proto2 language version) is worth doing at some point. Neil [1]

Requiring XCode >= 8.0 on OSX

2017-04-08 Thread Neil Conway
XCode < 8 does not support the C++11 `thread_local` construct. As a result, we added a workaround to use `__thread` on OSX and `thread_local` on other platforms: https://reviews.apache.org/r/36845/ Since that workaround was added, XCode 8 has been released (in September 2016) with support for

[Design doc] RPC: Fault domains in Mesos

2017-04-17 Thread Neil Conway
Folks, I'd like to enhance Mesos to support a first-class notion of "fault domains" -- i.e., identifying the "rack" and "region" (DC) where a Mesos agent or master is located. The goal is to enable two main features: (1) To make it easier to write "rack-aware" Mesos frameworks that are portable

Re: [Design doc] RPC: Fault domains in Mesos

2017-04-19 Thread Neil Conway
l we could implement this by identifying a fault domain with a simple > list of ids like ["US-WEST-1", "Building 2", "Cage 3", "POD 12", "Rack 3"] > or ["US-EAST-2", "Building 1"]. Slaves would advertise their lowest-le

Re: Time Zone information in TimeInfo

2017-03-08 Thread Neil Conway
ime since the Unix epoch then TZ > info is not useful. > > I think that comment should be removed for clarity. > > On Mon, Mar 6, 2017 at 8:38 PM, Neil Conway <neil.con...@gmail.com> wrote: > >> I always found that TODO confusing. If a `TimeInfo` is intended to >>

Re: Time Zone information in TimeInfo

2017-03-06 Thread Neil Conway
I always found that TODO confusing. If a `TimeInfo` is intended to represent the amount of time that has elapsed since the (Unix) epoch, I would expect it to be timezone independent. Can you clarify why having TZ info would be useful? Neil On Mon, Mar 6, 2017 at 7:51 PM, Zameer Manji

Re: [VOTE] Release Apache Mesos 1.2.0 (rc2)

2017-03-01 Thread Neil Conway
The perf core dump might be addressed if we backport this change: https://reviews.apache.org/r/56611/ Although my guess is that this isn't a severe problem: for some as-yet-unknown reason, running `perf` on the host segfaulted, which causes the test to fail. Neil On Wed, Mar 1, 2017 at 11:09

Re: Agent reregistration timeout, no TASK_LOST messages

2017-07-17 Thread Neil Conway
On Mon, Jul 17, 2017 at 9:20 AM, Ilya Pronin wrote: > AFAIK the absence of TASK_LOST statuses is expected. Master registry > persists information only about agents. Tasks are recovered from > re-registering agents. Because of that the failed over master can't send >
