Re: Agent reregistration timeout, no TASK_LOST messages

2017-07-17 Thread Neil Conway
On Mon, Jul 17, 2017 at 9:20 AM, Ilya Pronin wrote: > AFAIK the absence of TASK_LOST statuses is expected. Master registry > persists information only about agents. Tasks are recovered from > re-registering agents. Because of that the failed over master can't send >

Re: June 3rd: MesosCon North America CFP due!

2017-06-03 Thread Neil Conway
Hi Jay, The CFP deadline has been extended to June 30. Neil On Sat, Jun 3, 2017 at 12:55 AM Jay Guo wrote: > According to the link CFP for MesosCon North America > > the CFP closes by

Re: RFC: Partition Awareness

2017-06-01 Thread Neil Conway
Hi Ben, The argument for changing the semantics is that correct frameworks should _always_ have accounted for the possibility that TASK_LOST tasks would go back to running (due to the non-strict registry semantics). The proposed change would just increase the probability of this behavior

Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-05-31 Thread Neil Conway
On Tue, May 30, 2017 at 3:43 PM, Neil Conway <neil.con...@gmail.com> wrote: > Attached is the test log for this failure. From a quick look, seems as > though the agent starts to launch the task, including forking the > child process, but no subsequent task status updates or

Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-05-31 Thread Neil Conway
On Tue, May 30, 2017 at 2:36 PM, Vinod Kone wrote: > Failed test: OneWayPartitionTest.MasterToSlave >

Re: [VOTE] Release Apache Mesos 1.3.0 (rc3)

2017-05-30 Thread Neil Conway
On Tue, May 30, 2017 at 2:36 PM, Vinod Kone wrote: > Ran on ASF CI. > > Found following issues. > > Failed test: CommandExecutorCheckTest.CommandCheckDeliveredAndReconciled >

Re: Moving Mesos builds reqs from GCC 4.8.1+ to GCC 4.9.0+

2017-05-30 Thread Neil Conway
On Tue, May 30, 2017 at 12:58 PM, Michael Park wrote: > I'm all for moving to GCC 4.9+. > > I'd love to get C++14 and bump to GCC 5, but I think we should do an > investigation for "reasonable availability" before we do that. I agree, although I'd think a similar investigation

Re: Moving Mesos builds reqs from GCC 4.8.1+ to GCC 4.9.0+

2017-05-30 Thread Neil Conway
It seems that if we moved to GCC 5, we'd also be able to move to C++14 (https://gcc.gnu.org/projects/cxx-status.html#cxx14). CentOS 6 users will need to install devtoolset anyway (which makes it easy to get GCC 5 or 6), so I wonder if skipping directly to requiring GCC 5 would be feasible? Neil

Re: [VOTE] Release Apache Mesos 1.3.0 (rc2)

2017-05-24 Thread Neil Conway
The vote has failed; we'll cut a new release shortly. The release blocker (MESOS-7521) has been investigated and fixed. The next RC will also include MESOS-7538, as well as the `register_agents` ACL change mentioned in a different thread. Neil On Wed, May 17, 2017 at 3:11 PM, Yan Xu

Re: Welcome Gilbert Song as a new committer and PMC member!

2017-05-24 Thread Neil Conway
Congratulations Gilbert! Well-deserved! Neil On Wed, May 24, 2017 at 10:32 AM, Jie Yu wrote: > Hi folks, > > I'm happy to announce that the PMC has voted Gilbert Song as a new committer > and member of PMC for the Apache Mesos project. Please join me to > congratulate him! >

Re: Use of ACLs.RegisterAgent.agent

2017-05-24 Thread Neil Conway
FYI, I merged the change to rename this field into the master and 1.3.x branches; it will be included in the next 1.3.0 release candidate. Neil On Mon, May 22, 2017 at 10:43 AM, Alexander Rojas wrote: > Hey guys, > > We just noted that there was an error when the

Re: mesos git commit: Updated the outdated network isolator configure flag.

2017-05-18 Thread Neil Conway
This commit enables the port mapping isolator by default. Was that intended? Among other things, it breaks the build on OSX: $ ../mesos/configure --disable-java --disable-python [...] configure: error: cannot build network isolator

Re: [VOTE] Release Apache Mesos 1.3.0 (rc1)

2017-05-08 Thread Neil Conway
Personally, I'm not convinced that we need to fix MESOS-7378. The problem is essentially a bug in glibc that was fixed 6 years ago. (As a point of reference, the oldest version of g++ we support was released 2 years ago... :) ) Neil On Mon, May 8, 2017 at 3:45 PM, Yan Xu wrote: > I

Re: Version numbers during development

2017-05-08 Thread Neil Conway
) updating the release process documentation accordingly. If you have any concerns, please let me know. Neil On Fri, May 5, 2017 at 4:29 PM, Neil Conway <neil.con...@gmail.com> wrote: > In my experience, this is reasonably common. For example, Postgres > uses version number

Re: [4/6] mesos git commit: Checked validity of master and agent version numbers on startup.

2017-05-08 Thread Neil Conway
t-wip-us.apache.org/repos/asf/mesos/repo >> Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/5a5dd8a4 >> Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/5a5dd8a4 >> Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/5a5dd8a4 >> >> Branch: refs/hea

Re: documenting test expactations

2017-05-08 Thread Neil Conway
My two cents: (1) I almost always need to read the test to understand/debug a test failure. (2) As long as the intent of the test is clear, I'm not picky about whether clarifying comments take the form of C++ comments or explanatory EXPECT messages. One thing I would be opposed to is adding

Re: Version numbers during development

2017-05-05 Thread Neil Conway
> > On Fri, May 5, 2017 at 1:27 PM, Zhitao Li <zhitaoli...@gmail.com> wrote: > >> +1 >> >> Sent from my iPhone >> >> > On May 5, 2017, at 12:56 PM, Neil Conway <neil.con...@gmail.com> wrote: >> > >> > Our current practice is that

Version numbers during development

2017-05-05 Thread Neil Conway
Our current practice is that when we create a branch for version X, we bump the version number in the "master" branch to X+1. For example, we just created the 1.3.x branch, and bumped the version number in master to "1.4.0". Proposal: we should instead use a version number like "1.4.0-devel" in

Re: [Design doc] RPC: Fault domains in Mesos

2017-04-19 Thread Neil Conway
l we could implement this by identifying a fault domain with a simple > list of ids like ["US-WEST-1", "Building 2", "Cage 3", "POD 12", "Rack 3"] > or ["US-EAST-2", "Building 1"]. Slaves would advertise their lowest-le
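
The list-of-ids representation quoted above could be compared level by level to measure how "close" two nodes are. The following is only a sketch under that assumption: the `FaultDomain` alias and `sharedLevels` function are hypothetical names for illustration and are not part of any Mesos API or of the design doc.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation: a fault domain is an ordered list of ids,
// from the coarsest level (region) down to the finest (rack).
using FaultDomain = std::vector<std::string>;

// Returns how many leading levels two fault domains have in common.
// 0 means the nodes share nothing, not even the region.
std::size_t sharedLevels(const FaultDomain& a, const FaultDomain& b) {
  std::size_t n = 0;
  while (n < a.size() && n < b.size() && a[n] == b[n]) {
    ++n;
  }
  return n;
}
```

A framework wanting rack-level spread could, under this scheme, prefer placements that minimize `sharedLevels` between replicas.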

[Design doc] RPC: Fault domains in Mesos

2017-04-17 Thread Neil Conway
Folks, I'd like to enhance Mesos to support a first-class notion of "fault domains" -- i.e., identifying the "rack" and "region" (DC) where a Mesos agent or master is located. The goal is to enable two main features: (1) To make it easier to write "rack-aware" Mesos frameworks that are portable

Requiring XCode >= 8.0 on OSX

2017-04-08 Thread Neil Conway
XCode < 8 does not support the C++11 `thread_local` construct. As a result, we added a workaround to use `__thread` on OSX and `thread_local` on other platforms: https://reviews.apache.org/r/36845/ Since that workaround was added, XCode 8 has been released (in September 2016) with support for
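
A platform-conditional workaround of the kind described above might look roughly like the following. This is a sketch, not the code from the linked review: the macro name `THREAD_LOCAL` and the `nextId` function are assumptions made for illustration.

```cpp
// Hypothetical macro: pick the compiler extension on OSX and the
// standard C++11 keyword everywhere else.
#if defined(__APPLE__)
#define THREAD_LOCAL __thread
#else
#define THREAD_LOCAL thread_local
#endif

// Each thread gets its own copy of this counter. Note that `__thread`
// only supports trivially constructible types such as `int`.
THREAD_LOCAL int counter = 0;

// Returns the next id for the calling thread; no locking is needed
// because the counter is never shared between threads.
int nextId() {
  return ++counter;
}
```

Requiring a toolchain that supports `thread_local` everywhere would allow the conditional to be dropped and the keyword used directly.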

Re: Time Zone information in TimeInfo

2017-03-08 Thread Neil Conway
ime since the Unix epoch then TZ > info is not useful. > > I think that comment should be removed for clarity. > > On Mon, Mar 6, 2017 at 8:38 PM, Neil Conway <neil.con...@gmail.com> wrote: > >> I always found that TODO confusing. If a `TimeInfo` is intended to >>

Re: Time Zone information in TimeInfo

2017-03-06 Thread Neil Conway
I always found that TODO confusing. If a `TimeInfo` is intended to represent the amount of time that has elapsed since the (Unix) epoch, I would expect it to be timezone independent. Can you clarify why having TZ info would be useful? Neil On Mon, Mar 6, 2017 at 7:51 PM, Zameer Manji

Re: Welcome Kevin Klues as a Mesos Committer and PMC member!

2017-03-01 Thread Neil Conway
Congratulations Kevin! Very well-deserved. Neil On Wed, Mar 1, 2017 at 2:05 PM, Benjamin Mahler wrote: > Hi all, > > Please welcome Kevin Klues as the newest committer and PMC member of the > Apache Mesos project. > > Kevin has been an active contributor in the project for

Re: [VOTE] Release Apache Mesos 1.2.0 (rc2)

2017-03-01 Thread Neil Conway
The perf core dump might be addressed if we backport this change: https://reviews.apache.org/r/56611/ Although my guess is that this isn't a severe problem: for some as-yet-unknown reason, running `perf` on the host segfaulted, which causes the test to fail. Neil On Wed, Mar 1, 2017 at 11:09

Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Neil Conway
On Wed, Feb 15, 2017 at 1:59 PM, Jeff Coffler wrote: > 3. Maintaining the correct includes is nice, but not at the cost of compiler > speed. Personally, I would invert these statements -- but until we know the cost of the redundant includes, probably not

Re: Proposal for Mesos Build Improvements

2017-02-15 Thread Neil Conway
On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler wrote: > For efficiency purposes, if a header file is included by 50% or more of the > source files, it should be included in the precompiled header. If a header is > included in fewer than 50% of the source

Re: Proposal for Mesos Build Improvements

2017-02-14 Thread Neil Conway
I'm curious to hear more about how using PCH compares with making stout a non-header-only library. Is PCH easier to implement, or is it expected to offer a more dramatic improvement in compile times? Would making both changes eventually make sense? Neil On Tue, Feb 14, 2017 at 11:28 AM, Jeff

Re: Tracking deprecated features

2017-02-07 Thread Neil Conway
Strongly agree that this can and should be improved! Two questions/suggestions: (1) Should we use JIRA, the website/docs, or both? If we only use JIRA, it might not be obvious to users that, e.g., the "--roles" master flag is deprecated. An alternative would be a table in the docs, listing (a)

Re: Build failed in Jenkins: Mesos-Buildbot » autotools,gcc,--verbose,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2) #3220

2017-02-06 Thread Neil Conway
I haven't seen this test fail elsewhere, but there's at least one other instance of it failing on ASF CI. Unfortunately I couldn't fetch the logs in either case (any chance we change the ASF Jenkins configuration to keep logs for failing jobs for longer?). I'll keep an eye out to see if I can get

Re: Disallowing pre-1.0 Mesos agents

2017-01-23 Thread Neil Conway
> >> +1 >> >> Technically 0.28.0 was only supposed to be compatible with 0.27.0 and 1.0. >> >> >> On Fri, Jan 20, 2017 at 8:02 PM, Zameer Manji <zma...@apache.org> wrote: >> >> > +1 >> > >> > >> > >> > On Fri

Re: Welcome Neil Conway as Mesos Committer and PMC member!

2017-01-22 Thread Neil Conway
017 at 11:03 PM, Vinod Kone <vinodk...@apache.org> wrote: > >> Hi folks, >> >> Please welcome Neil Conway as the newest committer and PMC member of the >> Apache Mesos project. >> >> Neil has been an active contributor to Mesos for more than a year now. As

Disallowing pre-1.0 Mesos agents

2017-01-20 Thread Neil Conway
I'd like to propose that the Mesos 1.3.0 should not allow pre-1.0 Mesos agents to register. Motivation: (1) We can simplify the master code in a few places. For example, we can assume that we always have a FrameworkInfo for any task running on a registered agent. Needing to handle running tasks

Re: Map support in proto2

2016-12-18 Thread Neil Conway
I believe `oneof` is supported in protobuf 2.6.1 [1], so we wouldn't need to upgrade to make use of it. But I agree that upgrading to protobuf 3 (while continuing to use the proto2 language version) is worth doing at some point. Neil [1]

Re: Building on OS X 10.12

2016-12-12 Thread Neil Conway
I think we should look into adopting "-fvisibility=hidden" and explicitly annotating the symbols that we want to export: https://issues.apache.org/jira/browse/MESOS-6734 I agree this isn't a trivial change and it would be good to have some tool support here, but there are lots of

Re: Duplicate task IDs

2016-12-12 Thread Neil Conway
On Mon, Dec 12, 2016 at 1:32 PM, Joris Van Remoortere wrote: > It sounds like using a multi_hashmap for now allows you to clean up the > code and avoid some bugs, without changing the existing behavior. Because we want cache-like behavior (bounded size + LRU replacement),
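
To make "bounded size + LRU replacement" concrete, here is a minimal sketch of such a cache keyed by task ID. It is not the data structure Mesos uses; the class name `BoundedLruMap` and the choice of string keys and values are assumptions made for illustration. Note that, unlike a multi-hashmap, a later entry with the same key replaces the earlier one.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bounded map with least-recently-used eviction.
class BoundedLruMap {
public:
  explicit BoundedLruMap(std::size_t capacity) : capacity_(capacity) {}

  // Inserts a value; an existing entry with the same key is replaced.
  // If the map grows past its capacity, the least recently used entry
  // is evicted.
  void put(const std::string& key, std::string value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      entries_.erase(it->second);
      index_.erase(it);
    }
    entries_.emplace_front(key, std::move(value));
    index_[key] = entries_.begin();
    if (entries_.size() > capacity_) {
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
  }

  // Returns a pointer to the value, or nullptr if the key is absent.
  // A successful lookup marks the entry as most recently used.
  const std::string* get(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) {
      return nullptr;
    }
    entries_.splice(entries_.begin(), entries_, it->second);
    return &it->second->second;
  }

  std::size_t size() const { return entries_.size(); }

private:
  using Entry = std::pair<std::string, std::string>;

  std::size_t capacity_;
  std::list<Entry> entries_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

The list keeps recency order while the hash map gives constant-time lookup; `std::list::splice` moves an entry to the front without invalidating the stored iterators.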

Re: Duplicate task IDs

2016-12-12 Thread Neil Conway
erent reason (e.g. performance) for using a hashmap? > > I'm wondering why a multi-hashmap is not sufficient. This would be clear if > you were explicitly *trying* to get rid of duplicates of course :-) > > Thanks, > Joris > > — > *Joris Van Remoortere* > Mesosphere > >

Duplicate task IDs

2016-12-09 Thread Neil Conway
Folks, The master stores a cache of metadata about recently completed tasks; for example, this information can be accessed via the "/tasks" HTTP endpoint or the "GET_TASKS" call in the new Operator API. The master currently stores this metadata using a list; this means that duplicate task IDs

Re: Build failed in Jenkins: Mesos » autotools,gcc,--verbose --enable-libevent --enable-ssl,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-6)&&(!ubuntu-eu2) #2933

2016-11-16 Thread Neil Conway
Has there been any response from the ASF Infra folks on addressing the VM/hardware issues? Seems like it will be difficult to get good signal from the ASF CI in the absence of some improvements on the infrastructure side. Neil On Wed, Nov 16, 2016 at 10:45 AM, Alex R wrote: >

Re: Build failed in Jenkins: Mesos » autotools,gcc,--verbose --enable-libevent --enable-ssl,GLOG_v=1 MESOS_VERBOSE=1,centos:7,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-6) #2852

2016-10-31 Thread Neil Conway
I spent a little while looking into this. The "PersistentVolumeEndpointsTest.OfferCreateThenEndpointRemove" test fails on the following expectations: https://github.com/apache/mesos/blob/1e57459b7d3f571bdf18fec29b070e78ce719319/src/tests/persistent_volume_endpoints_tests.cpp#L1562

Re: mesos git commit: Added MESOS-6497 to CHANGELOG.

2016-10-28 Thread Neil Conway
This commit should also appear in the master branch, not just 1.1.x Neil On Fri, Oct 28, 2016 at 4:06 PM, wrote: > Repository: mesos > Updated Branches: > refs/heads/1.1.x bc7ecb8cf -> 7fce1b33f > > > Added MESOS-6497 to CHANGELOG. > > > Project:

Re: Non-checkpointing frameworks

2016-10-18 Thread Neil Conway
Hi folks, Thanks for the feedback! On Mon, Oct 17, 2016 at 12:44 PM, Zhitao Li wrote: > +1 to both A to B. > > Do we plan to eventually drop non-checkpointed framework support (possibly > in v2) and declare that all frameworks have to operate in this assumption? I think

Non-checkpointing frameworks

2016-10-14 Thread Neil Conway
Hi folks, I'd like input from individuals who currently use frameworks but do not enable checkpointing. Background: "checkpointing" is a parameter that can be enabled in FrameworkInfo; if enabled, the agent will write the framework PID, executor PIDs, and status updates to disk for any tasks

Re: mesos git commit: Added `DEFAULT_ROLE` constant to persistent volume tests.

2016-09-22 Thread Neil Conway
I'm not sure this is a good idea: the "default role" is actually "*". That is also the default value for the "role" fields in the protobufs. Perhaps we should name this new constant something like DEFAULT_TEST_ROLE? I wonder also if we should keep the definition local to

Fwd: mesos git commit: Fixed a bug in getRootContainerId due to protobuf copying issue.

2016-09-19 Thread Neil Conway
Hi Jie, Do you have more details on what exactly the problem is here? If protobuf is unable to copy/merge nested messages in general, that seems like something that might crop up elsewhere. Perhaps we can (a) file a JIRA (ideally with a self-contained test-case), and/or (b) report the problem to

Re: Rate-limiting agent removal w/ PARTITION_AWARE

2016-07-30 Thread Neil Conway
Hi Ben, Thanks for the feedback! Seems like we're on the same page overall. On Thu, Jul 28, 2016 at 8:42 AM, Benjamin Mahler wrote: > It seems to me that these particular flags are not applicable for > PARTITION_AWARE frameworks, since there is no removal occurring. FWIW,

Rate-limiting agent removal w/ PARTITION_AWARE

2016-07-27 Thread Neil Conway
Hi folks, There are two "safety limits" in place that control the master's agent removal behavior: (1) "--agent_removal_rate_limit" controls the rate at which agents can be removed from the cluster when they fail health checks. (2) "--recovery_agent_removal_limit" controls the fraction of

Re: Registering and framework failover

2016-07-13 Thread Neil Conway
as, is it possible to do this? > > (also, we actually use a failover timeout of 1 week, but it doesn't > really change the problem and I mistakenly assumed that an example with > smaller values would be more intuitive) > > On 13.07.2016 14:50, Neil Conway wrote: >> On Wed

Re: Registering and framework failover

2016-07-13 Thread Neil Conway
On Wed, Jul 13, 2016 at 2:44 PM, Evers Benno wrote: > imagine the following situation: I am a framework with failover timeout > of 1 hour, and 59 minutes and 55 seconds after shutting down I want to > register with the master again. > > If my registration attempt arrives at

Re: Disabling the --registry_strict flag in 1.0

2016-07-13 Thread Neil Conway
Hi Jie, On Wed, Jul 13, 2016 at 12:11 AM, Jie Yu wrote: > Does this mean that we'll have to cut another 1.0 RC just for that? I'd think so. > If we were to cut another RC (e.g., due to bugs, which is likely), I would > be happy to include the patch that disables the flag

Re: getting added to contributors

2016-07-13 Thread Neil Conway
rs group in JIRA (no changes here), or they can > instead submit a PR to contributors.yaml file (just specifying the email > and JIRA handle should be sufficient) which will result in the same thing. > > We will update contribution guidelines to make this explicit. > > Artem. > >

Disabling the --registry_strict flag in 1.0

2016-07-12 Thread Neil Conway
Hi folks, I'd like to propose that we disable the --registry_strict flag for 1.0. You can find the rationale for this change here: https://issues.apache.org/jira/browse/MESOS-5833 Please let me know if you have any thoughts on whether we should make this change. Thanks, Neil

Re: getting added to contributors

2016-07-12 Thread Neil Conway
Do we really want everyone who wants to be assigned a JIRA to also add themselves to the YAML file? To me, this adds another step to a contribution process that probably has too many steps already. Neil On Mon, Jul 11, 2016 at 7:31 PM, Vinod Kone wrote: > Welcome to the

RFC: partitioned tasks and the strict registry

2016-07-11 Thread Neil Conway
Folks, We're working on some Mesos features that will allow frameworks to control how partitioned tasks are handled [1]. As part of designing how this will work, I'd love to hear from users and framework developers about how they handle partitioned tasks/agents. Specifically: (a) Have you enabled

Re: [3/4] mesos git commit: Added filtering for orphaned tasks in /state endpoint.

2016-07-06 Thread Neil Conway
On Wed, Jul 6, 2016 at 12:06 AM, wrote: > diff --git a/src/master/http.cpp b/src/master/http.cpp > index 6b4f85b..debedd4 100644 > --- a/src/master/http.cpp > +++ b/src/master/http.cpp > @@ -2498,11 +2498,8 @@ Future Master::Http::state( > }); > > //

Re: Overloading and function names

2016-07-04 Thread Neil Conway
On Sun, Jul 3, 2016 at 9:10 PM, Benjamin Mahler wrote: > To clarify, are you ok with the removeSlave example? It seems to fit your > criteria. I think `removeSlave` is poorly named, for similar reasons -- I just talked about `update` in my email for brevity. > Usually with

Overloading and function names

2016-07-01 Thread Neil Conway
Consider the following function signatures from master.cpp: Nothing Master::removeSlave(const Registry::Slave& slave); void Master::removeSlave(Slave* slave, const string& message, Option reason); or these from sorter/drf/sorter.hpp: void update(const SlaveID& slaveId, const Resources&

Re: source code compile failure mesos-0.28.0

2016-06-21 Thread Neil Conway
Can you post the content of "config.log"? Thanks, Neil On Tue, Jun 21, 2016 at 3:17 PM, Ali Aktar wrote: > Hi; > > All dependencies as per doc were installed. I’m using Centos 7: > Linux ip-172-31-46-249.eu-west-1.compute.internal 3.10.0-327.10.1.el7.x86_64 > #1 SMP

Improving support for partitioned tasks

2016-06-20 Thread Neil Conway
Currently, Mesos implements a hardcoded policy for handling partitioned agents and tasks: * agents are deemed to be partitioned when they fail health checks (~75 seconds by default) * partitioned agents are removed from the cluster. Frameworks receive TASK_LOST for all tasks running on the

Re: Master configuration in the registry

2016-06-10 Thread Neil Conway
Makes sense: arguably you could say that "quota" and "weights" are part of the master's (mutable) "state", not its "configuration", which is largely immutable. Another distinction is that some configuration flags control behavior that doesn't need to be consistent between master replicas (e.g.,

Blog posts for 0.28.1, 0.28.2 releases?

2016-06-10 Thread Neil Conway
Folks, It seems like https://mesos.apache.org/blog/ doesn't have blog posts for the Mesos 0.28.1 or 0.28.2 releases. We generally try to have a blog post for each release, right? Neil

Re: WebUI authentication in 1.0.0-rc1

2016-06-08 Thread Neil Conway
On Wed, Jun 8, 2016 at 4:27 PM, Alexander Rojas wrote: > I think we should also think more thoroughly about the expected behaviour > when we introduce new authorizable actions (and we most certainly will). > Since things may break particularly if users set the

Re: Does anyone know the MESOS-4675 is back-ported to 0.25?

2016-06-08 Thread Neil Conway
Done -- https://issues.apache.org/jira/browse/MESOS-5569 On Wed, Jun 8, 2016 at 2:43 PM, Vinod Kone <vinodk...@gmail.com> wrote: > +1. Can you file a JIRA? > > @vinodkone > >> On Jun 8, 2016, at 7:58 AM, Neil Conway <neil.con...@gmail.com> wrote: >> >>

Re: Does anyone know the MESOS-4675 is back-ported to 0.25?

2016-06-08 Thread Neil Conway
It would be great to make this information more prominent on the website, especially once 1.0.0 is released. For example, we could list the supported releases on https://mesos.apache.org/downloads/, along with a link to the versioning document. Neil On Tue, Jun 7, 2016 at 6:58 PM, Vinod Kone

Re: [1/2] mesos git commit: Added aufs provisioning backend.

2016-06-08 Thread Neil Conway
Can you update the documentation for this change, please? Thanks, Neil On Tue, Jun 7, 2016 at 6:14 PM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master 90871a48f -> e5358ed1c > > > Added aufs provisioning backend. > > Review:

Re: mesos git commit: Added documentation for access_sandboxes and access_mesos_logs acls.

2016-06-06 Thread Neil Conway
FYI, this commit should have included the changes produced by re-running the `generate-endpoint.py` script. Neil On Wed, Jun 1, 2016 at 8:26 AM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master 5263a6211 -> 53b5164bb > > > Added documentation for

Re: mesos git commit: Removed deprecated annotation for values in a protobuf enum.

2016-05-19 Thread Neil Conway
Do we need to be source-compatible with protobuf 2.5? If so, why? Neil On Wed, May 18, 2016 at 11:15 PM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master b7e50fe8b -> 4248b3c3a > > > Removed deprecated annotation for values in a protobuf enum. > > Support

Re: mesos git commit: Updated quota endpoint help.

2016-05-18 Thread Neil Conway
When modifying the endpoint help text, we should remember to update the generated help files (via support/generate-endpoint-help.py) -- the changes to both the input text and generated output files should be included as part of the same commit. Neil On Wed, May 18, 2016 at 10:58 AM,

Re: mesos website workgroup

2016-05-17 Thread Neil Conway
Count me in. Thanks, Neil On Tue, May 17, 2016 at 7:54 AM, Tomek Janiszewski wrote: > Count me in. > > Tomek > > wt., 17.05.2016, 07:49 użytkownik Abhishek Dasgupta < > a10gu...@linux.vnet.ibm.com> napisał: > >> I would be very much interested. I have some front-end

Re: Design doc for TASK_GONE

2016-05-12 Thread Neil Conway
/document/d/1D2mJnwuC1qlT_SJGspfj4MdAQXflESCqKANY0Pj4644 Neil On Mon, May 9, 2016 at 2:37 PM, Neil Conway <neil.con...@gmail.com> wrote: > Hi folks, > > To address some shortcomings and ambiguities in the TASK_LOST task > state, I'd like to propose that we introduce a new task

Re: mesos git commit: Fixed a head-of-line blocking bug in libevent SSL socket.

2016-05-12 Thread Neil Conway
Would it be possible to write a unit test that reproduces the original problem? It should be pretty easy to repro, right? Neil On Thu, May 12, 2016 at 1:50 AM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master 95e670cd4 -> 28c085fca > > > Fixed a

Re: mesos git commit: Replaced CHECK with CHECK_READY.

2016-05-10 Thread Neil Conway
>> Also removes some unused header includes. >>> >>> Review: https://reviews.apache.org/r/46827/ >>> >>> >>> Project: http://git-wip-us.apache.org/repos/asf/mesos/repo >>> Commit: http://git-wip-u

Design doc for TASK_GONE

2016-05-09 Thread Neil Conway
Hi folks, To address some shortcomings and ambiguities in the TASK_LOST task state, I'd like to propose that we introduce a new task state, TASK_GONE. For more information, see the design doc: https://issues.apache.org/jira/browse/MESOS-5345 Comments welcome! Neil

Re: Design doc: ordered delivery in libprocess

2016-04-11 Thread Neil Conway
ections from a given libprocess instance is the "newest". Neil On Fri, Mar 25, 2016 at 12:50 PM, Neil Conway <neil.con...@gmail.com> wrote: > A few months ago, there was a dev list thread on whether libprocess > should provide ordered delivery [1]. The consensus then was that

Re: [1/6] mesos git commit: Fixed a memory leak in the scheduler driver.

2016-03-30 Thread Neil Conway
On Wed, Mar 30, 2016 at 4:57 PM, Benjamin Mahler wrote: > Yikes! (3) being not true to me means that I needed non-local reasoning to > determine the optionality. Sorry: to clarify, I didn't mean "there is not always a latch" in the code in question. I meant: "writing 'delete

Re: [1/6] mesos git commit: Fixed a memory leak in the scheduler driver.

2016-03-30 Thread Neil Conway
>> Refer to this doc for the detail of deleting null: >> http://www.cplusplus.com/reference/new/operator%20delete/ < >> http://www.cplusplus.com/reference/new/operator%20delete/> >> >> Thanks >> Klaus >> >> > On Mar 30, 2016, at 07:24

Re: [1/6] mesos git commit: Fixed a memory leak in the scheduler driver.

2016-03-29 Thread Neil Conway
On Tue, Mar 29, 2016 at 7:19 PM, wrote: > --- a/src/sched/sched.cpp > +++ b/src/sched/sched.cpp > @@ -1808,6 +1808,10 @@ MesosSchedulerDriver::~MesosSchedulerDriver() > delete process; >} > > + if (credential != NULL) { > +delete credential; > + } `delete`
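
For context on the null check in the quoted diff: in standard C++, applying `delete` to a null pointer has no effect, so a guard of that form is redundant. The following is a minimal sketch of that language rule only; `Credential`, `Driver`, and `destroyDrivers` are hypothetical stand-ins and not the real classes from sched.cpp.

```cpp
// Hypothetical stand-in for an owned object; counts destructor calls
// so the effect of each `delete` can be observed.
struct Credential {
  static int destroyed;
  ~Credential() { ++destroyed; }
};

int Credential::destroyed = 0;

// Hypothetical owner mirroring the destructor pattern in the diff,
// but without the `!= NULL` guard.
struct Driver {
  Credential* credential = nullptr;

  ~Driver() {
    delete credential;  // Safe even when `credential` is null.
  }
};

// Destroys one driver without a credential and one with a credential,
// and returns how many Credential destructors ran.
int destroyDrivers() {
  Credential::destroyed = 0;
  { Driver empty; }  // `delete nullptr` is a no-op.
  {
    Driver owner;
    owner.credential = new Credential();
  }
  return Credential::destroyed;
}
```

Here `destroyDrivers()` runs exactly one `Credential` destructor: the null case neither crashes nor destroys anything.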

Design doc: ordered delivery in libprocess

2016-03-25 Thread Neil Conway
A few months ago, there was a dev list thread on whether libprocess should provide ordered delivery [1]. The consensus then was that libprocess doesn't provide ordered delivery in a few corner cases, but that we should fix that behavior to guarantee ordered (but unreliable) message delivery.

Re: Looking for Shepherd for MESOS-5002

2016-03-22 Thread Neil Conway
Sure, I'd be happy to review the change. Neil On Tue, Mar 22, 2016 at 9:01 AM, Jie Yu wrote: > + Neil > > Neil is driving the documentation improvement in Mesos. Neil, do you have > time for that? I can help commit the patch if you give a shipit. > > - Jie > > On Tue, Mar

Re: mesos git commit: Add 'name' field into NetworkInfo.

2016-03-10 Thread Neil Conway
Should we also update docs/networking-for-mesos-managed-containers.md? It contains a version of the NetworkInfo message definition. Neil On Thu, Mar 10, 2016 at 11:05 AM, wrote: > Repository: mesos > Updated Branches: > refs/heads/master 57a574fc9 -> 2a436e02f > > > Add

Re: Need CHANGELOG updates

2016-03-03 Thread Neil Conway
I sent https://reviews.apache.org/r/44348/ for the floating point math changes; if you'd prefer a different format or more/less details, just let me know. Thanks, Neil On Thu, Mar 3, 2016 at 10:57 AM, Vinod Kone wrote: > Hi guys, > > The 0.28.0 release is currently blocked

Re: Making 'curl' a prerequisite for installing Mesos

2016-03-03 Thread Neil Conway
No objection to the additional dependency, but using 'curl' instead of 'libcurl' seems unfortunate. Can you share some more detailed information about the problems that have been encountered using libcurl? e.g., was using the curl_multi_xxx() APIs explored? Neil On Thu, Mar 3, 2016 at 9:10

Re: Discussion about upgrading 3rdparty libraries

2016-03-01 Thread Neil Conway
The prospect of downloading dependencies from "rando" locations is concerning to me :) Mesos can easily come to depend on implementation details of a dependency that might change in a minor release. For example, a recent change [1] depends on the connection retry logic in the Zk client library in

Re: [VOTE] Release Apache Mesos 0.27.2 (rc1)

2016-02-29 Thread Neil Conway
As described (briefly) in the release emails, 0.27.2, 0.26.1, 0.25.1, and 0.24.2 contain a new feature: "reliable floating point for scalar resources" (MESOS-4687). To elaborate on that slightly, Mesos now only supports scalar resource values with three decimal digits of precision (e.g.,

Re: Enable compiler optimization by default?

2016-02-18 Thread Neil Conway
>> >> So our CI will also update to use optimisation flags, right? We need to >> highlight this in upgrade document to our user; I used to meet so strange >> behaviour after changing -O level. >> >> On Thu, Feb 18, 2016 at 8:51 AM James DeFelice <james.defel...@gmail

Re: Enable compiler optimization by default?

2016-02-17 Thread Neil Conway
On Wed, Feb 17, 2016 at 5:07 PM, Zameer Manji wrote: > Can't this problem also be solved by distributing packages that have > optimized binaries? The individuals/organizations that build packaged versions of Mesos should ensure that compiler optimizations are enabled -- but I

Enable compiler optimization by default?

2016-02-17 Thread Neil Conway
Hi folks, At present, Mesos defaults to compiling with "-O0"; to enable compiler optimizations, the user needs to specify "--enable-optimize". I'd like to propose we change the default, for a few reasons: (1) The autoconf default for CFLAGS/CXXFLAGS is "-O2 -g". Anecdotally, I think most

Re: [2/2] mesos git commit: Added documentation for labeled reserved resources.

2016-02-12 Thread Neil Conway
Hi Ben, On Fri, Feb 12, 2016 at 2:34 AM, Benjamin Mahler wrote: > Any plans to support labels for static reservations? > > Are we intentionally not supporting ReservationInfo for static > reservations? Or is this just outside of the initial scope? Labels for static

Re: Shepherd for MESOS-3486

2016-02-12 Thread Neil Conway
Hi Michael, Thanks for taking this on! Joris Van Remoortere will shepherd, but please also include me in the review request. Thanks, Neil On Thu, Feb 11, 2016 at 5:21 PM, Michael Browning wrote: > Hello, > > I've picked a small issue off of the newbie issues stack to

Precision of scalar resources

2016-02-12 Thread Neil Conway
tl;dr: If you use resource values with more than three decimal digits of precision (e.g., you are launching a task that uses 2.5001 CPUs), please speak up! Mesos uses floating point to represent scalar resource values, such as the number of CPUs in a resource offer or dynamic reservation.
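
To illustrate why limiting scalars to three decimal digits helps: binary floating point cannot represent most decimal fractions exactly, so sums such as `0.1 + 0.2` do not compare equal to `0.3`. One common remedy is to do the arithmetic in integer thousandths. The sketch below shows that general technique; it is not Mesos's implementation, and `toThousandths` and `roundToThreeDigits` are hypothetical helper names.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: converts a scalar resource value to an integer
// number of thousandths, rounding to the nearest unit. Digits beyond
// the third decimal place are discarded by the rounding.
std::int64_t toThousandths(double value) {
  return std::llround(value * 1000.0);
}

// Hypothetical helper: rounds a value to three decimal digits by going
// through the fixed-point representation.
double roundToThreeDigits(double value) {
  return static_cast<double>(toThousandths(value)) / 1000.0;
}
```

With these helpers, `toThousandths(0.1) + toThousandths(0.2)` equals `toThousandths(0.3)` exactly, and a request for 2.5001 CPUs is treated as 2.500 CPUs.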

Re: Questions about release process

2016-02-04 Thread Neil Conway
freebsd.hpp is missing from the 0.27 release tarball, presumably because 3rdparty/libprocess/3rdparty/stout/include/Makefile.am was not updated to account for it. I'll send an RR shortly. Neil On Thu, Feb 4, 2016 at 10:16 AM, haosdent wrote: > Hi, David. I could saw you

Re: Version numbers in docs

2016-02-02 Thread Neil Conway
I agree we should remove this text after some period of time has passed since that version of Mesos was released; it is quite distracting. The proper long-term fix is probably to have version-specific docs. So all the documentation at /documentation/v0.27/... would implicitly discuss the

Re: Follow up on the proposal for simulation tools for master and allocator

2016-01-21 Thread Neil Conway
Hi Zhitao, There's a JIRA here: https://issues.apache.org/jira/browse/MESOS-3855 A few people who are interested in simulation of Mesos have been meeting periodically, although due to the holidays we haven't had a meeting in a little bit. I'll make sure you're included in the next meeting when

Re: mesos git commit: Added recommendations for programming with persistent volumes.

2016-01-19 Thread Neil Conway
os/asf/mesos/commit/e2963966 >> Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/e2963966 >> Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/e2963966 >> >> Branch: refs/heads/master >> Commit: e2963966acc5c2263849ef183c9ee57251102d0e >> Paren

Re: Links in documentation

2016-01-14 Thread Neil Conway
On Thu, Jan 14, 2016 at 11:39 AM, Joris Van Remoortere wrote: >> *In fact it seems that all links ending with .md are interpreted as >> relative links on the webpage, i.e. [label](https://test.com/foo.md) is >> rendered into https://test.com/foo/ >> ">label. > > I think this

Re: [MESOS-1865] Redirect to the leader master when current master is not a leader.

2016-01-08 Thread Neil Conway
On Fri, Jan 8, 2016 at 12:29 PM, Benjamin Mahler wrote: > (2) It is difficult to reliably obtain cluster state through the existing > endpoints. This one is less clear to me than the first problem. Here we > have to think through how we want users to be hitting state

Re: [MESOS-1865] Redirect to the leader master when current master is not a leader.

2016-01-06 Thread Neil Conway
+1 -- I think we should make this change. The current behavior is quite dangerous. Neil On Wed, Jan 6, 2016 at 12:52 PM, Diogo Gomes wrote: > Hi, Adam and Haosdent > > > Resurrecting this issue, https://issues.apache.org/jira/browse/MESOS-1865, I > would like to make a +1

Re: No master is currently leading ...

2016-01-06 Thread Neil Conway
Hi, Can you post the full logs from all of the master instances? BTW, the @dev list is mostly intended for discussion around the development of Mesos. The @user list is a better venue for user support/configuration questions. Thanks, Neil On Wed, Jan 6, 2016 at 12:58 PM, DiGiorgio, Mr.

Re: Mesos build & testing environment instructions

2015-12-17 Thread Neil Conway
+1 to the general idea of including this information in the documentation. I'd probably lean towards including this information in the current "Getting Started" page, but in a separate section ("Running The Test Suite"?). Neil On Thu, Dec 17, 2015 at 12:38 PM, Greg Mann

Re: Speed up Mesos tests

2015-12-16 Thread Neil Conway
+1 on the speed-up-the-tests project! On Wed, Dec 16, 2015 at 10:29 AM, Greg Mann wrote: > I'd like to bring up something that both Neil and Joseph mentioned to me > recently, which could be of use when working on these slow test tickets. > Since we have the `process::Clock`
