Design doc: Agent draining and deprecation of maintenance primitives

2019-05-29 Thread Joseph Wu
Hi all,

A few years back, we added some constructs called maintenance primitives to
Mesos.  This feature was meant to allow operators and frameworks to
cooperate in draining tasks off nodes scheduled for maintenance.  As far as
we've observed since, this feature never achieved enough adoption to be
useful for operators.

As such, we are proposing a more opinionated approach for draining tasks.
The goal is to have Mesos perform draining in lieu of frameworks,
minimizing or eliminating the need to change frameworks to account for
draining.  We will also be simplifying the operator workflow, which would
require only a single call (holding an AgentID) to start draining, and a
single call to bring an agent back into the cluster.
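
To make the proposed two-call workflow concrete, here is a minimal sketch of what the request bodies might look like.  The call type names (DRAIN_AGENT, REACTIVATE_AGENT) and the JSON layout are assumptions for illustration only; they are not taken from the design document, which is the authority on the actual API.

```python
import json


def make_agent_call(call_type: str, agent_id: str) -> str:
    """Build a JSON body for a hypothetical operator call carrying one AgentID."""
    # Assumption: the payload field is named after the call type, lowercased.
    field = call_type.lower()
    body = {
        "type": call_type,
        field: {"agent_id": {"value": agent_id}},
    }
    return json.dumps(body, sort_keys=True)


# One call to start draining, one call to bring the agent back.
drain = make_agent_call("DRAIN_AGENT", "agent-123")
reactivate = make_agent_call("REACTIVATE_AGENT", "agent-123")
print(drain)
print(reactivate)
```

The point of the sketch is only that each step needs nothing more than the AgentID; everything else about draining would be handled by Mesos rather than by frameworks.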

Due to how closely this proposed feature overlaps with maintenance
primitives, we will be deprecating maintenance primitives upon
implementation of agent draining.

If interested, please take a look at the design document:
https://docs.google.com/document/d/1w3O80NFE6m52XNMv7EdXSO-1NebEs8opA8VZPG1tW0Y/


Re: [VOTE] Release Apache Mesos 1.8.0 (rc2)

2019-04-23 Thread Joseph Wu
-1 (binding)

We found a serious bug when upgrading from 1.7.x to 1.8.x, which prevents
agents from reregistering after upgrading the masters:
https://issues.apache.org/jira/browse/MESOS-9740

On Tue, Apr 23, 2019 at 8:27 AM Andrei Budnik  wrote:

> +1
>
> sudo make -j16 distcheck
> DISTCHECK_CONFIGURE_FLAGS='--disable-libtool-wrappers
> --disable-parallel-test-execution --enable-seccomp-isolator
> --enable-launcher-sealing'
> on Fedora 25
>
> I gave +1, but some of the recently added tests are failing:
> [  FAILED  ] VolumeGidManagerTest.ROOT_UNPRIVILEGED_USER_SlaveReboot
> [  FAILED  ] CniIsolatorTest.VETH_VerifyResourceStatistics
> [  FAILED  ] DockerVolumeIsolatorTest.ROOT_EmptyCheckpointFileSlaveRecovery
>
>
> On Thu, Apr 18, 2019 at 3:00 PM Benno Evers  wrote:
>
> > Hi all,
> >
> > Please vote on releasing the following candidate as Apache Mesos 1.8.0.
> >
> >
> > 1.8.0 includes the following:
> >
> >  * Greatly reduced allocator cycle time.
> >  * Operation feedback for v1 schedulers.
> >  * Per-framework minimum allocatable resources.
> >  * New CLI subcommands `task attach` and `task exec`.
> >  * New `linux/seccomp` isolator.
> >  * Support for Docker v2 Schema2 manifest format.
> >  * XFS quota for persistent volumes.
> >  * **Experimental** Support for the new CSI v1 API.
> >
> > In addition, 1.8.0-rc2 includes the following changes:
> >
> >  * Docker manifest v2s2 config with image GC.
> >  * Expanded `highlights` section in the CHANGELOG.
> >
> >
> > The CHANGELOG for the release is available at:
> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.8.0-rc2
> >
> > The candidate for Mesos 1.8.0 release is available at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.8.0-rc2/mesos-1.8.0.tar.gz
> >
> > The tag to be voted on is 1.8.0-rc2:
> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.8.0-rc2
> >
> > The SHA512 checksum of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.8.0-rc2/mesos-1.8.0.tar.gz.sha512
> >
> > The signature of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.8.0-rc2/mesos-1.8.0.tar.gz.asc
> >
> > The PGP key used to sign the release is here:
> > https://dist.apache.org/repos/dist/release/mesos/KEYS
> >
> > The JAR is in a staging repository here:
> > https://repository.apache.org/content/repositories/orgapachemesos-1252
> >
> > Please vote on releasing this package as Apache Mesos 1.8.0!
> >
> > The vote is open until Wednesday, April 24th and passes if a majority of
> > at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Mesos 1.8.0
> > [ ] -1 Do not release this package because ...
> >
> > Thanks,
> > Benno and Joseph
> >
>


Re: full Zookeeper authentication

2019-01-08 Thread Joseph Wu
The Apache mailing lists typically do not allow attachments (for various
reasons).  Go ahead and open up a GitHub PR, and we can continue from there.

On Tue, Jan 8, 2019 at 9:41 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C] <
dmitrii.kishchu...@nih.gov> wrote:

> Here is the patch. It is quite small. Should I make a GitHub pull
> request, or is it already an "advanced contribution"?
>
> --
>
> Dmitrii Kishchukov.
> Leading software developer
> Submission Portal Team
>
>
> On 1/7/19, 5:10 PM, "Joseph Wu"  wrote:
>
> I would be happy to shepherd (now that I'm back from winter holidays).
>
> On Mon, Dec 24, 2018 at 6:15 AM Alex Rukletsov 
> wrote:
>
> > Made you a contributor and assigned the issue to you. Thanks!
> >
> > Joseph, will you shepherd this?
> >
> > On Fri, Dec 21, 2018 at 4:32 PM Kishchukov, Dmitrii (NIH/NLM/NCBI)
> [C] <
> > dmitrii.kishchu...@nih.gov> wrote:
> >
> >> I created a JIRA account. Username is dkishchukov
> >>
> >> --
> >>
> >> Dmitrii Kishchukov.
> >> Leading software developer
> >> Submission Portal Team
> >>
> >>
> >> On 12/21/18, 4:02 AM, "Alex Rukletsov"  wrote:
> >>
> >> Dmitrii—
> >>
> >> here we go: MESOS-9499 [1]. I've noticed you don't have an
> Apache JIRA
> >> account, I'd suggest you create one so that you can assign the
> ticket
> >> to
> >> you and hence get credit properly. Hope it is not your last
> >> contribution to
> >> Apache projects : ).
> >>
> >> [1] https://issues.apache.org/jira/browse/MESOS-9499
> >>
> >>
>
>
>


Re: full Zookeeper authentication

2019-01-07 Thread Joseph Wu
I would be happy to shepherd (now that I'm back from winter holidays).

On Mon, Dec 24, 2018 at 6:15 AM Alex Rukletsov  wrote:

> Made you a contributor and assigned the issue to you. Thanks!
>
> Joseph, will you shepherd this?
>
> On Fri, Dec 21, 2018 at 4:32 PM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C] <
> dmitrii.kishchu...@nih.gov> wrote:
>
>> I created a JIRA account. Username is dkishchukov
>>
>> --
>>
>> Dmitrii Kishchukov.
>> Leading software developer
>> Submission Portal Team
>>
>>
>> On 12/21/18, 4:02 AM, "Alex Rukletsov"  wrote:
>>
>> Dmitrii—
>>
>> here we go: MESOS-9499 [1]. I've noticed you don't have an Apache JIRA
>> account, I'd suggest you create one so that you can assign the ticket
>> to
>> you and hence get credit properly. Hope it is not your last
>> contribution to
>> Apache projects : ).
>>
>> [1] https://issues.apache.org/jira/browse/MESOS-9499
>>
>>


Re: full Zookeeper authentication

2018-12-10 Thread Joseph Wu
There are two options for contributing:
1) You can make a pull request against the GitHub mirror:
https://github.com/apache/mesos .  We generally only use PRs for minor
changes, like typos, documentation, or uploading binaries.  See
http://mesos.apache.org/documentation/latest/beginner-contribution/
2) For larger changes, or more involved/impactful changes, we prefer
https://reviews.apache.org/ instead.  See
http://mesos.apache.org/documentation/latest/advanced-contribution/

I suspect this ZK Auth feature will be a fairly significant change, so I
recommend option (2).

On Mon, Dec 10, 2018 at 11:47 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C] <
dmitrii.kishchu...@nih.gov> wrote:

> I have a working version. How should I make the patch? A branch in the git
> repository? Do I need to get permissions?
>
> --
>
> Dmitrii Kishchukov.
> Leading software developer
> Submission Portal Team
>
>
> On 12/6/18, 12:56 PM, "Vinod Kone"  wrote:
>
> Dmitrii.
>
> That approach sounds reasonable. Would you like to work on this? Are
> you
> looking for a reviewer/shepherd?
>
> On Thu, Dec 6, 2018 at 11:28 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C]
> <
> dmitrii.kishchu...@nih.gov> wrote:
>
> > Mesos allows using only the digest authentication scheme for Zookeeper,
> > which is bad because Zookeeper has quite a flexible security model.
> > It is easy to make your own authenticator with its own scheme name.
> >
> > To fully support Zookeeper authentication, Mesos has to pass two items
> > into Zookeeper: scheme and credentials.
> > Credentials can have a different format depending on the authentication
> > scheme.  For the digest scheme it is ‘login:password’.
> >
> > All Mesos should do is pass the scheme and credentials to Zookeeper.
> >
> > Another improvement might be to configure credentials via a file instead
> > of the URI.
> >
> > For example, there can be two command line options:
> > --zk_auth_scheme and --zk_auth_credentials
> >
> > They can be used like this:
> > --zk_auth_scheme=some_custome_scheme --zk_auth_credentials=filename
> >
> > --zk_auth_credentials can just take the entire contents of the file as
> > the credentials string.
> >
> > The Authentication class in Mesos already contains all that we need. The
> > problem is what Mesos passes to the constructor.
> >
> >
> > --
> >
> > Dmitrii Kishchukov.
> >
> >
>
>
>


Re: full Zookeeper authentication

2018-12-07 Thread Joseph Wu
There are currently three components of Mesos that use Zookeeper:

*Master Detector:*
This object is used by the Mesos Master, Agent, and Scheduler to find which
Master is the leader.
The existing detector code will parse a "zk://" URL if given here:
https://github.com/apache/mesos/blob/1.7.x/src/master/detector/detector.cpp#L62

Not including tests, there are four call sites which pass in a ZK URL to
the detector:

   - Master:
   https://github.com/apache/mesos/blob/1.7.x/src/master/main.cpp#L430-L433
   - Agent:
   https://github.com/apache/mesos/blob/1.7.x/src/slave/main.cpp#L487-L490
   - Scheduler:
   https://github.com/apache/mesos/blob/1.7.x/src/sched/sched.cpp#L152
   - (Deprecated) CLI helper binary:
   https://github.com/apache/mesos/blob/1.7.x/src/cli/resolve.cpp#L95-L96

*Master Contender:*
This object is used by the Mesos Master to contend for leadership of the
cluster.
The contender will parse a ZK URL just like the detector:
https://github.com/apache/mesos/blob/1.7.x/src/master/contender/contender.cpp#L53
Unlike the detector, there is only a single call site for the contender:
https://github.com/apache/mesos/blob/1.7.x/src/master/main.cpp#L418-L421

*Replicated Log Library:*
This is a library which is used by the Mesos Master and some custom
frameworks, to persist data via the Paxos algorithm.
The Master's call site is straightforward:
https://github.com/apache/mesos/blob/1.7.x/src/master/main.cpp#L383-L391

The library is built into a JAR for use by Java frameworks, so there are
two references in this JNI code:
https://github.com/apache/mesos/blob/1.7.x/src/java/jni/org_apache_mesos_Log.cpp#L673
https://github.com/apache/mesos/blob/1.7.x/src/java/jni/org_apache_mesos_state_LogState.cpp#L75


Some other files that you will likely need to modify include:

   - The zookeeper::Authentication class:
   https://github.com/apache/mesos/blob/1.7.x/include/mesos/zookeeper/authentication.hpp
   This will need to be extended to allow non-digest schemes.  It will
   currently exit if a non-digest scheme is passed in the URL.
   - The zookeeper::URL class:
   https://github.com/apache/mesos/blob/1.7.x/include/mesos/zookeeper/url.hpp
   Depending on how flexible the authentication schemes are, you may need
   to update the URL parsing logic, or scrap the URL altogether if there are
   authentication schemes that cannot be encoded in a URL.
   - The "--zk" flag for the Master:
   https://github.com/apache/mesos/blob/1.7.x/src/master/flags.cpp#L666-L673
   You may need to update the documentation of this flag, or perhaps add
   new flags.
   - The "--master" flag for the Agent:
   https://github.com/apache/mesos/blob/1.7.x/src/slave/flags.cpp#L1421-L1427
   This will look similar to the "--zk" Master flag, but it also supports
   non-ZK masters.


Hopefully this list of code locations will give you some idea of where to
start.  Feel free to ping us in Slack too.
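
To illustrate why the URL class and the Authentication class are coupled, here is a minimal sketch of splitting a "zk://" URL into credentials, servers, and path.  It assumes the common "zk://user:pass@host1:port1,host2:port2/path" form and hard-codes the digest scheme for any embedded credentials.  The type and function names are hypothetical; this is not the parsing code linked above.

```python
from typing import NamedTuple, Optional


class ZkAuth(NamedTuple):
    scheme: str        # e.g. "digest"
    credentials: str   # format depends on the scheme; "login:password" for digest


class ZkUrl(NamedTuple):
    auth: Optional[ZkAuth]
    servers: str
    path: str


def parse_zk_url(url: str) -> ZkUrl:
    """Split a zk:// URL into (auth, servers, path). Illustrative only."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with zk://")
    rest = url[len(prefix):]

    # Everything from the first '/' onward is the znode path (default "/").
    slash = rest.find("/")
    if slash == -1:
        authority, path = rest, "/"
    else:
        authority, path = rest[:slash], rest[slash:]

    # Credentials, if any, precede the last '@'. A URL has no slot for a
    # scheme name, so this sketch can only assume "digest" here.
    at = authority.rfind("@")
    if at == -1:
        auth, servers = None, authority
    else:
        auth, servers = ZkAuth("digest", authority[:at]), authority[at + 1:]

    if not servers:
        raise ValueError("no Zookeeper servers given")
    return ZkUrl(auth, servers, path)


print(parse_zk_url("zk://user:pass@host1:2181,host2:2181/mesos"))
```

Because the URL carries no scheme name, supporting non-digest schemes means either extending the URL syntax or passing the (scheme, credentials) pair some other way, such as separate flags.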

On Fri, Dec 7, 2018 at 6:01 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C] <
dmitrii.kishchu...@nih.gov> wrote:

> Yes, I want to do it. It would be good if someone could give some advice on
> how to do it. For example, is there one place where the Authentication object
> is constructed for Zookeeper?
> To me it looks like there are many places, which is strange.
>
> --
>
> Dmitrii Kishchukov.
> Leading software developer
> Submission Portal Team
>
>
> On 12/6/18, 12:56 PM, "Vinod Kone"  wrote:
>
> Dmitrii.
>
> That approach sounds reasonable. Would you like to work on this? Are
> you
> looking for a reviewer/shepherd?
>
> On Thu, Dec 6, 2018 at 11:28 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C]
> <
> dmitrii.kishchu...@nih.gov> wrote:
>
> > Mesos allows using only the digest authentication scheme for Zookeeper,
> > which is bad because Zookeeper has quite a flexible security model.
> > It is easy to make your own authenticator with its own scheme name.
> >
> > To fully support Zookeeper authentication, Mesos has to pass two items
> > into Zookeeper: scheme and credentials.
> > Credentials can have a different format depending on the authentication
> > scheme.  For the digest scheme it is ‘login:password’.
> >
> > All Mesos should do is pass the scheme and credentials to Zookeeper.
> >
> > Another improvement might be to configure credentials via a file instead
> > of the URI.
> >
> > For example, there can be two command line options:
> > --zk_auth_scheme and --zk_auth_credentials
> >
> > They can be used like this:
> > --zk_auth_scheme=some_custome_scheme --zk_auth_credentials=filename
> >
> > --zk_auth_credentials can just take the entire contents of the file as
> > the credentials string.
> >
> > The Authentication class in Mesos already contains all that we need. The
> > problem is what Mesos passes to the constructor.
> >
> >
> > --
> >
> > Dmitrii Kishchukov.
> >
> >
>
>
>


[API WG] Proposals for dealing with master subscriber leaks.

2018-11-09 Thread Joseph Wu
Hi all,

During some internal scale testing, we noticed that, when Mesos streaming
endpoints are accessed via certain proxies (or load balancers), the proxies
might not close connections after they are complete.  For the Mesos master,
which only has the /api/v1 SUBSCRIBE streaming endpoint, this can generate
unnecessary authorization requests and affect performance.

We are considering a few potential solutions:

   - We can add heartbeats to the SUBSCRIBE call.  This would need to be
   part of a separate operator Call, because one platform (browsers) that
   might subscribe to the master does not support two-way streaming.
   - We can add (optional) arguments to the SUBSCRIBE call, which tell the
   master to disconnect the client after a while.  The client would then have
   to remake the connection every so often.
   - We can change the master to hold subscribers in a circular buffer, and
   disconnect the oldest ones if there are too many connections.
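
As a rough illustration of the third option, here is a minimal sketch of a bounded subscriber registry that disconnects the oldest subscriber once a limit is exceeded.  The class name, the string stream IDs, and the eviction callback are all hypothetical; the prototypes linked from the JIRA are the place to look for the real design.

```python
from collections import OrderedDict
from typing import Callable, List


class SubscriberRegistry:
    """Keeps at most `max_subscribers` streams; evicts the oldest on overflow."""

    def __init__(self, max_subscribers: int, on_evict: Callable[[str], None]):
        if max_subscribers < 1:
            raise ValueError("max_subscribers must be at least 1")
        self._max = max_subscribers
        self._on_evict = on_evict
        # Insertion order doubles as age: the first key is the oldest stream.
        self._subscribers: "OrderedDict[str, bool]" = OrderedDict()

    def subscribe(self, stream_id: str) -> None:
        # A re-subscribing stream is treated as new, so it moves to the back.
        self._subscribers.pop(stream_id, None)
        self._subscribers[stream_id] = True
        while len(self._subscribers) > self._max:
            oldest, _ = self._subscribers.popitem(last=False)
            self._on_evict(oldest)  # e.g. close the HTTP connection

    def active(self) -> List[str]:
        return list(self._subscribers)


disconnected: List[str] = []
registry = SubscriberRegistry(max_subscribers=2, on_evict=disconnected.append)
for stream in ["a", "b", "c"]:
    registry.subscribe(stream)
print(registry.active(), disconnected)
```

The trade-off with this approach is that a healthy long-lived subscriber can be evicted just as easily as a leaked one, so clients would need to be prepared to reconnect.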

We're tracking progress on this issue here:
https://issues.apache.org/jira/browse/MESOS-9258
Some prototypes of the code changes involved are also linked in the JIRA.

Please chime in if you have any suggestions or if any of these options
would be undesirable/bad,
~Joseph


Re: 122 resources.cpp:1134] Check failed: !resource.has_role() cpus:8

2018-09-26 Thread Joseph Wu
I believe what you are running into is a slight change in how we represent
Resources.  Older frameworks expect unreserved resources to look like this:

> {
>   role: "*",
>   reservation: (unset),
>   reservations: (unset)
> }


In 1.4.0, we started representing unreserved resources like:

> {
>   role: (unset),
>   reservation: (unset),
>   reservations: []
> }


And our Resources.hpp utility files were updated to expect this new
format.  By compiling your framework against a newer libmesos, you were
using a utility expecting the new resource format, but receiving the older
format from the Master.  You'll need to add this line to your FrameworkInfo
to receive the newer format:

framework.add_capabilities()->set_type(FrameworkInfo::Capability::RESERVATION_REFINEMENT);


Here's the JIRA that tracked this change:
https://issues.apache.org/jira/browse/MESOS-7575
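
To make the difference between the two formats concrete, here is a minimal sketch that rewrites an unreserved resource from the older shape into the newer one.  Plain dicts stand in for the Resource protobuf, the function name is hypothetical, and reserved resources are deliberately out of scope; this is not the conversion code in libmesos.

```python
import copy


def to_post_refinement(resource: dict) -> dict:
    """Rewrite an unreserved, old-format resource dict into the new format."""
    upgraded = copy.deepcopy(resource)  # leave the caller's dict untouched
    role = upgraded.pop("role", "*")
    reservation = upgraded.pop("reservation", None)

    # This sketch only covers the unreserved case described above.
    if role != "*" or reservation is not None:
        raise ValueError("only unreserved resources are handled in this sketch")

    # New format: no `role`, no `reservation`, and an empty `reservations`.
    upgraded["reservations"] = []
    return upgraded


old = {"name": "cpus", "type": "SCALAR", "scalar": {"value": 8.0}, "role": "*"}
new = to_post_refinement(old)
print(new)
```

A utility that checks for the new format would reject `old` because `role` is still set, which matches the `!resource.has_role()` check failure in the stack trace.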

On Wed, Sep 26, 2018 at 7:56 AM James Vanns  wrote:

> Hi! It's been a looonng time since I've asked a question on this list
> (several years) so excuse me if this is now the wrong forum! Anyway,
> basically, I've got an old Mesos framework I'm resurrecting and it was
> developed against 0.26.x, I think. For the sheer Hell of it I just upgraded
> Mesos to 1.5.0 and by pure miracle (or rather, excellent API/ABI work on
> your part!) I only had to change about 1 or 2 lines of my C++ code to get a
> build :) However, when I run it and push a task to it, it now bails with
> this stack trace;
>
> F0926 14:35:34.225720   122 resources.cpp:1134] Check failed:
> !resource.has_role() cpus:8
> *** Check failure stack trace: ***
> @ 0x7fae14997a7d  google::LogMessage::Fail()
> @ 0x7fae14999830  google::LogMessage::SendToLog()
> @ 0x7fae14997663  google::LogMessage::Flush()
> @ 0x7fae1499a259  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7fae1393d8a3  mesos::Resources::isEmpty()
> @ 0x7fae1393d97c  mesos::Resources::add()
> @ 0x7fae1393fce0  mesos::Resources::operator+=()
> @ 0x7fae1393fd8d  mesos::Resources::operator+=()
> @ 0x7fae1394001b  mesos::Resources::Resources()
>
> Which I happen to find is the same path as this random link for
> mesos-executor;
>
>
> https://files.sameroom.io/q9gILEXTOUokGlBYScOrZMVvAFz1g72fK9Ys1oxk0Ho/mesos-execute_.txt
>
> So given that none of my code ever had any explicit roles set and all
> slaves (sorry, agents - it's been that long ;) and the master have only
> ever assumed the default role ('*'), what do I need to add/remove in my
> code to get it to work as it did before!? If it helps it appears that the
> code that generates this dump is;
>
> offer->resources();
>
> I hope this is enough info! If not, please ask and I'll paste more context
> etc.
>
> Cheers,
>
> Jim
>
> --
> Senior Production Engineer,
> Industrial Light & Magic
>


Re: Mesos replicated log dual writes

2018-06-05 Thread Joseph Wu
The two-byte difference is most likely coming from the "learned" field:
https://github.com/apache/mesos/blob/master/src/messages/log.proto#L49

The first time an entry is recorded, the entry is "unlearned", basically
meaning that the entry has not been written to a quorum of masters (yet).
Once a quorum of masters indicate to each other that they have written an
entry, they then flip the bit to "learned".

This process is mandatory for Paxos to work, but I can see how overwriting
an entry could be expensive.  There's no particular reason why we must
overwrite the entire entry, except that we currently use the position
number (i.e. 6506690 in your example) as the key in the leveldb
implementation.
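
Here is a minimal sketch of why keying by position leads to a second full-size write.  It assumes the first write leaves the "learned" field unset and the second sets it to true, and it uses a placeholder field number; both are assumptions made for illustration.  The in-memory dict stands in for leveldb.

```python
# Placeholder field number for "learned"; any value below 16 has a 1-byte tag.
LEARNED_FIELD = 4


def encoded_bool_field_size(field_number: int) -> int:
    """Bytes a protobuf bool field takes on the wire: varint tag + 1 value byte."""
    tag = (field_number << 3) | 0  # wire type 0 means varint
    tag_bytes = 1
    while tag >= 0x80:
        tag >>= 7
        tag_bytes += 1
    return tag_bytes + 1


class PositionKeyedStore:
    """Stand-in for a key-value store that uses the log position as the key."""

    def __init__(self) -> None:
        self.records = {}
        self.bytes_written = 0

    def persist(self, position: int, base_size: int, learned: bool) -> int:
        size = base_size + (encoded_bool_field_size(LEARNED_FIELD) if learned else 0)
        # Same key, so the whole record is rewritten rather than patched.
        self.records[position] = (size, learned)
        self.bytes_written += size
        return size


store = PositionKeyedStore()
first = store.persist(6506690, 524331, learned=False)
second = store.persist(6506690, 524331, learned=True)
print(first, second, store.bytes_written, len(store.records))
```

Under these assumptions the record grows by two bytes on the second write, while the total bytes written roughly doubles even though only one record exists afterwards.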

On Tue, Jun 5, 2018 at 12:30 AM, meghdoot bhattacharya <
meghdoo...@yahoo.com.invalid> wrote:

> A recent investigation of Aurora logs around snapshot creation in the
> replicated log came up with this:
> I0601 21:57:27.322444   144 log.cpp:577] Attempting to append 524304 bytes to the log
> I0601 21:57:27.322501   144 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 6506690
> I0601 21:57:27.323122   139 replica.cpp:539] Replica received write request for position 6506690 from __req_res__(3685)@x.x.x.x:
> I0601 21:57:27.443131   139 leveldb.cpp:341] Persisting action (524331 bytes) to leveldb took 119.847993ms
> I0601 21:57:27.443208   139 replica.cpp:710] Persisted action APPEND at position 6506690
> I0601 21:57:27.443437   139 replica.cpp:693] Replica received learned notice for position 6506690 from @0.0.0.0:0
> I0601 21:57:27.478754   139 leveldb.cpp:341] Persisting action (524333 bytes) to leveldb took 35.278735ms
> I0601 21:57:27.478818   139 replica.cpp:710] Persisted action APPEND at position 6506690
>
>
> It seems there are dual writes to leveldb for the same record (with 2 byte
> diff). Becomes expensive if that is the case for large records. Hoping to
> get some insights.
> Thx
>


Welcome Andrew Schwartzmeyer as a new committer and PMC member!

2017-11-27 Thread Joseph Wu
Hi devs & users,

I'm happy to announce that Andrew Schwartzmeyer has become a new committer
and member of the PMC for the Apache Mesos project.  Please join me in
congratulating him!

Andrew has been an active contributor to Mesos for about a year.  He has
been the primary contributor behind our efforts to change our default build
system to CMake and to port Mesos onto Windows.

Here is his committer candidate checklist for your perusal:
https://docs.google.com/document/d/1MfJRYbxxoX2-A-g8NEeryUdUi7FvIoNcdUbDbGguH1c/

Congrats Andy!
~Joseph


Re: DC/OS (Mesos) portability

2017-11-03 Thread Joseph Wu
It isn't clear to me how DC/OS would benefit from (ongoing) work to
create/push Mesos packages.  DC/OS downloads and builds all of its
component parts from source.

Also, we (Mesos devs) are hoping to get more frameworks to move away from
using libmesos (including the API shims), in favor of using the HTTP APIs
instead.  So we have a disincentive to provide a libmesos bundle.

On Fri, Nov 3, 2017 at 8:23 AM, Tomas Barton  wrote:

> Hi,
>
> I'd like to contribute to DC/OS with a Debian/Suse/... support.
> Surprisingly on Debian most of the compatibility issues could be solved by
> a sequence of symlinks.
>
> Why Mesos dev list? :)
>
> Currently the biggest issue is connected to distributing the libmesos-bundle
> tar archive, which contains the libmesos.so library and several others. The
> library is dynamically linked with certain libcurl, libssl, libsvn etc.
> that might differ between distributions.
>
> I can think of a few solutions:
>  1. Compile Mesos (master and agent) using a static build (which, as I
> understood, isn't currently fully supported/propagated).
>  2. Generate the bundle during automatic builds for certain supported
> distributions.
>  3. Include libmesos in standard distribution channels - rpm, deb packages
> (that might take some time).
>
> The last solution would be the best, but Mesos release cycle is very
> different from distributions release cycle. It might be complicated to
> synchronize.
>
> I couldn't find scripts for generating libmesos-bundle, but it's an archive
> with libraries from a build server, e.g.
> https://downloads.mesosphere.io/libmesos-bundle/libmesos-bundle-1.10-1.4-63e0814.tar.gz
> (32MB).
>
> So the question is whether the Mesos website could provide a prebuilt libmesos
> bundle for each release and platform, which could afterwards be used e.g. in
> DC/OS packages?
>
> Last issue might be connected to an executor that eventually might need OS
> family ENV variable with OS release version, so that it can fetch
> corresponding libbundle archive. Such information is typically parsed from
> `uname -a` or `lsb_release -sri` (if available). This way DC/OS could be
> running on a cluster with diverse OS versions/distributions.
>
> Thanks for your time! I'd like to hear your opinion.
>
> Regards,
> Tomas Barton
>


[Design Doc] Standalone Container API

2017-08-07 Thread Joseph Wu
As part of work to improve storage support in Mesos [1], we will be adding
the ability to launch containers via the Mesos Containerizer, without going
through the traditional method (i.e. framework -> offer cycle -> launch
executor/task -> status updates -> etc).  Below I've linked a short design
document for interacting with these "standalone" containers:

https://docs.google.com/document/d/1DZVfZAOLtqd8kbiWHD4j29LzaYcNCh1k6QQnbggyTio/

Please feel free to comment on the doc (or on this thread) if you have any
comments or suggestions! Thanks!

[1]
https://lists.apache.org/thread.html/02871cb51ce6d0bec24770bcaaba07b52dcda0cdb87cbdd0871b82d1@%3Cdev.mesos.apache.org%3E


Re: Custom isolators - External container

2017-08-07 Thread Joseph Wu
First off, the external containerizer was officially removed in Mesos 1.1.0
(it had been deprecated long before that release):
https://issues.apache.org/jira/browse/MESOS-3370

---

If you want to develop/deploy a new isolation method for Mesos, you should
first consider writing isolator modules (Mesos modules):
https://github.com/apache/mesos/blob/master/include/mesos/slave/isolator.hpp

Isolator modules are only applicable for the Mesos containerizer, so if you
plan to run docker workloads, you can consider using built-in isolators
("docker/runtime") that support running docker images in the Mesos
containerizer.

If you plan to use the Docker containerizer, your only choice is to develop
a custom executor to isolate tasks only within the same executor (docker
will take over isolating executors from each other).

---

There are few benefits from running the Mesos agent inside a Docker
container and many pitfalls, so this practice is highly discouraged.
Instead, we recommend running the Mesos agent directly via a supervisor
(upstart, systemd, etc.).  The agent itself is not containerized when run
normally.

On Sun, Aug 6, 2017 at 4:32 PM, Thodoris Zois  wrote:

> Hello,
>
> Is support for the external containerizer removed from Mesos? Also, I have
> developed some isolators that I would like to use with Mesos. I found 3
> ways to do that, but I don't know which is the proper way and what the
> advantages and disadvantages are in each case.
>
> The 1st one is as a Mesos module
>
> The 2nd one is a custom executor
>
> The 3rd one is the container image on agent.
>
> What I am trying to do is to isolate docker tasks (images - one task per
> docker container) that run under the same agent with my own isolators.
>
> What are the benefits of running the agent in a big docker container with
> small docker containers inside it as tasks?  If you don't run the agent under
> a big docker container, is it by default running under a Mesos container
> while the small docker containers with tasks run inside? (Assume
> that we don't run tasks under the Mesos container.)
>
>
> Thank you and sorry for the so many questions!
> Thodoris
>


Re: The state of cmake

2017-06-21 Thread Joseph Wu
Here's the earlier email which has the feature comparison:

https://lists.apache.org/thread.html/527a29b45c52a042c122c96754804983b1447b7409ffec3d635b7143@%3Cdev.mesos.apache.org%3E

The list is still accurate, except that precompiled headers are no longer
"upcoming".

On Wed, Jun 21, 2017 at 4:42 PM, Jeff Coffler <
jeff.coff...@microsoft.com.invalid> wrote:

> Hi Aaron,
>
> I'd like to expand on what Andy said:
>
> If you want cross-platform development, then cmake is the only way to go.
> For example, if you want to build on Windows, you MUST use cmake. We
> anticipate, over time, that cmake will replace the autotools build (we do
> not want to maintain two build systems). The cmake system is also much more
> expandable (for example, while this hasn't been done on Linux, Windows had
> dramatic speed improvements through the use of precompiled headers - if
> someone was inclined to spend the time on Linux, I imagine similar speed
> improvements are possible). Note, by the way, that ReviewBot runs on
> Windows; if you break the Windows build, you need to fix it prior to
> committing changes.
>
> I would say: If you don't care about Java or Python bindings, and you're
> doing development (i.e. you don't need an installable package), then cmake
> is a fine way to go. But if you need something that only autotools does
> today, then you don't really have a choice. Regardless, when you commit a
> change, you need to be sure that both build systems work properly.
>
> Note that cmake is compatible with ccache. Also, FWIW, cmake also gives
> you very nice "percentage done" notifications on Linux (i.e. 85% done, or
> whatever), which is super nice to know how far along you are. That's a very
> cool feature that I just love.
>
> I agree that we sorely need a concise list of features that are missing.
> We need to understand what's missing, and judge how often missing features
> are used, in order to "fully bake" the cmake build system in Mesos.
>
> /Jeff
>
> -----Original Message-----
> From: Andy Schwartzmeyer [mailto:andsc...@microsoft.com.INVALID]
> Sent: Wednesday, June 21, 2017 4:12 PM
> To: dev@mesos.apache.org
> Subject: RE: The state of cmake
>
> Hi Aaron,
>
> The biggest difference right now is that the Java and Python bindings are
> not built whatsoever with the CMake build system. We also do not have an
> install target, so the CMake output is kind of stuck in "developer mode"
> and it won't generate an installable package.
>
> I probably would not yet recommend the CMake build system for production
> use.
>
> As far as what features are missing, I'm not aware of a concise list, but
> agree this is needed. Perhaps Joseph knows of one. If one does not exist at
> all, perhaps it's time we audit the issues and do a comparison of the two
> build systems as they stand now to generate this list.
>
> Cheers,
>
> Andy
>
> From: Wood, Aaron
> Sent: Wednesday, June 21, 2017 4:00 PM
> To: dev
> Subject: The state of cmake
>
> Hi all,
>
> I'm curious as to what the current state of cmake is on Linux. I noticed
> that some features that are present in the autotools build are not yet in
> cmake. Also, the output from a successful cmake build looks a bit different
> as far as the number of libraries that are produced and the number of
> symlinks created.
>
> While the output of a cmake build does seem to work fine on Linux, is
> there anything to be aware of that would cause issues for a production
> release? Is there a list of features somewhere that are in autotools but
> not yet in cmake? Does anyone think it is an exceptionally bad idea to use
> the current cmake system to produce binaries for production use?
>
> Thanks!
> -Aaron
>
>


Re: Mesos Executor Failing

2017-05-24 Thread Joseph Wu
There isn't a tool for this.  Can you check if the Mesos agent is being
restarted (or crashing) when you launch a task?  And perhaps upload some
logs around the time of the task launch.

There is a mismatch between the exit codes you've reported though.  When
you see that log line in the sandbox logs, the exit code will be "1"
(failure), rather than "0" (success).

On Mon, May 22, 2017 at 9:30 PM, Chawla,Sumit <sumitkcha...@gmail.com>
wrote:

> Hi Joseph
>
> I am using 0.27.0.  Is there any diagnostic tool or command line that I can
> run to ascertain why it's happening?
>
> Regards
> Sumit Chawla
>
>
> On Fri, May 19, 2017 at 2:31 PM, Joseph Wu <jos...@mesosphere.io> wrote:
>
>> What version of Mesos are you using?  (Just based on the word "slave" in
>> that error message, I'm guessing 0.28 or older.)
>>
>> The "Failed to synchronize" error is something that can occur while the
>> agent is launching the executor.  During the launch, the agent will create
>> a pipe to the executor subprocess; and the executor makes a blocking read
>> on this pipe.  The agent will write a value to the pipe to signal the
>> executor to proceed.  If the agent restarts or the pipe breaks at this
>> point in the launch, then you'll see this error message.
>>
>> On Thu, May 18, 2017 at 9:44 PM, Chawla,Sumit <sumitkcha...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> I am facing a peculiar issue on one of the slave nodes of our cluster.
>>> I have a spark cluster with 40+ nodes.  On one of the nodes, all tasks fail
>>> with exit code 0.
>>>
>>> ExecutorLostFailure (executor e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S76
>>> exited caused by one of the running tasks) Reason: Unknown executor
>>> exit code (0)
>>>
>>>
>>> I cannot seem to find anything in mesos-slave.logs, and there is nothing
>>> being written to stdout/stderr.  Are there any debugging utilities that I
>>> can use to debug what is going wrong on that particular slave?
>>>
>>> I tried running following but got stuck at:
>>>
>>>
>>> /mesos-containerizer launch \
>>>   --command='{"environment":{},"shell":true,"value":"ls -ltr"}' \
>>>   --directory=/var/tmp/mesos/slaves/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S77/frameworks/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-0312/executors/e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S77/runs/45aa784c-f485-46a6-aeb8-997e82b80c4f \
>>>   --help=false --pipe_read=0 --pipe_write=0 --user=smi
>>>
>>> Failed to synchronize with slave (it's probably exited)
>>>
>>>
>>> Would appreciate pointers to any debugging methods/documentation to
>>> diagnose these kinds of problems.
>>>
>>> Regards
>>> Sumit Chawla
>>>
>>>
>>
>


Re: Mesos Executor Failing

2017-05-19 Thread Joseph Wu
What version of Mesos are you using?  (Just based on the word "slave" in
that error message, I'm guessing 0.28 or older.)

The "Failed to synchronize" error is something that can occur while the
agent is launching the executor.  During the launch, the agent will create
a pipe to the executor subprocess; and the executor makes a blocking read
on this pipe.  The agent will write a value to the pipe to signal the
executor to proceed.  If the agent restarts or the pipe breaks at this
point in the launch, then you'll see this error message.
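
As a rough illustration of that handshake, here is a minimal Python sketch.
It is not the Mesos implementation: a thread stands in for the executor
subprocess, and the names `executor_wait` and `launch` are made up for this
example.

```python
import os
import threading


def executor_wait(read_fd):
    """Executor side: block on the pipe until the agent signals or goes away."""
    data = os.read(read_fd, 1)  # Blocks until a byte arrives or the pipe closes.
    os.close(read_fd)
    if not data:
        # EOF: the write end was closed before any value was written.
        return "Failed to synchronize with slave (it's probably exited)"
    return "proceed"


def launch(agent_signals):
    """Agent side: create the pipe, start the 'executor', then signal or vanish."""
    read_fd, write_fd = os.pipe()
    outcome = []
    executor = threading.Thread(
        target=lambda: outcome.append(executor_wait(read_fd)))
    executor.start()
    if agent_signals:
        os.write(write_fd, b"\0")  # The value that tells the executor to proceed.
    os.close(write_fd)  # If nothing was written, the reader sees EOF.
    executor.join()
    return outcome[0]
```

Calling `launch(True)` models a normal launch, and `launch(False)` models the
agent going away before it writes to the pipe.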

On Thu, May 18, 2017 at 9:44 PM, Chawla,Sumit 
wrote:

> Hi
>
> I am facing a peculiar issue on one of the slave nodes of our cluster.  I
> have a spark cluster with 40+ nodes.  On one of the nodes, all tasks fail
> with exit code 0.
>
> ExecutorLostFailure (executor e6745c67-32e8-41ad-b6eb-8fa4d2539da7-S76
> exited caused by one of the running tasks) Reason: Unknown executor exit
> code (0)
>
>
> I cannot seem to find anything in mesos-slave.logs, and there is nothing
> being written to stdout/stderr.  Are there any debugging utilities that I
> can use to debug what is going wrong on that particular slave?
>
> I tried running the following but got stuck at:
>
>
> /mesos-containerizer launch 
> --command='{"environment":{},"shell":true,"value":"ls
> -ltr"}' --directory=/var/tmp/mesos/slaves/e6745c67-32e8-41ad-
> b6eb-8fa4d2539da7-S77/frameworks/e6745c67-32e8-41ad-
> b6eb-8fa4d2539da7-0312/executors/e6745c67-32e8-41ad-
> b6eb-8fa4d2539da7-S77/runs/45aa784c-f485-46a6-aeb8-997e82b80c4f
> --help=false --pipe_read=0 --pipe_write=0 --user=smi
>
> Failed to synchronize with slave (it's probably exited)
>
>
> Would appreciate pointers to any debugging methods/documentation to
> diagnose these kinds of problems.
>
> Regards
> Sumit Chawla
>
>


CMake and (eventually) deprecating the autotools build

2017-03-14 Thread Joseph Wu
Hi Devs!

The CMake build system for Mesos is now complete enough for wider
consumption.  The plan is to review all the differences between the
CMake and Autotools build systems and eventually deprecate the
Autotools build system.

A few of us are already using CMake exclusively for development.  But
we'd like to have more developers using it *before we start talking
about deprecation*.


Here is a summary of the known differences:

Missing features:
* CMake does not build Java artifacts at the moment.  Since the most
widely-used frameworks (Aurora, Marathon, etc) rely on this, we will
prioritize getting this done.
* CMake currently does not let you specify the exact system dependency
to use.  i.e. --with-ssl=... --with-boost=... etc.  Instead, CMake
either uses the bundled versions or automatically finds the system
locations.  This is a blocker for CMake adoption by DC/OS.
* CMake does not have an install target at the moment.  One of the top
priority things to get done.
* CMake does not build the port isolator module at the moment.
* CMake does not have an option to install the module dependencies at
the moment.
* CMake does not work on FreeBSD at the moment.

Features left out on purpose:
* CMake does not generate artifacts for Python.  We feel the Autotools
deprecation will likely run near/alongside the push towards using the
V1 HTTP APIs.  And there is already an HTTP API library for Python:
https://github.com/douban/pymesos
* CMake does not build the old CLI executables (src/cli/mesos.cpp and
src/cli/resolve.cpp) under the assumption that we will replace those
in the near future.
* CMake does not support installing test binaries, because the feature
appears to be unused.

New features:
* CMake builds on Windows!
* CMake supports packaging sources.  For example, you can do `cmake ..
&& make package_source` to generate the autotools equivalent of `make
dist`.
* CMake supports packaging binaries.  For example:

  * To generate debs and rpms: `cmake .. -DCPACK_BINARY_DEB=1
-DCPACK_BINARY_RPM=1 && make package`
  * On Windows, to build a graphical installer: `cmake ..
-DCPACK_BINARY_NSIS=1 && make package`
  * On OSX, to build .dmg and interactive installers: `cmake ..
-DCPACK_BINARY_OSXX11=1 -DCPACK_BINARY_DRAGNDROP=1 && make package`

* More granular build targets.  For example, if you're working on
libprocess, you can use `make libprocess-tests` instead of babysitting
`make check`.
* [Upcoming] Precompiled headers, which should speed up the build dramatically.
* [Upcoming] We will be combining some aspects of Mesosphere's OSS
packaging repo [1] so that binary packages will contain service
definitions, as well as binaries.


Please let us know if you have any comments, concerns, or requests!

And please do try it out:
cmake .. && cmake --build .

The JIRA tracking the CMake build system is here:
https://issues.apache.org/jira/browse/MESOS-898

Thanks!
~Joseph


[1] https://github.com/mesosphere/mesos-deb-packaging


Re: [VOTE] Release Apache Mesos 1.2.0 (rc2)

2017-03-07 Thread Joseph Wu
+1 (binding)

Deployed on a small-ish test cluster for about a week.  Monitoring of
that test cluster has not caught any problems with Mesos.

Also confirmed that this SSL socket FD leak does not affect Mesos,
except in tests: https://issues.apache.org/jira/browse/MESOS-6919

On Mon, Mar 6, 2017 at 9:52 AM, Jie Yu  wrote:
> -0
>
> I wanna fix MESOS-7208  
> which
> affects all tasks that are launched as non-root using container image.
>
> But this is not a new regression because it exists in 1.1.0, thus I am a -0
>
> - Jie
>
> On Fri, Mar 3, 2017 at 4:08 PM, Vinod Kone  wrote:
>
>> +1 (binding)
>>
>> Since the perf and flaky test issues that I reported earlier don't seem to be
>> blockers.
>>
>> On Fri, Mar 3, 2017 at 4:01 PM, Adam Bordelon  wrote:
>>
>> > I haven't heard any -1's so I'm going to go ahead and vote myself, from a
>> > DC/OS perspective:
>> >
>> > +1 (binding)
>> >
>> > I ran 1.2.0-rc2 through the DC/OS integration tests on top of the
>> > 1.9.0-rc1, which covers many Mesos features and tests multiple
>> frameworks.
>> > See CI results of https://github.com/dcos/dcos/pull/1295
>> >
>> > This was then merged into DC/OS 1.9.0-rc2 which passed another suite of
>> > integration tests. Available for testing at
>> > https://dcos.io/releases/1.9.0-rc2/
>> >
>> >
>> > On Thu, Mar 2, 2017 at 12:02 AM, Adam Bordelon 
>> wrote:
>> >
>> >> TL;DR: No consensus yet. Let's extend the vote for a day or two, until
>> we
>> >> have 3 +1s or a legit -1.
>> >> During that time we can test further, and investigate any issues that
>> >> have shown up.
>> >>
>> >> Here's a summary of what's been reported on the 1.2.0-rc2 vote thread:
>> >>
>> >> - There was a perf core dump on ASF CI, which is not necessarily a
>> >> blocker:
>> >> MESOS-7160  Parsing of perf version segfaults
>> >>   Perhaps fixed by backporting MESOS-6982: PerfTest.Version fails on
>> >> recent Arch Linux
>> >>
>> >> - There were a couple of (known/unsurprising) flaky tests:
>> >> MESOS-7185  DockerRuntimeIsolatorTest.ROOT_INTERNET_CURL_DockerDefaultEntryptRegistryPuller
>> >> is flaky
>> >> MESOS-4570  DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.
>> >>
>> >> - If we were to have an rc3, the following Critical bugs could be
>> >> included:
>> >> MESOS-7050  IOSwitchboard FDs leaked when containerizer launch fails --
>> >> leads to deadlock
>> >> MESOS-6982  PerfTest.Version fails on recent Arch Linux
>> >>
>> >> - Plus doc updates:
>> >> MESOS-7188 Add documentation for Debug APIs to Operator API doc
>> >> MESOS-7189 Add nested container launch/wait/kill APIs to agent API
>> >> docs.
>> >>
>> >>
>> >> On Wed, Mar 1, 2017 at 11:30 AM, Neil Conway 
>> >> wrote:
>> >>
>> >>> The perf core dump might be addressed if we backport this change:
>> >>>
>> >>> https://reviews.apache.org/r/56611/
>> >>>
>> >>> Although my guess is that this isn't a severe problem: for some
>> >>> as-yet-unknown reason, running `perf` on the host segfaulted, which
>> >>> causes the test to fail.
>> >>>
>> >>> Neil
>> >>>
>> >>> On Wed, Mar 1, 2017 at 11:09 AM, Vinod Kone 
>> >>> wrote:
>> >>> > Tested on ASF CI.
>> >>> >
>> >>> > Saw 2 configurations fail. One was the perf core dump issue. The other
>> >>> > is a known (since 0.28.0) flaky test with the Docker fetcher plugin.
>> >>> >
>> >>> > Withholding the vote until we know the severity of the perf core
>> dump.
>> >>> >
>> >>> >
>> >>> > *Revision*: b9d8202a7444d0d1e49476bfc9817eb4583beaff
>> >>> >
>> >>> >- refs/tags/1.1.1-rc2
>> >>> >
>> >>> > Configuration Matrix:
>> >>> >
>> >>> > OS        Configuration                             Build      gcc      clang
>> >>> > centos:7  --verbose --enable-libevent --enable-ssl  autotools  Success  Not run
>> >>> > centos:7  --verbose --enable-libevent --enable-ssl  cmake      Success  Not run
>> >>> > centos:7  --verbose                                 autotools  Success
>> >>> 

Note about ".proto" files from Mesos 1.3.0+

2017-02-16 Thread Joseph Wu
Hi devs/contributors,

The next time you check out HEAD and open a .proto file, you may notice this
line at the top of the file (after the Apache license, of course):

syntax = "proto2";

This has been added to all our protobufs in order to allow different
versions of the protobuf compiler to process our protobufs.  This change *does
not* change anything about the generated code, or the wire format, or
anything else.  The new line purely addresses a warning printed by protoc.

If you need to add any new protobufs, make sure you add the "syntax = ..."
line in the future.
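
If you want to check a tree of protobufs for this line, something like the
following Python sketch could work.  It is only an illustration: the helper
name `declared_syntax` is hypothetical, and it only understands `//` and
`/* */` comments ahead of the first statement.

```python
import re

# Matches a line such as: syntax = "proto2";
SYNTAX_RE = re.compile(r'^syntax\s*=\s*"(proto2|proto3)"\s*;')


def declared_syntax(proto_text):
    """Return "proto2"/"proto3" if the first statement declares it, else None."""
    in_block_comment = False
    for line in proto_text.splitlines():
        stripped = line.strip()
        if in_block_comment:
            # Skip everything until the block comment closes.
            if "*/" in stripped:
                in_block_comment = False
            continue
        if not stripped or stripped.startswith("//"):
            continue  # Blank line or license/comment line.
        if stripped.startswith("/*"):
            in_block_comment = "*/" not in stripped
            continue
        # First real statement: it must be the syntax declaration.
        match = SYNTAX_RE.match(stripped)
        return match.group(1) if match else None
    return None
```

Feeding it the text of each .proto file and flagging any `None` result would
catch files that are missing the declaration.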

See this and related issues for some more details:
https://issues.apache.org/jira/browse/MESOS-6138

~Joseph


Re: Removing `support/apply-review.sh`

2017-01-13 Thread Joseph Wu
+1 for one less character to type while tab-completing:
support/ap  s

On Fri, Jan 13, 2017 at 10:15 AM, Vinod Kone  wrote:

> +1 to remove
>
> On Fri, Jan 13, 2017 at 1:39 AM, haosdent  wrote:
>
> > +1 for remove this.
> >
> > On Fri, Jan 13, 2017 at 5:37 PM, Michael Park  wrote:
> >
> > > Does anyone still care about `support/apply-review.sh`?
> > > I imagine most people have transitioned to `support/apply-reviews.py`.
> > >
> > > Please let me know if people still want it around for some reason.
> > >
> > > Thanks,
> > >
> > > MPark
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
> >
>


Re: [VOTE] Release Apache Mesos 0.28.3 (rc1)

2016-11-29 Thread Joseph Wu
AlexR,

Thanks for pointing out those test failures.  As of 0.28, the
LinuxFilesystemIsolatorTests were notoriously flaky on distributions with
"large" root filesystems.  The test would essentially copy the root
filesystem, leading to timeouts in multiple places in the tests.  CentOS 7
was known to have at least twice as much stuff to copy compared to the
other distributions (not sure about Fedora 23).

Looking at your logs (and logs you didn't attach), we see that a couple of
the tests that exercise the same code path did in fact pass, while others
timed out.  I wouldn't consider that a regression.

On Mon, Nov 28, 2016 at 12:54 PM, Vinod Kone  wrote:

> +1 (binding)
>
> Tested on ASF CI.
>
>
> *Revision*: 52a0b0a41482da35dc736ec2fd445b6099e7a4e7
>
>- refs/tags/0.28.3-rc1
>
> Configuration Matrix:
>
> OS            Configuration                             Build      gcc      clang
> centos:7      --verbose --enable-libevent --enable-ssl  autotools  Success  Not run
> centos:7      --verbose --enable-libevent --enable-ssl  cmake      Success  Not run
> centos:7      --verbose                                 autotools  Success  Not run
> centos:7      --verbose                                 cmake      Success  Not run
> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools  Success  Success
> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake      Success  Success
> ubuntu:14.04  --verbose                                 autotools  Success  Success
> ubuntu:14.04  --verbose                                 cmake      Success  Success
>
> On Mon, Nov 28, 2016 at 3:14 AM, Alex Rukletsov 
> wrote:
>
>> I see LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem failing on
>> CentOS 7 and Fedora 23, see e.g., [1]. I don't see any backports touching
>> [2]; could it be a regression, or is this test known to be problematic in
>> 0.28.x?
>>
>> [1] http://pastebin.com/c5PzfGF8
>> [2]
>> https://github.com/apache/mesos/blob/0.28.x/src/tests/containerizer/filesystem_isolator_tests.cpp
>>
>> On Thu, Nov 24, 2016 at 

Re: Attendance for Mesos Developer Community Meeting (Nov 17)

2016-11-16 Thread Joseph Wu
+0.9

Is there an agenda in case there are enough attendees?

On Wed, Nov 16, 2016 at 3:15 PM, James Peach  wrote:

>
> > On Nov 16, 2016, at 3:06 PM, Michael Park  wrote:
> >
> > If you're planning to attend this meeting, please reply to this before
> Nov
> > 17 8am PST. If there are less than 5 people planning to attend (including
> > me), we'll skip it.
>
> +1
>
> >
> > On Wed, Nov 16, 2016 at 11:02 AM, Haripriya Ayyalasomayajula <
> > aharipriy...@gmail.com> wrote:
> >
> >> +1.
> >>
> >> On Wed, Nov 16, 2016 at 10:58 AM, Michael Park 
> wrote:
> >>
> >>> Many people will be in China for MesosCon, so I'd like to get a quick
> >> count
> >>> for how many people are planning to join the developer community
> meeting
> >>> tomorrow.
> >>>
> >>> Please reply with a +1 if you're planning to attend.
> >>>
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Haripriya Ayyalasomayajula
> >>
>
>


Re: Mesos V1 Operator HTTP API - Java Proto Classes

2016-11-16 Thread Joseph Wu
Added.  Welcome to the contributors list :)

On Wed, Nov 16, 2016 at 9:49 AM, Vijay Srinivasaraghavan <
vijikar...@yahoo.com> wrote:

> I have created a JIRA and will submit a patch. Could someone please add me
> to the contributor list as I am not able to assign the JIRA to myself?
>
> https://issues.apache.org/jira/browse/MESOS-6597
>
>
>
>
> On Wednesday, November 16, 2016 9:00 AM, Anand Mazumdar 
> wrote:
>
>
> We wanted to move the project away from officially supporting anything
> other than C++ and discuss more on if we should be responsible for
> publishing to the various language specific channels. However, for the time
> being, we had decided to include the v1 protobufs in the mesos JAR itself.
> (it already contains the v1 Scheduler/Executor protos)
>
> Please file an issue as Zameer pointed out.
>
> -anand
>
> On Wed, Nov 16, 2016 at 8:34 AM, Zameer Manji  wrote:
>
> > I think this is a bug, I feel the jar should include all v1 protobuf
> files.
> >
> > Vijay, I encourage you to file a ticket.
> >
> > On Tue, Nov 15, 2016 at 8:04 PM, Vijay Srinivasaraghavan <
> > vijikar...@yahoo.com.invalid> wrote:
> >
> >> I believe the HTTP API will use the same underlying message format
> (proto
> >> def) and hence the request/response value objects (java) needs to be
> >> auto-generated from the proto files for it to be used in Jersey based
> java
> >> rest client?
> >>
> >>On Tuesday, November 15, 2016 12:37 PM, Tomek Janiszewski <
> >> jani...@gmail.com> wrote:
> >>
> >>
> >>  I suspect jar is deprecated and includes only old API used by mesoslib.
> >> The
> >> goal is to create HTTP API and stop supporting native libs (jars, so,
> >> etc).
> >> I think you shouldn't use that jar in your project.
> >>
> >> On Tue, Nov 15, 2016 at 20:38, Vijay Srinivasaraghavan <
> >> vijikar...@yahoo.com> wrote:
> >>
> >> > Hello,
> >> >
> >> > I am writing a rest client for "operator APIs" and found that some of
> >> the
> >> > protobuf java classes (like "include/mesos/v1/quota/quota.proto",
> >> > "include/mesos/v1/master/master.proto") are not included in the mesos
> >> jar
> >> > file. While investigating, I have found that the "Make" file does not
> >> > include these proto definition files.
> >> >
> >> > I have updated the Make file and added the protos that I am interested
> >> in
> >> > and built a new jar file. Is there any reason why these proto
> >> definitions
> >> > are not included in the original build apart from the reason that the
> >> APIs
> >> > are still evolving?
> >> >
> >> > Regards
> >> > Vijay
> >> >
> >>
> >> --
> >> Zameer Manji
> >>
> >
>
>
>


Re: Two questions about running spark on mesos

2016-11-14 Thread Joseph Wu
1) You should read through this page:
https://spark.apache.org/docs/latest/running-on-mesos.html
I (Mesos person) can't answer any questions that aren't already answered on
that page :)

2) Your normal spark commands (whatever they are) should still work
regardless of the backend.

On Mon, Nov 14, 2016 at 2:58 AM, Yu Wei  wrote:

> Hi Guys,
>
>
> Two questions about running spark on mesos.
>
> 1, Does spark configuration of conf/slaves still work when running spark
> on mesos?
>
> According to my observations, it seemed that conf/slaves still took
> effect when running spark-shell.
>
> However, it doesn't take effect when deploying in cluster mode.
>
> Is this expected behavior?
>
>Or did I miss anything?
>
>
> 2, Could I kill submitted jobs when running spark on mesos in cluster mode?
>
> I launched spark on mesos in cluster mode. Then submitted a long
> running job succeeded.
>
> Then I want to kill the job.
>
> How could I do that? Is there any similar commands as launching spark
> on yarn?
>
>
>
>
> Thanks,
>
> Jared, (??)
> Software developer
> Interested in open source software, big data, Linux
>


Re: Re: Mesos Documentation Project

2016-11-09 Thread Joseph Wu
Because of the heavy amount of markdown changes, this proposal will live in
a Github PR [1], which is presumably reachable.  We may also consider, as
part of this project, migrating design documents for existing features off
GoogleDocs and into markdown, so that they can live in the documentation
too.

That being said, you can usually use a VPN to bypass the "network
limitations" ;)

[1] https://github.com/apache/mesos/pull/178

On Wed, Nov 9, 2016 at 5:03 PM, pangbingqiang <pangbingqi...@huawei.com>
wrote:

> It is really cool, but the Mesos design docs are currently on Google Docs, and
> users in China cannot access or download them because of network limitations.
>
> -Original Message-
> From: James Neiman [mailto:jneima...@mesosphere.io]
> Sent: November 10, 2016 8:30
> To: dev@mesos.apache.org
> Subject: Mesos Documentation Project
>
> Dear Mesos Users, Operators, Developers, and Contributors:
>
> My name is James Neiman. I have been working with Benjamin Hindman, Artem
> Harutyunyan, Neil Conway, and Joseph Wu on improving the Mesos
> documentation. We now have a proposal for the community to critique.
>
> Our goal is to satisfy the needs of Operators, Developers, and Contributors
> by:
>
>- Revising, restructuring, and expanding existing topics.
>- Authoring new topics, such as *Quick Start* and *What is Mesos?*.
>- Reorganizing the table of contents.
>- Providing role-specific views of the table of contents.
>
>*Please note that versioning of the documentation will be addressed in
> a separate project.*
>
> This will be an iterative process. Your feedback and contributions are
> very important to making this project a success!
>
> I will follow up very soon with a request for your comments on proposed
> changes. I look forward to your feedback.
>
> Sincerely,
>
>
> James Neiman
>
> Technical Writing Consultant, Mesosphere
>
> jneima...@mesosphere.io
>


Re: 0.28.3 release dashboard!

2016-11-07 Thread Joseph Wu
Thanks for the suggestions Benjamin!

I've re-purposed one of the dashboard queries to track "Issues affecting
0.28.x that are resolved in versions later than 0.28".
https://issues.apache.org/jira/issues/?filter=12338701
^ That will show up on the dashboard too.

There are 26 issues in that list, which we'll triage.  I suspect most of
these will not be backported if they aren't already.

On Mon, Nov 7, 2016 at 5:31 AM, Benjamin Bannier <
benjamin.bann...@mesosphere.io> wrote:

> Hi Joseph and Anand,
>
> > We are planning to cut this patch release within three workdays - that
> would be around Monday next week. So, if you have any patches that need to
> get into 0.28.3 make sure that either it is already in the 0.28.x branch or
> the corresponding ticket has a target version set to 0.28.3.
>
> There are still a number of rather unpleasant issues filed against 0.28
> which are only fixed in versions > 0.28.3.
>
>   https://issues.apache.org/jira/browse/MESOS-5224
>   https://issues.apache.org/jira/browse/MESOS-5685
>   https://issues.apache.org/jira/browse/MESOS-5727
>   https://issues.apache.org/jira/browse/MESOS-5763
>   https://issues.apache.org/jira/browse/MESOS-6391
>
> Maybe it would be worthwhile to backport some of these.
>
> FYI, I used the following query which still required some manual filtering:
>
> project = Mesos AND \
> affectedVersion in (0.28, 0.28.0, 0.28.1, 0.28.2, 0.28.3) AND \
> (fixVersion not in (0.28.3) OR fixVersion < 0.28.3) AND \
> status = Resolved and type = Bug
>
> This might be a worthwhile addition to patch release dashboards
> (if somebody with more JIRA foo could come up with an actually working
> query).
>
>
> Cheers,
>
> Benjamin


0.28.3 release dashboard!

2016-11-03 Thread Joseph Wu
Hi everyone!

Anand and I will be the Release Managers for 0.28.3!

We are planning to cut this patch release within three workdays - that
would be around Monday next week. So, if you have any patches that need to
get into 0.28.3 make sure that either it is already in the 0.28.x branch or
the corresponding ticket has a target version set to 0.28.3.

The release dashboard:
https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12329818

Anand & Joseph


Re: Please add me as a contributor

2016-11-01 Thread Joseph Wu
Added!

On Tue, Nov 1, 2016 at 1:25 PM, Steven Locke  wrote:

> Hello,
>
> Please add me as a Mesos contributor to enable being assigned Jira issues.
>
> I signed up for Reviewboard and Jira as "slocke".
>
> Thanks,
> Steven
>
> --
> Steven Locke
> Software Engineering Intern
> www.mesosphere.com
>


Re: Need inputs on running MPI jobs on Mesos

2016-10-14 Thread Joseph Wu
Other than test frameworks or frameworks Mesos considers part of its CLI,
there shouldn't be any other Frameworks that are part of the Mesos
codebase.  (Imagine shipping Spark or Marathon or a bunch of other
humongous frameworks along with Mesos.)  Same thing goes for MPI, which may
or may not even work anymore.  I don't know anyone that has run the MPI
framework in the past several years.

On Fri, Oct 14, 2016 at 8:51 AM, Mangirish Wagle 
wrote:

> Thanks for the response.
> May I know if there are any reasons for not continuing to develop and
> support MPI framework? Are there any known issues with running MPI jobs on
> Mesos?
>
> Best Regards,
> Mangirish
>
> On Fri, Oct 14, 2016 at 2:20 AM, haosdent  wrote:
>
> > Refer to https://issues.apache.org/jira/browse/MESOS-6084, I think the
> MPI
> > framework would be deprecated.
> >
> > On Fri, Oct 14, 2016 at 1:57 PM, Mangirish Wagle <
> vaglomangir...@gmail.com
> > >
> > wrote:
> >
> > > Hello Mesos Devs,
> > >
> > > I am contributing to Apache Airavata  and
> > > currently working on extending the support for the science gateways to
> > run
> > > MPI jobs on cloud based Mesos clusters.
> > >
> > > > I am looking at mpiexec-mesos and Mesos Hydra, but I am also interested in
> > general,
> > > I want to seek your advice and thoughts on what is the right tool that
> I
> > > should use, and the appropriate direction to proceed to achieve the
> > > objective of running MPI jobs on Mesos.
> > >
> > > Thank you.
> > >
> > > Regards,
> > > Mangirish Wagle
> > > Graduate Student, Indiana University Bloomington.
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
> >
>


Re: Maintenance API question

2016-08-31 Thread Joseph Wu
The maintenance endpoints do not check "machine_ids" against the registered
agents.  They only reject entries that are formatted wrong or are missing
fields.
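
For reference, a schedule body in the shape used in this thread can be built
like the Python sketch below.  This is only a sketch: the helper names are
hypothetical, the check that each machine has a hostname or an IP is an
assumption about what "missing fields" covers, and times are converted from
seconds to the nanoseconds the JSON expects.

```python
import json

NANOS_PER_SECOND = 1_000_000_000


def maintenance_window(machines, start_epoch_seconds, duration_seconds):
    """Build one entry for the "windows" list of a maintenance schedule."""
    for machine in machines:
        # Assumption: a machine needs at least a hostname or an IP.
        if not machine.get("hostname") and not machine.get("ip"):
            raise ValueError("machine_id needs a hostname or an ip")
    return {
        "machine_ids": [dict(machine) for machine in machines],
        "unavailability": {
            "start": {"nanoseconds": start_epoch_seconds * NANOS_PER_SECOND},
            "duration": {"nanoseconds": duration_seconds * NANOS_PER_SECOND},
        },
    }


def schedule_body(windows):
    """Serialize the windows into the JSON body for the schedule endpoint."""
    return json.dumps({"windows": windows}, indent=2)
```

Building the numbers this way avoids hand-typing nanosecond values, where a
missing digit changes the start time by orders of magnitude.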

On Wed, Aug 31, 2016 at 8:44 AM, Olivier Sallou <olivier.sal...@irisa.fr>
wrote:

>
>
> - Original Message -
> > From: "Joseph Wu" <jos...@mesosphere.io>
> > To: "dev" <dev@mesos.apache.org>
> > Sent: Wednesday, August 31, 2016 17:16:57
> > Subject: Re: Maintenance API question
> >
> > Most likely, the hostname and IP you've put into the "machine_ids"
> > do not *exactly
> > match* the hostname and IP the agent is identifying itself as.
>
> In this case the master should reject the request according to the
> documentation. Here it is accepted (200 OK in response, and it appears in
> maintenance/schedule and maintenance/status).
>
>   If in
> > doubt, you can check the master's /slaves endpoint.  Or, you can manually
> > set the hostname and IP when starting the agent.
>
>
> I took information for the master UI and it is the same.
>
> Maybe the issue is the fact I am on a single machine, so hostname and ip
> are the same for master and slave
>
> >
> > On Wed, Aug 31, 2016 at 3:16 AM, Olivier Sallou <olivier.sal...@irisa.fr
> >
> > wrote:
> >
> > > Hi,
> > > I am trying to use the /maintenance API for mesos slave
> maintenance/drain.
> > >
> > > I follow doc at http://mesos.apache.org/documentation/latest/
> maintenance/
> > >
> > > I use mesos 1.0.1 on a single machine (for dev).
> > >
> > > When scheduling a node using
> > >
> > >
> > >
> > > {
> > > "windows" : [
> > > {
> > > "machine_ids" : [
> > > { "hostname" : "tifenn.irisa.fr", "ip" : "127.0.0.1" }
> > > ],
> > > "unavailability" : {
> > > "start" : { "nanoseconds" : 14726373400 },
> > > "duration" : { "nanoseconds" : 36000 }
> > > }
> > > }
> > > ]
> > > }
> > >
> > >
> > >
> > >
> > > The start date is set in the recent past (setting to future did not
> > > change).
> > >
> > >
> > > I see in /maintenance/status
> > >
> > > {"draining_machines":[{"id":{"hostname":"tifenn.irisa.fr","
> > > ip":"127.0.0.1"}}]}
> > >
> > > However, the offers I receive do not contain the unavailability
> parameter.
> > > I do not know if it is expected, but start/duration do not appear in
> > > maintenance/status result.
> > > I see in master logs: HTTP POST for /master/maintenance/schedule from
> > > 127.0.0.1:34858 with User-Agent='curl/7.43.0'
> > >
> > >
> > > I tried anyway to switch the node to maintenance (/maintenance/down)
> but I
> > > continue to receive offers for this slave. In status, I see my slave in
> > > machines_down:
> > >
> > > {"down_machines":[{"hostname":"tifenn.irisa.fr","ip":"127.0.0.1"}]}
> > >
> > > I can see on master logs:
> > >
> > >
> > >
> > > I0831 12:12:37.568898 6428 http.cpp:381] HTTP POST for
> > > /master/machine/down from 127.0.0.1:34970 with
> User-Agent='curl/7.43.0'
> > >
> > > 
> > >
> > > Sending 1 offers to framework a559cd9e-3e58-4377-9e1a-
> c8f3d28d2318-
> > > (Go-Docker Mesos) at scheduler-41e42d1f-b8f8-473a-
> > > b460-6fab3a150915@127.0.1.1:43060
> > >
> > >
> > >
> > >
> > > Should something be set to enable maintenance in mesos ?
> > >
> > >
> > >
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > > Olivier
> > >
> >
>


Re: Maintenance API question

2016-08-31 Thread Joseph Wu
Most likely, the hostname and IP you've put into "machine_ids" do not
*exactly match* the hostname and IP the agent is identifying itself as.  If in
doubt, you can check the master's /slaves endpoint.  Or, you can manually
set the hostname and IP when starting the agent.
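
A quick way to run that comparison is sketched below in Python.  The
assumptions here are mine: that the /slaves response is a JSON object with a
"slaves" list whose entries carry "hostname" and a "pid" of the form
`slave(1)@IP:PORT`, and that a plain string comparison is what "exactly
match" means.  The function names are hypothetical.

```python
def agent_identity(slave):
    """Extract (hostname, ip) from one entry of the /slaves response."""
    hostname = slave["hostname"]
    # Assumed pid format: "slave(1)@10.0.0.1:5051".
    address = slave["pid"].split("@", 1)[1]
    ip = address.rsplit(":", 1)[0]
    return hostname, ip


def unmatched_machine_ids(machine_ids, slaves_response):
    """Return the machine_ids that match no agent exactly."""
    known = {agent_identity(slave) for slave in slaves_response["slaves"]}
    return [
        machine for machine in machine_ids
        if (machine.get("hostname"), machine.get("ip")) not in known
    ]
```

Any entry this returns is a machine the schedule names but no agent identifies
as, which would explain offers that keep arriving without unavailability.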

On Wed, Aug 31, 2016 at 3:16 AM, Olivier Sallou 
wrote:

> Hi,
> I am trying to use the /maintenance API for mesos slave maintenance/drain.
>
> I follow doc at http://mesos.apache.org/documentation/latest/maintenance/
>
> I use mesos 1.0.1 on a single machine (for dev).
>
> When scheduling a node using
>
>
>
> {
> "windows" : [
> {
> "machine_ids" : [
> { "hostname" : "tifenn.irisa.fr", "ip" : "127.0.0.1" }
> ],
> "unavailability" : {
> "start" : { "nanoseconds" : 14726373400 },
> "duration" : { "nanoseconds" : 36000 }
> }
> }
> ]
> }
>
>
>
>
> The start date is set in the recent past (setting to future did not
> change).
>
>
> I see in /maintenance/status
>
> {"draining_machines":[{"id":{"hostname":"tifenn.irisa.fr","
> ip":"127.0.0.1"}}]}
>
> However, the offers I receive do not contain the unavailability parameter.
> I do not know if it is expected, but start/duration do not appear in
> maintenance/status result.
> I see in master logs: HTTP POST for /master/maintenance/schedule from
> 127.0.0.1:34858 with User-Agent='curl/7.43.0'
>
>
> I tried anyway to switch the node to maintenance (/maintenance/down) but I
> continue to receive offers for this slave. In status, I see my slave in
> machines_down:
>
> {"down_machines":[{"hostname":"tifenn.irisa.fr","ip":"127.0.0.1"}]}
>
> I can see on master logs:
>
>
>
> I0831 12:12:37.568898 6428 http.cpp:381] HTTP POST for
> /master/machine/down from 127.0.0.1:34970 with User-Agent='curl/7.43.0'
>
> 
>
> Sending 1 offers to framework a559cd9e-3e58-4377-9e1a-c8f3d28d2318-
> (Go-Docker Mesos) at scheduler-41e42d1f-b8f8-473a-
> b460-6fab3a150915@127.0.1.1:43060
>
>
>
>
> Should something be set to enable maintenance in mesos ?
>
>
>
>
> Thanks
>
>
>
>
> Olivier
>


Re: Protobuf long number JSON serialisation

2016-08-04 Thread Joseph Wu
This is not necessarily a bug, but I think we can safely extend our parsing
code to handle this case.

This is the method that would need to change:
https://github.com/apache/mesos/blob/e859d3ae8d8ff7349327b9e6a89edd6f98d2b7a1/3rdparty/stout/include/stout/protobuf.hpp#L433-L435
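
The extension I have in mind is roughly the following, sketched in Python
rather than in the stout C++ code linked above.  The function name
`parse_uint64` is hypothetical, and this is not what the parser does today;
it only shows the idea of accepting either a JSON number or a decimal string
for a 64-bit unsigned field.

```python
UINT64_MAX = 2**64 - 1


def parse_uint64(value):
    """Accept a JSON number or a decimal string for a uint64 field."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; a JSON boolean is not a number.
        raise TypeError("Not expecting a JSON boolean for a uint64 field")
    if isinstance(value, int):
        number = value
    elif isinstance(value, str):
        # Only plain ASCII digits: no sign, no fraction, no exponent.
        if not (value.isascii() and value.isdigit()):
            raise ValueError("Not a decimal uint64 string: %r" % value)
        number = int(value)
    else:
        raise TypeError("Not expecting %s for a uint64 field" % type(value).__name__)
    if not 0 <= number <= UINT64_MAX:
        raise ValueError("Value out of range for uint64: %d" % number)
    return number
```

Strings matter here because many JSON libraries lose precision on integers
above 2^53, which is why the proto3 mapping uses strings for 64-bit values.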

On Thu, Aug 4, 2016 at 4:04 PM, Anand Mazumdar  wrote:

> Tomek,
>
> Thanks for reporting this. Looks like a bug in our JSON -> Protobuf
> parsing code. Mind filing a JIRA issue?
>
> -anand
>
>
> > On Aug 4, 2016, at 2:04 PM, Tomek Janiszewski  wrote:
> >
> > Hi
> >
> > I have a problem with the HTTP API. Proto2 does not specify JSON mappings, but
> > Proto3 does, and it recommends mapping 64-bit numbers to strings.
> > Unfortunately, Mesos does not accept strings in place of uint64 and returns a
> > 400 Bad Request error: "Failed to convert JSON into Call protobuf: Not
> > expecting a JSON string for field 'value'."
> > Is this on purpose, or is this a bug?
> >
> > Best
> > Tomek
>
>


Re: Metrics for custom modules

2016-07-13 Thread Joseph Wu
As long as you're using libprocess to write your modules, you can add your
metrics via `process::metrics::add(...)`.  Those will be exposed via the
same old `/metrics/snapshot` endpoint.
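
On the monitoring side, picking your module's metrics out of a snapshot could
look like the Python sketch below.  It assumes the snapshot is a flat JSON
object mapping metric names to numbers; the function name and the
`my_module/` prefix in the usage are hypothetical.

```python
import json


def module_metrics(snapshot_text, prefix):
    """Return only the metrics whose names start with the given prefix."""
    snapshot = json.loads(snapshot_text)
    return {
        name: value
        for name, value in sorted(snapshot.items())
        if name.startswith(prefix)
    }
```

For example, `module_metrics(body, "my_module/")` on the response body of
`/metrics/snapshot` would keep just the entries a module registered under
that prefix.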

On Wed, Jul 13, 2016 at 5:39 PM, Zhitao Li  wrote:

> Hi,
>
> I'm not sure whether this has been mentioned, but is it possible to reuse
> Mesos's metric system for custom modules?
>
> For context, we are planning to turn on some agent-side modules at many
> machines, but we face the question of how we'll monitor their behavior: we
> can reinvent some wheels to export metrics by ourselves (or probably even
> use libprocess code), but I wonder whether Mesos should provide an easier
> solution to allow metrics be reported to the same metrics system of
> master/agent (either merged with standard metrics, or reported separately).
>
> Any thought about this?
>
> --
> Cheers,
>
> Zhitao Li
>


Re: [Replicated Log] Enable Mesos to use etcd for replicated_log

2016-07-11 Thread Joseph Wu
Jay, I'll shepherd the two detector/contender/network module changes.

On Mon, Jul 11, 2016 at 2:44 AM, Jay JN Guo <guojian...@cn.ibm.com> wrote:

>
> Shuai,
>
> I think you are right. AFAIK, replicated_log interacts with zk in only the
> following two ways:
>
> 1) replicated_log creates a znode and maintains 'membership' via
> zookeeper::Group. It actually stores the pid under zk_url/log_replicas.
> 2) replicated_log detects other replicas using the pids created in the
> previous step and stores them in the set pids [1]. This is done via
> ZookeeperNetwork.
>
> Other than these, replicas communicate with each other through protobuf
> processes. Since there's only one active master at a time, that master will
> be the proposer (coordinator) and the others will be acceptors. So there is
> no need for leader election in replicated_log.
>
> Please correct me if I'm wrong.
>
> Thanks,
> /J
>
> [1] https://github.com/apache/mesos/blob/master/src/log/network.hpp#L316
>
> Shuai Lin <linshuai2...@gmail.com> wrote on 07/11/2016 00:13:12:
>
> > From: Shuai Lin <linshuai2...@gmail.com>
> > To: dev <dev@mesos.apache.org>
> > Cc: Jie Yu <j...@mesosphere.io>, Kapil Arya <ka...@mesosphere.io>
> > Date: 07/11/2016 00:13
> > Subject: Re: [Replicated Log] Enable Mesos to use etcd for replicated_log
> >
> > >
> > > i.e. The MasterContender is the piece that decides the "coordinator" of
> > > the replicated log.
> >
> >
> > IINM master contender/detector is not related to replicated logs. The only
> > thing they have in common (when using zookeeper) is they both get the
> > zookeeper servers list from the `--zk` flag.
> >
> >
> > On Sat, Jul 9, 2016 at 1:54 AM, Joseph Wu <jos...@mesosphere.io> wrote:
> >
> > > Jay,
> > >
> > > (1) Looks like we missed this when we modularized the
> > > MasterDetector/Contender [1].  We need to expand on src/master/main.cpp
> > > a bit.
> > > Can you file a bug?  (cc: Kapil)  I can shepherd if Kapil doesn't have
> > > the cycles.
> > >
> > > (2) The bit of the replicated log which relies on ZK is a small portion
> > > called the ZookeeperNetwork [2].  The job of this component is to watch
> > > the ZK group for membership changes.  Log replication messages are
> > > broadcast to all members in this "network abstraction".
> > > This is also a piece that needs to be modularized.  (Can you file
> > > another bug? :)
> > >
> > > (3) The replicated log is something stored locally on the master (i.e.
> > > LevelDB).  The network abstraction has some similarity with the
> > > MasterDetector, but those pieces are otherwise unrelated.
> > > i.e. The MasterContender is the piece that decides the "coordinator" of
> > > the replicated log.  But the replicated log uses its own implementation
> > > of Paxos after the coordinator is chosen.
> > >
> > > [1] https://issues.apache.org/jira/browse/MESOS-4610
> > > [2] https://github.com/apache/mesos/blob/master/src/log/network.hpp#L107
> > >
> > > On Fri, Jul 8, 2016 at 9:25 AM, Avinash Sridharan <avin...@mesosphere.io>
> > > wrote:
> > >
> > > > +Jie
> > > >
> > > > I think replicated log uses ZK only for leader election. Hence, without
> > > > ZK the quorum is hard-coded to 1.
> > > >
> > > > For (#2), trying to understand what you mean by replicated log being
> > > > pluggable? You mean turning off replicated log on the Master for
> > > > storing Registrar information?
> > > >
> > > > On Fri, Jul 8, 2016 at 2:26 AM, Jay JN Guo <guojian...@cn.ibm.com>
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > We are working on a Mesos module to substitute Zookeeper with Etcd.
> > > > > Contender and detector are done through modularized interfaces;
> > > > > however, replicated_log is still coupled with ZK. Here are my
> > > > > questions:
> > > > >
> > > > > #1 What's the difference between replicated_log with/without ZK?
> > > > > Without flag --zk, Log is constructed with hardcoded quorum of 1.
> > > > > Does it assume master to be running in non-HA mode? Otherwise, we
> > > > > observed that znodes are created in ZK to store log_replica
> > > > > information, does it help Paxos coordination in some way?
> > > > > #2 We hope to make replicated_log pluggable. Some code changes need
> > > > > to happen in Mesos upstream (interface modularization, extra flags,
> > > > > etc). So we wonder if someone could shepherd them? Also, it would be
> > > > > great if we could get some help on better understanding
> > > > > replicated_log internals.
> > > > > #3 Is there a plan to use replicated_log to do master contend/detect
> > > > > instead of ZK? If yes, what's the status?
> > > > >
> > > > > Your help and suggestions are highly appreciated!!
> > > > >
> > > > > Thanks,
> > > > > /Jay
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Avinash Sridharan, Mesosphere
> > > > +1 (323) 702 5245
> > > >
> > >
>


Re: [Replicated Log] Enable Mesos to use etcd for replicated_log

2016-07-08 Thread Joseph Wu
Jay,

(1) Looks like we missed this when we modularized the
MasterDetector/Contender [1].  We need to expand on src/master/main.cpp a
bit.
Can you file a bug?  (cc: Kapil)  I can shepherd if Kapil doesn't have the
cycles.

(2) The bit of the replicated log which relies on ZK is a small portion
called the ZookeeperNetwork [2].  The job of this component is to watch the
ZK group for membership changes.  Log replication messages are broadcast
to all members in this "network abstraction".
This is also a piece that needs to be modularized.  (Can you file another
bug? :)

(3) The replicated log is something stored locally on the master (i.e.
LevelDB).  The network abstraction has some similarity with the
MasterDetector, but those pieces are otherwise unrelated.
i.e. The MasterContender is the piece that decides the "coordinator" of the
replicated log.  But the replicated log uses its own implementation of
Paxos after the coordinator is chosen.

[1] https://issues.apache.org/jira/browse/MESOS-4610
[2] https://github.com/apache/mesos/blob/master/src/log/network.hpp#L107
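
To make point (2) concrete, here is a rough sketch of what such a "network
abstraction" does: something external reports the current group membership,
and replication messages are broadcast to whoever is in the group at that
moment.  This is Python for brevity and is not the interface in network.hpp;
the class and method names below are hypothetical.

```python
class Network:
    """Tracks the current replica addresses and broadcasts to all of them."""

    def __init__(self, send):
        # `send` is a callable(address, message) supplied by the caller.
        self._send = send
        self._members = set()

    def on_membership_change(self, members):
        # Invoked by whatever watches the group: ZooKeeper today, and
        # possibly etcd if this piece were modularized.
        self._members = set(members)

    def broadcast(self, message):
        # Deliver the message to every current member, in a stable order.
        for address in sorted(self._members):
            self._send(address, message)
        return len(self._members)
```

The point of the design is that the log code only ever talks to `broadcast`;
swapping the membership source would not touch the Paxos logic.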

On Fri, Jul 8, 2016 at 9:25 AM, Avinash Sridharan 
wrote:

> +Jie
>
> I think replicated log uses ZK only for leader election. Hence, without ZK
> the quorum is hard-coded to 1.
>
> For (#2), trying to understand what you mean by replicated log being
> pluggable? You mean turning off replicated log on the Master for storing
> Registrar information?
>
> On Fri, Jul 8, 2016 at 2:26 AM, Jay JN Guo  wrote:
>
> >
> >
> > Hi,
> >
> > We are working on a Mesos module to substitute Zookeeper with Etcd.
> > Contender and detector are done through modularized interfaces; however,
> > replicated_log is still coupled with ZK. Here are my questions:
> >
> > #1 What's the difference between replicated_log with/without ZK? Without
> > flag --zk, Log is constructed with hardcoded quorum of 1. Does it assume
> > master to be running in non-HA mode? Otherwise, we observed that znodes
> > are created in ZK to store log_replica information, does it help Paxos
> > coordination in some way?
> > #2 We hope to make replicated_log pluggable. Some code changes need to
> > happen in Mesos upstream (interface modularization, extra flags, etc). So
> > we wonder if someone could shepherd them? Also, it would be great if we
> > could get some help on better understanding replicated_log internals.
> > #3 Is there a plan to use replicated_log to do master contend/detect
> > instead of ZK? If yes, what's the status?
> >
> > Your help and suggestions are highly appreciated!!
> >
> > Thanks,
> > /Jay
> >
>
>
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245
>


Re: [Action Required] Stale Reviews

2016-07-06 Thread Joseph Wu
On a related note, we will also be looking at the (usually neglected)
GitHub PRs.  We've accumulated ~50 of them over time.

After making a quick scan of the list, it turns out we can close a majority
of these PRs by either directly closing the non-issues, or by committing
the small documentation changes they propose.

Here's a doc summarizing what we will be doing:
https://docs.google.com/document/d/1BxUFRCis_4One-_Eoi19xJ9NJejh1Zl4ZCLTUcUUESE/

Note: Direct access to the GitHub mirror is restricted, even to most
committers, which is one reason why stale PRs stick around :(

On Sun, Jul 3, 2016 at 1:14 AM, Alex Rukletsov  wrote:

> Joris, could we punt on this until after 1.0? Right now people focus on
> polishing things for the release and I would like to avoid any
> distractions.
>
> On Wed, Jun 29, 2016 at 8:25 PM, Joris Van Remoortere 
> wrote:
>
> > Your suggestion generally encompasses the spirit of what we will do after
> > we've given the community time to act on their own. The reason we will
> > likely go through them manually is that there will be some patches that
> > don't apply but for which the contributor would still like to resume
> > work.
> > Ideally people going through their outbox will have more context for
> > which things definitely don't make sense to keep open, so the list that
> > I will have to go through manually will be shorter ;-)
> > I think the right thing is to provide people time to take these actions
> > themselves.
> >
> > We will be going through review of the github pull requests (already a
> > much smaller list) in the upcoming week.
> > After that I hope the reviewboard list will be significantly shorter and
> > we will be able to go through reviews of the remaining patches with higher
> > confidence that we'll be able to follow through on them with the
> > contributor.
> >
> > On Wed, Jun 29, 2016 at 8:17 PM, Tomek Janiszewski 
> > wrote:
> >
> > > How about running CI on all reviews? If a patch is stale, it probably
> > > can't be applied; CI will post "bad patch", and if nobody takes any
> > > action on that review, we can close it.
> > >
> > > On Wed, Jun 29, 2016 at 6:26 PM, Joris Van Remoortere <jo...@apache.org>
> > > wrote:
> > > > Hello developers,
> > > >
> > > > Over the last year we've accumulated a significant review backlog.
> > > > Over the past month it has been floating around ~600 reviews.
> > > >
> > > > It would be of great help if you could look through your personal
> > > > list (Dashboard -> Outgoing -> Open) and identify reviews that are
> > > > *no longer relevant* or that you are *not actively working on*.
> > > >
> > > > Suggested actions:
> > > > *No longer relevant:* Please discard them with a message explaining
> > > > why. For example a link to the JIRA that was resolved already.
> > > > *Not actively working on:* Please discard them with a note that you
> > > > are not actively working on this, but to please involve you if someone
> > > > picks it up in the future. A note in the JIRA referencing your
> > > > discarded review would be much appreciated here. This way we can
> > > > easily track previous effort.
> > > >
> > > > Remember, discarded doesn't mean deleted. It doesn't even mean this
> > > > was not accepted. It just means we're not currently working on it.
> > > > This will help guide reviewers and new contributors to the active set
> > > > we are all working on.
> > > >
> > > > Ideally as a community we can do this organically. After some time
> > > > has passed, we will go through and discard ones we think are
> > > > categorized as above with a note on how to re-open them.
> > > >
> > > > Thanks!
> > > > Joris
> > > >
> > >
> >
>


Re: Protobuf syntax version for Mesos

2016-06-13 Thread Joseph Wu
Looks like that is a warning in v3, see [1].  The same code in v2.6.1 is
[2], and does not have that warning.

[1]
https://github.com/google/protobuf/blob/088c5c491e7a1c95c7b8eb55f119a8a999c81dc1/src/google/protobuf/compiler/parser.cc#L547-L550
[2]
https://github.com/google/protobuf/blob/bba83652e1be610bdb7ee1566ad18346d98b843c/src/google/protobuf/compiler/parser.cc#L436-L444

On Mon, Jun 13, 2016 at 2:46 PM, Vinod Kone  wrote:

> Are you using a v3 protoc compiler? Looks like the "syntax" keyword was
> introduced in
> https://github.com/google/protobuf/releases/tag/v3.0.0-alpha-2.
> Not sure if v2 protoc compilers recognize that keyword.
>
> On Mon, Jun 13, 2016 at 5:42 PM, Zhitao Li  wrote:
>
> > Hi,
> >
> > We are experimenting with the HTTP API and tried to compile Mesos's .proto
> > files with our repo (with make commands like "protoc
> > --proto_path=protobuf --go_out=.gen
> > protobuf/mesos/v1/scheduler/scheduler.proto"), but we see warnings like
> >
> > ```
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:547] No syntax
> > specified for the proto file: mesos/v1/scheduler/scheduler.proto.
> > Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a
> > syntax version. (Defaulted to proto2 syntax.)
> > ```
> >
> > Should we have proper `syntax` clauses in various .proto files? If not,
> > what's the reason?
> >
> > Thanks.
> >
> > --
> > Cheers,
> >
> > Zhitao Li
> >
>


Re: Code Quality Improvements for docker-compose-executor

2016-06-13 Thread Joseph Wu
I'm not sure what the community adoption of the docker-compose-executor [1]
is, but from a Mesos perspective, the repo will eventually be superseded by
"pod" support in Mesos itself [2] [3].

Also, you should try to contact the developers of docker-compose-executor
itself, as they might not be subscribed to this mailing list.

[1] https://github.com/mesos/docker-compose-executor
[2] https://issues.apache.org/jira/browse/MESOS-2449
[3] https://issues.apache.org/jira/browse/MESOS-2634

On Sun, Jun 12, 2016 at 8:38 AM,  wrote:

> Hello,
>
> I'd like to send you some pull requests to improve the maintainability of
> docker-compose-executor.
>
> My company - DevFactory - is sponsoring me to identify and fix code
> quality issues and improve unit test coverage in open source projects.
> DevFactory is obsessed with code quality and is providing its commercially
> available code quality improvement service for free to qualified
> open-source projects.
>
> If you are interested, please let me know and we will add it to our
> pipeline. Our first step will be to utilize tools like PMD, FindBugs and
> Sonar to identify the most important issues to fix. Once we fix them, we'll
> follow up with some pull requests.
>
> Thanks,
> M.Ezzat
>
>


Re: [Tech-debt] Introduce regex into Mesos

2016-06-10 Thread Joseph Wu
Same here.

Mesos currently requires GCC 4.8.1+.  Regex support was implemented in GCC
4.9.0, see [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53631

On Fri, Jun 10, 2016 at 11:39 AM, Kevin Klues  wrote:

> By compiler errors, I mean "internal compiler errors"
>
> On Fri, Jun 10, 2016 at 11:38 AM, Kevin Klues  wrote:
> > I've run into compiler errors using simple regex stuff from the
> > standard library on our supported version of gcc.
> >
> > On Thu, Jun 9, 2016 at 7:30 PM, Klaus Ma  wrote:
> >> Hi team,
> >>
> >>
> >> We're discussing introducing regex into Mesos while investigating
> >> MESOS-4627, so I'd like to ask whether anyone has experience with regex
> >> in C++11 and later: for example, supported compilers, compatibility,
> >> performance and so on :).
> >>
> >>
> >> 
> >>
> >> Da (Klaus), Ma (??), PMP®| Advisory Software Engineer
> >> Platform DCOS Development & Support, STG, IBM GCG
> >> +86-10-8245 4084 | mad...@cn.ibm.com | http://k82.me
> >>
> >> 
> >
> >
> >
> > --
> > ~Kevin
>
>
>
> --
> ~Kevin
>


Re: Round robin DNS zookeeper record

2016-05-25 Thread Joseph Wu
Mesos passes the host list between "zk://" and the next "/" directly
into Zookeeper's C bindings.
I'm not familiar enough with the Zookeeper API to say for certain, but it
looks like this *does* support your round-robin scheme.
You can double check here:
https://github.com/apache/zookeeper/blob/release-3.4.8/src/c/src/zookeeper.c#L620-L650


As for your other questions:

1) When the ZK connection is lost, Mesos will re-resolve the address as of
MESOS-4546 (https://issues.apache.org/jira/browse/MESOS-4546).

2) Looks like ZK rotates between servers.  When one server returns an
error, it tries the next one, in a circle.
Increment code:
https://github.com/apache/zookeeper/blob/release-3.4.8/src/c/src/zookeeper.c#L1248
Connection code:
https://github.com/apache/zookeeper/blob/release-3.4.8/src/c/src/zookeeper.c#L1578-L1580
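
The rotation described in (2) can be sketched roughly as follows.  This is an
illustration in Python of the "try the next one, in a circle" idea only, not a
transcription of zookeeper.c; `connect_round_robin` and `try_connect` are
hypothetical names.

```python
def connect_round_robin(servers, try_connect, start=0):
    """Try each server once, beginning at index `start` and wrapping around.

    Returns (index, connection) for the first server that accepts, or raises
    ConnectionError if every server fails.
    """
    for offset in range(len(servers)):
        index = (start + offset) % len(servers)
        try:
            return index, try_connect(servers[index])
        except OSError:
            # This server is unreachable; move on to the next address.
            continue
    raise ConnectionError("no server reachable")
```

A caller would remember the returned index and pass `index + 1` as `start`
after the next connection loss, so a dead address is skipped rather than
retried first.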

On Wed, May 25, 2016 at 3:50 PM, Zhitao Li  wrote:

> Hi,
>
> Can someone confirm whether the zookeeper library Mesos uses supports
> round robin DNS well?
>
> For example, if I have a round robin DNS entry `zookeeper-mesos-dc` which
> resolves to five A records, would the --zk flag value
> `zk://zookeeper-mesos-dc:2181/mesos` work on master, agents and any
> framework using the libmesos driver?
>
> Also:
> 1. What happens if one of the A records changes in the DNS record? Do I
> need to restart related Mesos processes?
> 2. What happens if one of the A records is not responsive (e.g. underlying
> zookeeper server is dead)? Is the zookeeper library capable of avoiding
> the bad server?
>
> Some pointers for me to find out the answers myself are also greatly
> appreciated.
>
> Thanks!
>
>
> --
> Cheers,
>
> Zhitao Li
>


Re: 1.0 Release Candidate

2016-05-25 Thread Joseph Wu
I'm guessing you mean the "medium term" bullet point on the Roadmap (
https://cwiki.apache.org/confluence/display/MESOS/Roadmap):

>
>    - Deprecate Docker containerizer (in favor of Unified containerizer w/
>    Docker support)
>

This was never meant to be done as part of the 1.0 release.  I'm sure the
folks working on the unified containerizer can tell you their exact plans.


On Wed, May 25, 2016 at 12:10 PM, Jeff Schroeder  wrote:

> Does this mean the work to deprecate the docker containerizer will be
> post-1.0, or have those plans changed?
>
>
> On Wednesday, May 25, 2016, Vinod Kone  wrote:
>
>> Hi folks,
>>
>> As discussed in the previous community sync, we plan to cut a release
>> candidate for our next release (1.0) early next week.
>>
>> 1.0 is mainly centered around new APIs for Mesos. Please take a look at
>> MESOS-338 for blocking issues. We got some great design and testing
>> feedback for the v1 scheduler and executor APIs. Please do the same for
>> the in-progress v1 operator API.
>>
>> Since this is a 1.0, we would like to do the release a little
>> differently.
>>
>> First, the voting period for vetting the release candidate would be a few
>> weeks (2-3 weeks) instead of the typical 3 days.
>>
>> Second, we are willing to make major changes (scalability fixes, API
>> fixes) if there are any issues reported by the community.
>>
>> We are doing these because we really want the community to thoroughly
>> test the 1.0 release and give feedback.
>>
>> Thanks,
>>
>
>
> --
> Text by Jeff, typos by iPhone
>


Re: Reply: mesos-logrotate-logger binary package problem?

2016-05-17 Thread Joseph Wu
The particular implementation of the container logger packaged with Mesos
does not need either option (though it shouldn't break if one of them is
set).

"create" is not necessary because the container logger will create the log
file when it's missing.
https://github.com/apache/mesos/blob/2e409a5257cf4d53040959cae0edfbd7cf1a1af9/src/slave/container_loggers/logrotate.cpp#L145-L151

"copytruncate" is not necessary because the container logger calls
logrotate after it closes the FD.  (We do this because renaming a file is
faster than copying it.)
https://github.com/apache/mesos/blob/2e409a5257cf4d53040959cae0edfbd7cf1a1af9/src/slave/container_loggers/logrotate.cpp#L178-L184
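
As an illustration of why neither option is needed, here is a small sketch of
rename-based rotation.  It is Python, not the logger's C++ code, and it does
the renaming itself instead of invoking logrotate; `rotate_by_rename`,
`append_line` and `demo` are hypothetical names.  The writer closes its
handle, the file is renamed (cheap, no copy), and the next write recreates the
file.

```python
import os
import tempfile


def rotate_by_rename(path, keep=2):
    """Shift path.1 -> path.2 and so on, then rename path -> path.1.

    The caller must have closed its handle to `path` before calling this.
    """
    oldest = "%s.%d" % (path, keep)
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(keep - 1, 0, -1):
        src = "%s.%d" % (path, n)
        if os.path.exists(src):
            os.rename(src, "%s.%d" % (path, n + 1))
    if os.path.exists(path):
        os.rename(path, path + ".1")


def append_line(path, line):
    # Append mode creates the file when it is missing, which is why a
    # "create" directive is unnecessary in this scheme.
    with open(path, "a") as f:
        f.write(line + "\n")


def demo():
    """Write, rotate, write again; return the resulting file names."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "stdout")
        append_line(path, "first")
        rotate_by_rename(path)
        append_line(path, "second")
        return sorted(os.listdir(d))
```

Because the handle is closed before the rename, no bytes are written to the
old file after rotation, which is the data-loss window that "copytruncate"
has to tolerate.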


On Mon, May 16, 2016 at 11:00 PM, pangbingqiang 
wrote:

> Thank you for your reply. In the "parameters" option:
> {
> "key": "logrotate_stdout_options",
> "value": "rotate 7\nmissingok\ncompress\ndelaycompress"
> },
> The system logrotate call must set either the "create" or the
> "copytruncate" option; which mode does Mesos set?
> If "copytruncate" mode is chosen, could some log data be lost?
>
>
> -Original Message-
> From: Shuai Lin [mailto:linshuai2...@gmail.com]
> Sent: May 5, 2016 19:23
> To: dev
> Subject: Re: mesos-logrotate-logger binary package problem?
>
> That program is not meant to be executed manually. Instead you should
> configure the logger in the modules flags of your mesos slave.
>
> Here is a snippet of a script I used when testing the logrotate logger;
> you can adjust it for your own use case:
>
> MODULES_JSON_FILE=/tmp/modules.json
> cat >$MODULES_JSON_FILE <<EOF
> {
>   "libraries": [
>     {
>       "file": "${MESOS_BUILD_DIR}/src/.libs/liblogrotate_container_logger.so",
>       "modules": [
>         {
>           "name": "org_apache_mesos_LogrotateContainerLogger",
>           "parameters": [
>             {
>               "key": "launcher_dir",
>               "value": "${MESOS_BUILD_DIR}/src/"
>             },
>             {
>               "key": "max_stdout_size",
>               "value": "4096B"
>             },
>             {
>               "key": "max_stderr_size",
>               "value": "4096B"
>             },
>             {
>               "key": "logrotate_stdout_options",
>               "value": "rotate 7\nmissingok\ncompress\ndelaycompress"
>             },
>             {
>               "key": "logrotate_stderr_options",
>               "value": "rotate 7\nmissingok\ncompress\ndelaycompress"
>             }
>           ]
>         }
>       ]
>     }
>   ]
> }
> EOF
>
>
> ${MESOS_SLAVE} \
>   --hostname=localhost \
>   --ip=127.0.0.1 \
>   --master=127.0.0.1:5050 \
>   --resources="cpus:2;mem:10240" \
>   --log_dir="${WORK_DIR}/slave/logs" \
>   --work_dir="${WORK_DIR}/slave" \
>   --launcher_dir="${MESOS_BUILD_DIR}/src/" \
>   --modules=${MODULES_JSON_FILE} \
>   --container_logger=org_apache_mesos_LogrotateContainerLogger \
>   --containerizers=docker,mesos
>
> Regards,
> Shuai
>
>
>
>
>
> On Thu, May 5, 2016 at 4:10 PM, pangbingqiang 
> wrote:
>
> > Hi all:
> >
> >   When I run the “mesos-logrotate-logger” binary on its own, it
> > always fails with the following error in the log:
> >
> > “Failed to put child in a new session: Operation not permitted”.
> >
> > I can't find where it is called in Mesos, so how do I use it for log
> > rotation?
> >
> >
> >
> >
> >
> > Bingqiang Pang 00278970
> >
> >
> >
> > Distributed and Parallel Software Lab
> >
> > Huawei Technologies Co., Ltd.
> >
> > Email:pangbingqi...@huawei.com 
> >
> >
> >
> >
> >
>


Re: [RESULT][VOTE] Release Apache Mesos 0.27.2 (rc1)

2016-03-19 Thread Joseph Wu
Cong Wang,

The tags are sync'd.  See: https://github.com/apache/mesos/releases

You might not have done: git pull --tags

On Wed, Mar 16, 2016 at 11:49 AM, Cong Wang  wrote:

> On Mon, Mar 7, 2016 at 8:29 PM, Michael Park  wrote:
> > Please find the release at:
> > https://dist.apache.org/repos/dist/release/mesos/0.27.2
> >
> > It is recommended to use a mirror to download the release:
> > http://www.apache.org/dyn/closer.cgi
> >
> > The CHANGELOG for the release is available at:
> >
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.27.2
> >
> > The mesos-0.27.2.jar has been released to:
> > https://repository.apache.org
> >
>
> So why are the git tags not synced to the github mirror?
>
> $ git tag -l | grep '0\.27\.2'
>


Re: Recent changes to MesosTest helpers

2016-03-19 Thread Joseph Wu
We tried to reduce segfaults of this particular pattern (de-referencing
out-of-scope stack variables), as much as possible.  This means the test
suite shouldn't crash due to flaky tests anymore.  And the test suite
should run to completion each time.

(I also replaced a bunch of CHECK_* statements in the tests with ASSERT_*.)

On Wed, Mar 16, 2016 at 8:27 AM, haosdent <haosd...@gmail.com> wrote:

> Does it exit like a segfault when a CHECK_xxx fails? Or does it exit only
> after finishing all test cases?
> On Mar 16, 2016 11:03 PM, "Joseph Wu" <jos...@mesosphere.io> wrote:
>
> > Hello Devs & Contributors,
> >
> > We recently committed a refactor of the MesosTest suite and underlying
> > "Cluster" abstraction.  This affects almost every existing test and
> > future test, so here's a summary of what has changed and what you should
> > be aware of:
> >
> >    - The purpose of the refactor is to make the entire test suite more
> >    resilient to flaky tests.  Before, every test that used the
> >    "MesosTest::StartMaster" and "MesosTest::StartSlave" helpers also
> >    needed to have "Shutdown()" at the end of the test.  If the test
> >    failed an assertion or expectation, it would exit before "Shutdown()"
> >    and would very likely segfault, or hit a "__cxa_pure_virtual__" and
> >    exit with a cryptic stack trace.
> >    - The signatures of "MesosTest::StartMaster" and
> >    "MesosTest::StartSlave" have changed.  Both test helpers now return a
> >    "Try<Owned<...>>" instead of a "Try<PID>".
> >    The way to access the "PID" was changed from ".get()" to ".get()->pid".
> >    - "Shutdown()" has been removed from MesosTest.  It is no longer
> >    necessary.
> >    - The MasterDetector has been exposed at the top-level for all slaves.
> >    This slave dependency was originally populated by the "Cluster"
> >    abstraction (which held both Masters and Slaves).  In most cases, it
> >    will be sufficient to create the detector like:
> >
> >    Owned<MasterDetector> detector = master->createDetector();
> >    - If you need to restart the master in the middle of a test, just
> >    reset the underlying "Owned" pointer.  i.e:
> >
> >    master->reset();
> >    master = StartMaster();
> >
> >    Note: We can't assign master before resetting the pointer.  This is a
> >    limitation related to supporting multiple masters in tests, which is
> >    currently not possible.
> >    - If you need to restart the slave in the middle of a test, there are
> >    several ways:
> >       - To clean up any containers associated with that slave:
> >       slave = StartSlave(...);
> >
> >       Or:
> >       slave.reset();
> >       slave = StartSlave(...);
> >       - To stop a slave without container cleanup (equivalent to the
> >       original "MesosTest::Stop()"), use:
> >       slave->terminate();
> >
> >       Or:
> >       slave->shutdown();
> >
> >       These two methods emulate turning off the slave, but have slightly
> >       different semantics.  "Terminate" generally emulates a crash.
> >       "Shutdown" emulates a graceful exit.
> >
> > If you have any further questions, feel free to ask.  There are still
> > quite a few improvements to make, but those will likely be less
> > disruptive.
> >
> > ~Joseph
> >
>


Recent changes to MesosTest helpers

2016-03-18 Thread Joseph Wu
Hello Devs & Contributors,

We recently committed a refactor of the MesosTest suite and underlying
"Cluster" abstraction.  This affects almost every existing test and future
test, so here's a summary of what has changed and what you should be aware
of:

   - The purpose of the refactor is to make the entire test suite more
   resilient to flaky tests.  Before, every test that used the
   "MesosTest::StartMaster" and "MesosTest::StartSlave" helpers also needed
   to have "Shutdown()" at the end of the test.  If the test failed an
   assertion or expectation, it would exit before "Shutdown()" and would
   very likely segfault, or hit a "__cxa_pure_virtual__" and exit with a
   cryptic stack trace.
   - The signatures of "MesosTest::StartMaster" and "MesosTest::StartSlave"
   have changed.  Both test helpers now return a "Try<Owned<...>>" instead
   of a "Try<PID>".
   The way to access the "PID" was changed from ".get()" to ".get()->pid".
   - "Shutdown()" has been removed from MesosTest.  It is no longer
   necessary.
   - The MasterDetector has been exposed at the top-level for all slaves.
   This slave dependency was originally populated by the "Cluster" abstraction
   (which held both Masters and Slaves).  In most cases, it will be sufficient
   to create the detector like:

   Owned<MasterDetector> detector = master->createDetector();
   - If you need to restart the master in the middle of a test, just reset
   the underlying "Owned" pointer.  i.e:

   master->reset();
   master = StartMaster();

   Note: We can't assign master before resetting the pointer.  This is a
   limitation related to supporting multiple masters in tests, which is
   currently not possible.
   - If you need to restart the slave in the middle of a test, there are
   several ways:
      - To clean up any containers associated with that slave:
      slave = StartSlave(...);

      Or:
      slave.reset();
      slave = StartSlave(...);
      - To stop a slave without container cleanup (equivalent to the
      original "MesosTest::Stop()"), use:
      slave->terminate();

      Or:
      slave->shutdown();

      These two methods emulate turning off the slave, but have slightly
      different semantics.  "Terminate" generally emulates a crash.  "Shutdown"
      emulates a graceful exit.

If you have any further questions, feel free to ask.  There are still quite
a few improvements to make, but those will likely be less disruptive.

~Joseph


Re: [VOTE] Release Apache Mesos 0.28.0 (rc1)

2016-03-08 Thread Joseph Wu
If we're re-cutting the release, can we also add this fix for maintenance?
(still under review)
https://reviews.apache.org/r/44258/

On Tue, Mar 8, 2016 at 2:43 PM, Kevin Klues  wrote:

> Here is the list of reviews/patches that have been called out in this
> thread for inclusion in 0.28.0-rc2.  Some of them are still under
> review and will need to land by Thursday to be included.
>
> Are there others?
>
> Jie's container image documentation (submitted):
> commit 7de8cdd4d8ed1d222fa03ea0d8fa6740c4a9f84b
> https://reviews.apache.org/r/44414
>
> Restore Mesos' ability to extract Docker assigned IPs (still under review):
> https://reviews.apache.org/r/43093/
>
> Fixed the logic for default docker cmd case (submitted).
> commit e42f740ccb655c0478a3002c0b6fa90c1144f41c
> https://reviews.apache.org/r/44468/
>
> Implemented runtime isolator default cmd test (still under review).
> https://reviews.apache.org/r/44469/
>
> Fixed a bug that causes the task stuck in staging state (still under
> review).
> https://reviews.apache.org/r/44435/
>
> On Tue, Mar 8, 2016 at 10:30 AM, Kevin Klues  wrote:
> > Yes, will do.
> >
> > On Tue, Mar 8, 2016 at 10:26 AM, Vinod Kone wrote:
> >> +kevin klues
> >>
> >> OK. I'm cancelling this vote since there are some show stopper issues
> >> that we need to cherry-pick. I'll cut another RC on Thursday.
> >>
> >> @shepherds: can you please make sure the blocker tickets are marked with
> >> fix version and that they land today or tomorrow?
> >>
> >> @kevin: since you have volunteered to help with the release, can you
> >> make sure we have a list of commits to cherry pick for rc2?
> >>
> >> Thanks,
> >>
> >>
> >> On Tue, Mar 8, 2016 at 12:05 AM, Shuai Lin wrote:
> >>
> >>> Maybe also https://issues.apache.org/jira/browse/MESOS-4877 and
> >>> https://issues.apache.org/jira/browse/MESOS-4878 ?
> >>>
> >>>
> >>> On Tue, Mar 8, 2016 at 9:13 AM, Jie Yu wrote:
> >>>
> >>>> I'd like to fix https://issues.apache.org/jira/browse/MESOS-4888 as
> >>>> well if you guys plan to cut another RC
> >>>>
> >>>> On Mon, Mar 7, 2016 at 10:16 AM, Daniel Osborne <
> >>>> daniel.osbo...@metaswitch.com> wrote:
> >>>>
> > -1
> >
> > If it doesn't cause too much pain, I'm hoping we can squeeze in a
> > relatively small patch which restores Mesos' ability to extract Docker
> > assigned IPs. This has been broken since Docker 1.10's release over a
> > month ago, and prevents service discovery and DNS from working.
> >
> > Mesos-4370: https://issues.apache.org/jira/browse/MESOS-4370
> > RB# 43093: https://reviews.apache.org/r/43093/
> >
> > I've built 0.28.0-rc1 with this patch and can confirm that it fixes it
> > as expected.
> >
> > Apologies for not bringing this to attention earlier.
> >
> > Thanks all,
> > Dan
> >
> > -Original Message-
> > From: Vinod Kone [mailto:vinodk...@apache.org]
> > Sent: Thursday, March 3, 2016 5:44 PM
> > To: dev ; user 
> > Subject: [VOTE] Release Apache Mesos 0.28.0 (rc1)
> >
> > Hi all,
> >
> > Please vote on releasing the following candidate as Apache Mesos 0.28.0.
> >
> > 0.28.0 includes the following:
> >
> >   * [MESOS-4343] - A new cgroups isolator for enabling the net_cls
> >     subsystem in Linux. The cgroups/net_cls isolator allows operators to
> >     provide network performance isolation and network segmentation for
> >     containers within a Mesos cluster. To enable the cgroups/net_cls
> >     isolator, append `cgroups/net_cls` to the `--isolation` flag when
> >     starting the slave. Please refer to docs/mesos-containerizer.md for
> >     more details.
> >
> >   * [MESOS-4687] - The implementation of scalar resource values (e.g.,
> >     "2.5 CPUs") has changed. Mesos now reliably supports resources with up
> >     to three decimal digits of precision (e.g., "2.501 CPUs"); resources
> >     with more than three decimal digits of precision will be rounded.
> >     Internally, resource math is now done using a fixed-point format that
> >     supports three decimal digits of precision, and then converted to/from
> >     floating point for input and output, respectively. Frameworks that do
> >     their own resource math and manipulate fractional resources may observe
> >     differences in roundoff error and numerical precision.
> 
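The fixed-point scheme described in the MESOS-4687 note can be sketched as
follows. This is a minimal illustration in Python, not the Mesos
implementation: the names `SCALE`, `to_fixed`, `to_float` and `add` are
hypothetical, and the rounding mode (Python's built-in `round`) is an
assumption, since the note does not say how values are rounded.

```python
SCALE = 1000  # three decimal digits of precision (assumed representation)


def to_fixed(value: float) -> int:
    """Convert a floating-point scalar to an integer count of thousandths."""
    return round(value * SCALE)


def to_float(fixed: int) -> float:
    """Convert an integer count of thousandths back to floating point."""
    return fixed / SCALE


def add(a: float, b: float) -> float:
    """Add two scalars in fixed point, converting only at the edges."""
    return to_float(to_fixed(a) + to_fixed(b))


print(0.1 + 0.2)                   # 0.30000000000000004 (plain floating point)
print(add(0.1, 0.2))               # 0.3
print(to_float(to_fixed(2.5014)))  # 2.501 (fourth decimal digit is dropped)
```

A framework that keeps doing its own math in plain floating point would
compute the first value, which is the kind of roundoff difference the note
warns about.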

[Proposal] Unified logging for containerizers & the external containerizer

2015-12-11 Thread Joseph Wu
Hello All,

As part of the work on managing the logs for executors and tasks, we're
introducing a "ContainerLogger" module.  This module will allow the
stdout/stderr of executors and tasks to be managed or redirected.
(Existing executor/task logs are written to plain files.)  For example:

   - The module would make it trivial to truncate logs to a maximum size,
   or to rotate them.
   - A module could redirect logs into an aggregation service, like syslog
   or journald; or to external logging, like LogStash or Splunk.

See the epic for more details:
https://issues.apache.org/jira/browse/MESOS-4086

For the MVP, we will support the Mesos and Docker containerizers.  For the
external containerizer, we plan to exit if an agent is started with both
the external containerizer and the new ContainerLogger module, e.g.:

mesos-slave.sh --containerizers="mesos,external" \
  --container_logger="some_custom_logger"
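
For illustration only, the proposed startup check could look roughly like
the sketch below.  It is written in Python for brevity (the agent itself is
C++), and the function name `check_agent_flags` and the error message are
hypothetical; only the two flag names come from the command above.

```python
from typing import Optional


def check_agent_flags(containerizers: str,
                      container_logger: Optional[str]) -> None:
    """Reject the external containerizer combined with a ContainerLogger.

    `containerizers` is the comma-separated value of --containerizers;
    `container_logger` is the value of --container_logger, if any.
    """
    enabled = {name.strip() for name in containerizers.split(",")}
    if "external" in enabled and container_logger:
        # In the agent this would be a fatal startup error.
        raise ValueError(
            "The external containerizer does not support the "
            "ContainerLogger module '%s'" % container_logger)


# Accepted: no external containerizer.
check_agent_flags("mesos,docker", "some_custom_logger")

# Accepted: external containerizer without a ContainerLogger module.
check_agent_flags("mesos,external", None)
```

Passing "mesos,external" together with "some_custom_logger" raises the
error, which corresponds to the agent exiting at startup.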

Is there anyone using the external containerizer who would object to
this behavior?

Thanks,
~Joseph


[Breaking bug fix] Binary in state endpoints

2015-10-23 Thread Joseph Wu
Hello,

The state endpoints, on master and agent, currently serialize two binary
data fields in the ExecutorInfo and TaskInfo objects.  These fields are set
by frameworks; and Mesos does not inspect their values.

The data fields can be found in the state JSON blobs:
/master/state -> frameworks[*].executors[*].data
/slave/state ->
frameworks[*].(executors|completed_executors)[*].(tasks|queued_tasks|completed_tasks)[*].data

*Problem:*
The state endpoints are JSON-ified in a non-standard way (i.e. not via our
normal Protobuf-to-JSON methods).  When we serialize the binary "data"
fields, the binary is dumped as a string, as is.  The resulting JSON may
not be valid if the binary data includes arbitrary bytes (i.e. bytes that
are not valid Unicode text).  Most JSON parsers will fail on the state
endpoints in this case.
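
The failure mode can be reproduced with a small sketch.  This is Python
rather than the actual endpoint code, and the helper names (`dump_raw`,
`dump_without_data`, `is_valid_json`) and the sample bytes are made up for
illustration.

```python
import json


def dump_raw(name: str, data: bytes) -> bytes:
    """Build a JSON object by pasting the binary field in as a string, as is."""
    return (b'{"name": "' + name.encode("utf-8") +
            b'", "data": "' + data + b'"}')


def dump_without_data(name: str) -> bytes:
    """Build the same object with the "data" field left out."""
    return json.dumps({"name": name}).encode("utf-8")


def is_valid_json(blob: bytes) -> bool:
    """Return True if a standard JSON parser accepts the blob."""
    try:
        json.loads(blob)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return True


opaque = b"\xff\xfe\x80"  # arbitrary bytes that are not valid UTF-8

print(is_valid_json(dump_raw("executor-1", opaque)))   # False
print(is_valid_json(dump_without_data("executor-1")))  # True
```

Note that the raw dump only breaks for some payloads: a "data" field that
happens to hold plain ASCII text still parses, which is why the problem is
easy to miss.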

*Proposed solution (and breaking change):*
Simple -- remove the "data" fields from the state endpoints.  (And only
from the state endpoints.  The ExecutorInfo and TaskInfo objects will not
change.)

*Question:*
We believe that frameworks/tools do not rely on retrieving the "data"
fields from the state endpoints.

Is there any framework/tool that retrieves the "data" field from the state
endpoints?
And if so, is it critical to how the framework/tool works?

More details here: https://issues.apache.org/jira/browse/MESOS-3771

Thanks,
~Joseph


Re: RFC: license headers interfere with doxygen documentation (MESOS-3581)

2015-10-20 Thread Joseph Wu
+/- 0.  (a) wouldn't hurt, but it isn't the best solution.


I'd vote for adding actual comment blocks to each class.  Doxygen takes the
comment block immediately preceding the class and uses that as the
description.  This means a file like this would show up correctly on
Doxygen:

/**
 * License ...
 */

#include <...>

/**
 * Bar!  <- This is what would show up on Doxygen.
 * A lot of our existing classes don't have a comment block
 * so Doxygen takes the License instead :(
 */
class Foo {
  ...
};

~Joseph

On Tue, Oct 20, 2015 at 2:32 PM, Marco Massenzio 
wrote:

> +1
> (and thanks for flagging this!)
>
> --
> *Marco Massenzio*
> Distributed Systems Engineer
> http://codetrips.com
>
> On Tue, Oct 20, 2015 at 12:14 PM, Joris Van Remoortere <
> jo...@mesosphere.io>
> wrote:
>
> > +1 for (a).
> >
> >
> > —
> > *Joris Van Remoortere*
> > Mesosphere
> >
> > On Tue, Oct 20, 2015 at 3:02 PM, Benjamin Mahler <
> > benjamin.mah...@gmail.com>
> > wrote:
> >
> > > +1 for (a), in this case the wide sweep only touches the license
> > > comments, so it won't be disruptive to history.
> > >
> > > On Tue, Oct 20, 2015 at 11:59 AM, James Peach wrote:
> > >
> > > >
> > > > > On Oct 20, 2015, at 8:55 AM, Bernd Mathiske wrote:
> > > > >
> > > > > All, is changing every source code file prohibitive or not?
> > > > >
> > > > >> On Oct 20, 2015, at 10:01 AM, Benjamin Bannier
> > > > >> <benjamin.bann...@mesosphere.io> wrote:
> > > > >>
> > > > >> Hi,
> > > > >>
> > > > >> I would like to ask for input on how we plan to fix (both
> > > > >> short- and long-term) the interference of the license headers
> > > > >> and Doxygen documentation
> > > > >> (https://issues.apache.org/jira/browse/MESOS-3581).
> > > > >>
> > > > >> Currently, and in line with the respective guidelines, license
> > > > >> blocks are wrapped in Javadoc-style comments, which are also
> > > > >> used for Doxygen documentation. This leads to Doxygen
> > > > >> interpreting license headers as documentation for whatever
> > > > >> entity follows them in the code, and heavily clutters the
> > > > >> generated documentation (see e.g.
> > > > >> http://mesos.apache.org/api/latest/c++/annotated.html). Given
> > > > >> that considerable effort is being made to improve the
> > > > >> documentation, this is unfortunate.
> > > > >>
> > > > >> * * *
> > > > >>
> > > > >> For a TL;DR of the Jira issue, there are two ways to fix this:
> > > > >>
> > > > >> (a) change *all* license headers to be wrapped in e.g.
> > > > >>     `/* .. */`, also update the coding guidelines, or
> > > > >> (b) perform some preprocessor-like magic in the Doxygen layer.
> > > > >>
> > > > >> Option (a) is very noisy but obvious and stable; option (b) OTOH
> > > > >> employs a simple but stupid text replacement under the covers,
> > > > >> codified in the Doxygen config; it might produce some artifacts
> > > > >> and be surprising, since the code Doxygen sees will be different
> > > > >> from what is in the source.
> > > > >>
> > > > >> I personally believe option (a) is superior for purely technical
> > > > >> reasons
> > > >
> > > > +1 for (a); there's no value in showing license headers to doxygen or
> > > > tooling workarounds
> > > >
> > > > >> with option (b) a possible temporary workaround.
> > > > >>
> > > > >>
> > > > >> To make sure that the generated documentation shows actual
> > > > >> documentation content in overviews like
> > > > >> http://mesos.apache.org/api/latest/c++/annotated.html and
> > > > >> elsewhere, we should fix this. Please comment in the Jira issue
> > > > >> (https://issues.apache.org/jira/browse/MESOS-3581) with your
> > > > >> input on how you think this should be fixed (short- and
> > > > >> long-term).
> > > > >>
> > > > >>
> > > > >> Cheers,
> > > > >>
> > > > >> Benjamin
> > > > >
> > > >
> > > >
> > >
> >
>


Re: Patch for the website's Rakefile

2015-10-02 Thread Joseph Wu
Dave,

Would it be possible for you to take a look at the patches in MESOS-3183
<https://issues.apache.org/jira/browse/MESOS-3183>?

Ideally, we should fix the documentation before 0.25 goes out.

Thanks,
~Joseph

On Mon, Sep 28, 2015 at 1:59 PM, Joseph Wu <jos...@mesosphere.io> wrote:

> + Dev
>
>
> On Mon, Sep 28, 2015 at 1:56 PM, Dave Lester <dles...@twitter.com> wrote:
>
>> Can this be discussed on the mailing list? Thanks
>>
>>
>> On Monday, September 28, 2015, Joseph Wu <jos...@mesosphere.io> wrote:
>>
>>> + Niq, Joris, MPark (so that this doesn't get neglected when the website
>>> is updated for the 0.25 release).
>>>
>>> Both patches (for the website and for the docs) will be tracked here:
>>> https://issues.apache.org/jira/browse/MESOS-3183
>>>
>>> Feedback/reviews would be appreciated,
>>> ~Joseph
>>>
>>> On Tue, Sep 22, 2015 at 11:48 AM, Adam Bordelon <a...@mesosphere.io>
>>> wrote:
>>>
>>>> Unfortunately, we don't have RB synced up to our svn repo. Submitting
>>>> raw patches has been best practice so far (AFAIK)
>>>>
>>>> On Tue, Sep 22, 2015 at 11:35 AM, Joseph Wu <jos...@mesosphere.io>
>>>> wrote:
>>>>
>>>>> Looks like this has been an issue since the end of July (!).
>>>>> https://issues.apache.org/jira/browse/MESOS-3183
>>>>>
>>>>> Is there a reviewboard equivalent for modifications to the website
>>>>> like this patch?
>>>>> ~Joseph
>>>>>
>>>>> On Mon, Sep 21, 2015 at 2:56 PM, Vinod Kone <vi...@twitter.com> wrote:
>>>>>
>>>>>> +dave lester who previously looked into the image loading issue.
>>>>>>
>>>>>>
>>>>>> @vinodkone
>>>>>>
>>>>>> On Mon, Sep 21, 2015 at 2:50 PM, Joseph Wu <jos...@mesosphere.io>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Adam/Vinod,
>>>>>>>
>>>>>>> Documentation images aren't being published to the website.  A few
>>>>>>> places with image(s):
>>>>>>> * http://mesos.apache.org/documentation/latest/external-containerizer/
>>>>>>> * Or http://mesos.apache.org/documentation/latest/oversubscription/
>>>>>>> * Or http://mesos.apache.org/documentation/latest/maintenance/ <- *;(*
>>>>>>>
>>>>>>> I've attached a patch which sort-of fixes this.  It only sort-of
>>>>>>> works because the images are copied to the website, *BUT* for them
>>>>>>> to show up, you need to remove the trailing slash "/" from the URL.
>>>>>>>
>>>>>>> I'll submit a separate patch on RB for changing all image URLs from
>>>>>>> "images/*" to "/documentation/latest/images/*", so that they show up
>>>>>>> regardless of the slash:
>>>>>>> https://reviews.apache.org/r/38570/
>>>>>>>
>>>>>>> ~Joseph
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>> --
>> @davelester
>> Open Source Advocate | Twitter, Inc
>> #MesosCon Europe Co-Chair
>>
>>
>
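
For reference, the trailing-slash behavior reported in the original message
quoted above follows from standard relative-URL resolution, and can be
checked with a short Python sketch.  The file name `example.png` is made
up; the page URL and the two image prefixes are the ones from the thread.

```python
from urllib.parse import urljoin

PAGE = "http://mesos.apache.org/documentation/latest/maintenance"


def resolve(page_url: str, image_ref: str) -> str:
    """Resolve an image reference the way a browser would for this page."""
    return urljoin(page_url, image_ref)


# Relative reference: the result depends on the trailing slash.
print(resolve(PAGE + "/", "images/example.png"))
# http://mesos.apache.org/documentation/latest/maintenance/images/example.png
print(resolve(PAGE, "images/example.png"))
# http://mesos.apache.org/documentation/latest/images/example.png

# Absolute path: the same result with or without the trailing slash.
print(resolve(PAGE + "/", "/documentation/latest/images/example.png"))
# http://mesos.apache.org/documentation/latest/images/example.png
```

With a relative "images/..." reference, only the URL without the trailing
slash resolves to the directory the images were copied to, which matches
the "sort-of works" observation; the absolute form resolves the same way
for both.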


Re: Patch for the website's Rakefile

2015-09-28 Thread Joseph Wu
+ Dev

On Mon, Sep 28, 2015 at 1:56 PM, Dave Lester <dles...@twitter.com> wrote:

> Can this be discussed on the mailing list? Thanks
>
>
> On Monday, September 28, 2015, Joseph Wu <jos...@mesosphere.io> wrote:
>
>> + Niq, Joris, MPark (so that this doesn't get neglected when the website
>> is updated for the 0.25 release).
>>
>> Both patches (for the website and for the docs) will be tracked here:
>> https://issues.apache.org/jira/browse/MESOS-3183
>>
>> Feedback/reviews would be appreciated,
>> ~Joseph
>>
>> On Tue, Sep 22, 2015 at 11:48 AM, Adam Bordelon <a...@mesosphere.io>
>> wrote:
>>
>>> Unfortunately, we don't have RB synced up to our svn repo. Submitting
>>> raw patches has been best practice so far (AFAIK)
>>>
>>> On Tue, Sep 22, 2015 at 11:35 AM, Joseph Wu <jos...@mesosphere.io>
>>> wrote:
>>>
>>>> Looks like this has been an issue since the end of July (!).
>>>> https://issues.apache.org/jira/browse/MESOS-3183
>>>>
>>>> Is there a reviewboard equivalent for modifications to the website like
>>>> this patch?
>>>> ~Joseph
>>>>
>>>> On Mon, Sep 21, 2015 at 2:56 PM, Vinod Kone <vi...@twitter.com> wrote:
>>>>
>>>>> +dave lester who previously looked into the image loading issue.
>>>>>
>>>>>
>>>>> @vinodkone
>>>>>
>>>>> On Mon, Sep 21, 2015 at 2:50 PM, Joseph Wu <jos...@mesosphere.io>
>>>>> wrote:
>>>>>
>>>>>> Hi Adam/Vinod,
>>>>>>
>>>>>> Documentation images aren't being published to the website.  A few
>>>>>> places with image(s):
>>>>>> * http://mesos.apache.org/documentation/latest/external-containerizer/
>>>>>> * Or http://mesos.apache.org/documentation/latest/oversubscription/
>>>>>> * Or http://mesos.apache.org/documentation/latest/maintenance/ <- *;(*
>>>>>>
>>>>>> I've attached a patch which sort-of fixes this.  It only sort-of
>>>>>> works because the images are copied to the website, *BUT* for them
>>>>>> to show up, you need to remove the trailing slash "/" from the URL.
>>>>>>
>>>>>> I'll submit a separate patch on RB for changing all image URLs from
>>>>>> "images/*" to "/documentation/latest/images/*", so that they show up
>>>>>> regardless of the slash:
>>>>>> https://reviews.apache.org/r/38570/
>>>>>>
>>>>>> ~Joseph
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
> --
> @davelester
> Open Source Advocate | Twitter, Inc
> #MesosCon Europe Co-Chair
>
>