Re: Review Request 15653: Adds loadavg() convenience method to stout

2013-12-09 Thread Niklas Nielsen


 On Dec. 9, 2013, 9:42 p.m., Ben Mahler wrote:
  3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp, line 849
  https://reviews.apache.org/r/15653/diff/4/?file=395223#file395223line849
 
  Should this be ErrnoError?

getloadavg is most likely just a wrapper for sysctl (which sets errno), so yes, 
this should probably be ErrnoError.
I will get this into the new patch.
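
For reference, a minimal sketch of what such a wrapper could look like. It calls getloadavg(3) and, on failure, appends the errno description to the message, which is roughly what ErrnoError does. The Load and LoadResult types are hypothetical stand-ins (the patch itself presumably returns a stout Try), so treat this as an illustration, not the code under review.

```cpp
#include <stdlib.h>  // getloadavg() on glibc and the BSDs / OS X.

#include <cerrno>
#include <cstring>
#include <string>

// Load averages in uptime(1) order: 1, 5 and 15 minutes.
struct Load
{
  double one;
  double five;
  double fifteen;
};

// Hypothetical stand-in for a Try<Load>: `load` is meaningful only when
// `ok` is true, otherwise `error` describes the failure.
struct LoadResult
{
  bool ok;
  Load load;
  std::string error;
};

inline LoadResult loadavg()
{
  double samples[3] = {0.0, 0.0, 0.0};

  // getloadavg() returns the number of samples retrieved, or -1 on failure.
  if (getloadavg(samples, 3) == -1) {
    const int code = errno;  // Capture errno before anything can clobber it.
    return LoadResult{
        false,
        Load{0.0, 0.0, 0.0},
        std::string("Failed to determine system load averages: ") +
          std::strerror(code)};
  }

  return LoadResult{true, Load{samples[0], samples[1], samples[2]}, ""};
}
```

A caller would check `ok` first and only then read `load.one`, `load.five` and `load.fifteen`.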


- Niklas


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15653/#review30045
---


On Dec. 6, 2013, 11:22 p.m., Niklas Nielsen wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15653/
 ---
 
 (Updated Dec. 6, 2013, 11:22 p.m.)
 
 
 Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 This patch includes a wrapper to get system load averages in uptime(1)
 format. This is used by an upcoming patch that exposes these averages
 over the master and slave stats.json endpoints.
 
 
 Diffs
 -
 
   3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp 544cf8c 
 
 Diff: https://reviews.apache.org/r/15653/diff/
 
 
 Testing
 ---
 
 make check and functional testing with endpoints.
 
 
 Thanks,
 
 Niklas Nielsen
 




Re: Review Request 16111: Fixed the zookeeper client wrappers to use the negotiated session timeout value as their local reconnect timeout.

2013-12-09 Thread Benjamin Hindman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16111/#review30064
---



src/zookeeper/group.hpp
https://reviews.apache.org/r/16111/#comment57595

Rather than passing timeout to all of these functions, let's just add a 
function to our ZooKeeper class.
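
A rough sketch of the shape this could take: the wrapper remembers the timeout negotiated with the server and exposes it through an accessor, so callers ask the client instead of threading a timeout argument through every function. The class and method names below are hypothetical and are not the actual wrapper API.

```cpp
#include <chrono>

// Hypothetical, heavily reduced stand-in for the ZooKeeper client wrapper.
class ZooKeeperClient
{
public:
  explicit ZooKeeperClient(std::chrono::milliseconds requested)
    : sessionTimeout_(requested) {}

  // Would be called once the session is established; the server may
  // negotiate a value different from the one requested.
  void connected(std::chrono::milliseconds negotiated)
  {
    sessionTimeout_ = negotiated;
  }

  std::chrono::milliseconds sessionTimeout() const { return sessionTimeout_; }

private:
  std::chrono::milliseconds sessionTimeout_;
};

// A caller (for example a group implementation) derives its local
// reconnect timeout from the client instead of taking it as a parameter.
inline std::chrono::milliseconds reconnectTimeout(const ZooKeeperClient& zk)
{
  return zk.sessionTimeout();
}
```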


- Benjamin Hindman


On Dec. 9, 2013, 6:51 p.m., Jiang Yan Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/16111/
 ---
 
 (Updated Dec. 9, 2013, 6:51 p.m.)
 
 
 Review request for mesos, Benjamin Hindman, Ben Mahler, Raul Gutierrez 
 Segales, and Vinod Kone.
 
 
 Bugs: MESOS-868
 https://issues.apache.org/jira/browse/MESOS-868
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 See summary.
 
 
 Diffs
 -
 
   src/jvm/org/apache/zookeeper.hpp dac14565a66397f153cdc059859286f8ac555919 
   src/state/zookeeper.hpp 90b660737f52df426759877feb979b14ac4b6811 
   src/state/zookeeper.cpp 09b63d44e9349cab2d73659c939de3d8e96fbcc5 
   src/tests/zookeeper.hpp 1bc38c291cef39a4d255fd9065428a26d86248cb 
   src/tests/zookeeper.cpp 8bb49012d8dc46ef9f5a64ead1654253b9df8c21 
   src/tests/zookeeper_test_server.hpp 
 97a8524600fe3a57cf084c0dea8e99e9a056c504 
   src/tests/zookeeper_test_server.cpp 
 dc53d6a182a861544d2f9e7fa873be4c8c402856 
   src/tests/zookeeper_tests.cpp a5fe9e18fbaa88ea56662dc1b2e3d51fb0b50822 
   src/zookeeper/group.hpp facfb1fe31eeeb042c0e2b94d739101911620cdf 
   src/zookeeper/group.cpp 5c92c5f89d441b2555d928772fa40573660e3e5a 
   src/zookeeper/watcher.hpp 1db0386719c2a675d29b47b417dc856993062326 
   src/zookeeper/zookeeper.hpp 72435432e433fc0162f8b88e2045efcc42793a3a 
   src/zookeeper/zookeeper.cpp cc8a7caeedb2c109d4952a6520cc98565adaa700 
 
 Diff: https://reviews.apache.org/r/16111/diff/
 
 
 Testing
 ---
 
 ./bin/mesos-tests.sh 
 --gtest_filter=GroupTest*:ZooKeeperTest*:ZooKeeperMasterContenderDetectorTest*:ZooKeeperStateTest*
  -j --gtest_repeat=100 --gtest_break_on_failure --gtest_shuffle
 
 
 Thanks,
 
 Jiang Yan Xu
 




Re: Review Request 16136: Fixed the python tests in the presence of multiple eggs.

2013-12-09 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16136/
---

(Updated Dec. 10, 2013, 1:04 a.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
---

This now uses the existing egg postfix variables.


Summary (updated)
-

Fixed the python tests in the presence of multiple eggs.


Repository: mesos-git


Description
---

The python framework tests fail when there are multiple eggs present. This is 
because the scripts use overly general wildcards to match the egg files.

Multiple eggs can be present due to multiple mesos versions, as well as 
multiple python versions having been run through make check.


Diffs (updated)
-

  src/examples/python/test-executor.in 6f18682425b472b58fe4f42859d84cbb24da9f7c 
  src/examples/python/test-framework.in 
d66cf6bd6f6ecdfc917d2ac004cf32e61ce150c5 

Diff: https://reviews.apache.org/r/16136/diff/


Testing
---

make check on both OSX and CentOS 5.


Thanks,

Ben Mahler



Re: Review Request 16136: Fixed the python tests in the presence of multiple eggs.

2013-12-09 Thread Ben Mahler


 On Dec. 9, 2013, 10:22 p.m., Benjamin Hindman wrote:
  src/examples/python/test-executor.in, line 16
  https://reviews.apache.org/r/16136/diff/1/?file=395813#file395813line16
 
  Can we set PYTHON_VERSION in our automake variables and then use that 
  instead of EGG_PYTHON_VERSION?

Turns out we already have the postfix variables! Used those instead.


- Ben


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16136/#review30053
---


On Dec. 10, 2013, 1:04 a.m., Ben Mahler wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/16136/
 ---
 
 (Updated Dec. 10, 2013, 1:04 a.m.)
 
 
 Review request for mesos, Benjamin Hindman and Vinod Kone.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 The python framework tests fail when there are multiple eggs present. This is 
 because the scripts use overly general wildcards to match the egg files.
 
 Multiple eggs can be present due to multiple mesos versions, as well as 
 multiple python versions having been run through make check.
 
 
 Diffs
 -
 
   src/examples/python/test-executor.in 
 6f18682425b472b58fe4f42859d84cbb24da9f7c 
   src/examples/python/test-framework.in 
 d66cf6bd6f6ecdfc917d2ac004cf32e61ce150c5 
 
 Diff: https://reviews.apache.org/r/16136/diff/
 
 
 Testing
 ---
 
 make check on both OSX and CentOS 5.
 
 
 Thanks,
 
 Ben Mahler
 




Re: Review Request 15653: Adds loadavg() convenience method to stout

2013-12-09 Thread Niklas Nielsen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15653/
---

(Updated Dec. 10, 2013, 1:27 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
---

Using ErrnoError().


Repository: mesos-git


Description
---

This patch includes a wrapper to get system load averages in uptime(1)
format. This is used by an upcoming patch that exposes these averages
over the master and slave stats.json endpoints.


Diffs (updated)
-

  3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp 544cf8c 

Diff: https://reviews.apache.org/r/15653/diff/


Testing
---

make check and functional testing with endpoints.


Thanks,

Niklas Nielsen



Re: Review Request 14669: launchTasks on list of offers

2013-12-09 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14669/#review30069
---


Hey Nik, a few points below:

1. My only significant comment in this review is that launchTasks is perhaps 
more complicated than it needs to be, in that it performs validation that 
could be delegated to the offer validators you've added here. This would remove 
the additional code sending TASK_LOST as well as the explicit use of an 
OfferError. Let me know what you think. I asked benh to take a look at this 
change as well to get some more opinions here.

2. Can you add a fix version for 0.17.0 on MESOS-749?

3. I would love to see a part 2 for this change where the java / python / C++ 
test frameworks use the new API call. This will ensure our language bindings 
work for the new call.


src/master/master.cpp
https://reviews.apache.org/r/14669/#comment57601

Can this be const?

Can framework and slave be const references?



src/master/master.cpp
https://reviews.apache.org/r/14669/#comment57603

Looks like ::some is not needed given the implicit constructor for option. 
Not yours, but seems like a good time to clean this up given we've introduced 
other visitors.

Ditto below.
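
To illustrate the point about the implicit constructor, here is a minimal, hypothetical Option (stout's real class has more to it). With a non-explicit constructor from T, returning the value directly is equivalent to going through ::some.

```cpp
#include <string>

// Minimal, hypothetical Option<T> for illustration only.
template <typename T>
class Option
{
public:
  static Option<T> none() { return Option<T>(); }
  static Option<T> some(const T& t) { return Option<T>(t); }

  Option() : set_(false), value_() {}
  Option(const T& t) : set_(true), value_(t) {}  // Implicit on purpose.

  bool isSome() const { return set_; }
  const T& get() const { return value_; }

private:
  bool set_;
  T value_;
};

// Explicit form.
inline Option<std::string> verbose(const std::string& error)
{
  return Option<std::string>::some(error);
}

// Equivalent, relying on the implicit constructor.
inline Option<std::string> concise(const std::string& error)
{
  return error;
}
```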



src/master/master.cpp
https://reviews.apache.org/r/14669/#comment57602

s/TaskInfoError::none()/None()/? (Not yours, but good time for a cleanup).

Ditto below.



src/master/master.cpp
https://reviews.apache.org/r/14669/#comment57605

Is this constructor needed?



src/master/master.cpp
https://reviews.apache.org/r/14669/#comment57610

getOffer should not be returning the offer if the slave was disconnected, 
see Master::exited

This can be CHECK(!slave.disconnected). Or is the validation an effort to be 
operationally safer than a CHECK?



src/master/master.cpp
https://reviews.apache.org/r/14669/#comment57608

Should this be printing offerId? Or perhaps conditionally printing:

 stringify(offerIds.empty() ? offerId : stringify(offerIds))



src/master/master.cpp
https://reviews.apache.org/r/14669/#comment57611

Looks like this case could be an OfferError that gets verified using an 
OfferVisitor.

If we pass a pointer to the Master (see Slave::Framework / Slave::Executor 
in slave.cpp), then we can have the OfferVisitors enforce this case here (no 
offers), as well as the cases below (offer is no longer valid, and offer 
outlived slave).

After validation, the master code here would be able to assume the request 
is valid, thus moving the validation details outside of Master::launchTasks.

Does this seem workable? It would be nice to simplify launchTasks in favor 
of making better validators, thoughts?
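
To make the suggestion concrete, here is a sketch of validation delegated to visitors. Every type and name below (Offer, OfferVisitor, the two checkers, validate) is a simplified, hypothetical stand-in and not the code in master.cpp. In particular, the sketch carries the slave state on the offer itself, where the real visitors would look it up through the Master pointer. The point is only that launchTasks can run the visitors first and then assume the request is valid.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <vector>

struct Offer  // Hypothetical, reduced to what the checks need.
{
  std::string id;
  std::string slaveId;
  bool slaveDisconnected;
};

// A visitor inspects the offers of one request and returns an error
// message if the request is invalid.
struct OfferVisitor
{
  virtual ~OfferVisitor() {}
  virtual std::optional<std::string> operator()(
      const std::vector<Offer>& offers) = 0;
};

struct NonEmptyChecker : OfferVisitor
{
  std::optional<std::string> operator()(
      const std::vector<Offer>& offers) override
  {
    if (offers.empty()) {
      return std::string("No offers specified");
    }
    return std::nullopt;
  }
};

struct SlaveChecker : OfferVisitor
{
  std::optional<std::string> operator()(
      const std::vector<Offer>& offers) override
  {
    for (const Offer& offer : offers) {
      if (offer.slaveDisconnected) {
        return "Offer " + offer.id + " outlived slave " + offer.slaveId;
      }
      if (offer.slaveId != offers.front().slaveId) {
        return "Offer " + offer.id + " belongs to a different slave";
      }
    }
    return std::nullopt;
  }
};

// Runs every visitor in order and returns the first error, if any.
inline std::optional<std::string> validate(
    const std::vector<Offer>& offers,
    const std::vector<std::unique_ptr<OfferVisitor>>& visitors)
{
  for (const auto& visitor : visitors) {
    std::optional<std::string> error = (*visitor)(offers);
    if (error) {
      return error;
    }
  }
  return std::nullopt;
}
```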


- Ben Mahler


On Dec. 2, 2013, 7:34 p.m., Niklas Nielsen wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/14669/
 ---
 
 (Updated Dec. 2, 2013, 7:34 p.m.)
 
 
 Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
 
 
 Bugs: MESOS-749
 https://issues.apache.org/jira/browse/MESOS-749
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Running tasks on more than one offer belonging to a single slave can be 
 useful in situations with multiple outstanding offers.
 
 This patch extends the usual launchTasks() to accept a vector of OfferIDs. 
 The previous launchTasks (accepting a single OfferID) has been kept for 
 backward compatibility, but this now calls the new launchTasks() with a 
 one-element list.
 This also applies to the JNI and python interfaces, which accept both 
 formats as well.
 
 Offers are verified to belong to the same slave and framework, before 
 resources are merged and used.
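
As an illustration of the verify-then-merge step, a minimal sketch follows. The types are hypothetical and resources are reduced to two scalars; the actual patch works on Mesos Offer and Resources objects inside the master.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

struct Resources { double cpus; double mem; };  // Hypothetical.

struct Offer  // Hypothetical, reduced to the fields the checks need.
{
  std::string id;
  std::string slaveId;
  std::string frameworkId;
  Resources resources;
};

// Returns the combined resources of all offers, or nothing if the offers
// cannot be used together (empty list, duplicate ids, or offers spanning
// more than one slave or framework).
inline std::optional<Resources> combine(const std::vector<Offer>& offers)
{
  if (offers.empty()) {
    return std::nullopt;
  }

  std::set<std::string> seen;
  Resources total{0.0, 0.0};

  for (const Offer& offer : offers) {
    if (!seen.insert(offer.id).second) {
      return std::nullopt;  // Same offer listed twice.
    }
    if (offer.slaveId != offers.front().slaveId ||
        offer.frameworkId != offers.front().frameworkId) {
      return std::nullopt;  // Offers span slaves or frameworks.
    }
    total.cpus += offer.resources.cpus;
    total.mem += offer.resources.mem;
  }

  return total;
}
```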
 
 
 Diffs
 -
 
   include/mesos/scheduler.hpp 161cc65 
   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 9869929 
   src/java/src/org/apache/mesos/MesosSchedulerDriver.java ed4b4a3 
   src/java/src/org/apache/mesos/SchedulerDriver.java 5b0ca39 
   src/master/master.hpp a7bf963 
   src/master/master.cpp 4f4db93 
   src/messages/messages.proto 1f264d5 
   src/python/native/mesos_scheduler_driver_impl.cpp 059ed5d 
   src/sched/sched.cpp b958435 
   src/tests/master_tests.cpp d34450b 
   src/tests/resource_offers_tests.cpp 9beb949 
 
 Diff: https://reviews.apache.org/r/14669/diff/
 
 
 Testing
 ---
 
 Three new tests have been added: LaunchCombinedOfferTest, 
 LaunchAcrossSlavesTest and LaunchDuplicateOfferTest.
 These tests ensure that:
 1) Multiple offers can be used to run a single task (requesting the sum of 
 offer resources).
 2) Offers cannot span multiple slaves.
 3) No offer can appear more than once in the offer list.
 
 $ make check
 ...
 [ RUN  ] MasterTest.LaunchCombinedOfferTest
 [   OK ] 

Re: Review Request 16147: Containerizer

2013-12-09 Thread Ian Downes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16147/
---

(Updated Dec. 10, 2013, 2:21 a.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Chi Zhang, Niklas 
Nielsen, samya, and Jason Dusek.


Repository: mesos-git


Description
---

The proposed Containerizer interface is to replace the existing Isolator. 

One ContainerizerProcess has been written:
MesosContainerizerProcess - implements containerization internally using a 
Launcher and one or more Isolators (following review)

The intent is to also support a generic ExternalContainerizerProcess that can 
delegate containerization by making external calls. Other Containerizers 
could interface with specific external containerization techniques such as 
Docker or LXC.


Diffs
-

  src/slave/container/containerizer.hpp PRE-CREATION 
  src/slave/container/containerizer.cpp PRE-CREATION 
  src/slave/container/mesos_containerizer.hpp PRE-CREATION 
  src/slave/container/mesos_containerizer.cpp PRE-CREATION 
  src/slave/slave.hpp 2d093a3 
  src/slave/slave.cpp 91afe03 
  src/tests/test_containerizer.hpp PRE-CREATION 

Diff: https://reviews.apache.org/r/16147/diff/


Testing
---


Thanks,

Ian Downes



Review Request 16149: Containerizer - launchers

2013-12-09 Thread Ian Downes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16149/
---

Review request for mesos, Benjamin Hindman, Ben Mahler, Chi Zhang, Niklas 
Nielsen, samya, and Jason Dusek.


Repository: mesos-git


Description
---

Launcher interface and MesosLauncher to support MesosContainerizers.

Launchers handle the lifecycle of the executor process (and descendants).


Diffs
-

  src/slave/container/launcher.hpp PRE-CREATION 
  src/slave/container/launcher.cpp PRE-CREATION 
  src/slave/container/mesos_launcher.hpp PRE-CREATION 
  src/slave/container/mesos_launcher.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/16149/diff/


Testing
---


Thanks,

Ian Downes



Review Request 16150: Containerizer - isolators

2013-12-09 Thread Ian Downes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16150/
---

Review request for mesos, Benjamin Hindman, Ben Mahler, Chi Zhang, Niklas 
Nielsen, samya, and Jason Dusek.


Repository: mesos-git


Description
---

Isolators perform isolation for the MesosContainerizer.

Isolator interface and implementations of Posix CPU and Mem isolators (no 
isolation, just usage())


Diffs
-

  src/slave/container/isolator.hpp PRE-CREATION 
  src/slave/container/isolator.cpp PRE-CREATION 
  src/slave/container/isolators/cpu/posix.hpp PRE-CREATION 
  src/slave/container/isolators/cpu/posix.cpp PRE-CREATION 
  src/slave/container/isolators/mem/posix.hpp PRE-CREATION 
  src/slave/container/isolators/mem/posix.cpp PRE-CREATION 
  src/slave/container/isolators/posix.hpp PRE-CREATION 

Diff: https://reviews.apache.org/r/16150/diff/


Testing
---


Thanks,

Ian Downes



[jira] [Commented] (MESOS-831) script-without-shebang

2013-12-09 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843877#comment-13843877
 ] 

Timothy St. Clair commented on MESOS-831:
-

https://reviews.apache.org/r/15764/

 script-without-shebang
 --

 Key: MESOS-831
 URL: https://issues.apache.org/jira/browse/MESOS-831
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.16.0
Reporter: Timothy St. Clair
 Attachments: MESOS-831.patch


 mesos.x86_64: E: script-without-shebang 
 /usr/libexec/mesos/python/mesos/__init__.py
 mesos.x86_64: E: script-without-shebang 
 /usr/libexec/mesos/python/mesos/http.py
 mesos.x86_64: E: script-without-shebang /usr/libexec/mesos/python/mesos/cli.py
 mesos.x86_64: E: script-without-shebang 
 /usr/libexec/mesos/python/mesos/futures.py



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MESOS-750) Require compilers that support c++11

2013-12-09 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843913#comment-13843913
 ] 

Till Toenshoff commented on MESOS-750:
--

During phase 2-4, should we add something that checks for C++11 compiler 
features availability (for autoconf that could be 
AX_CXX_COMPILE_STDCXX_11([noext], [optional])) for a given --with-cpp11=yes 
or maybe add a third option for --with-cpp11 (yes/no/check) which does that 
test? 


 Require compilers that support c++11
 

 Key: MESOS-750
 URL: https://issues.apache.org/jira/browse/MESOS-750
 Project: Mesos
  Issue Type: Improvement
Reporter: Benjamin Mahler
  Labels: technical_debt
 Fix For: 0.17.0


 Requiring C++11 support will provide substantial benefits to Mesos.
 Most notably, the lack of lambda support has resulted in a proliferation of 
 continuation style functions scattered throughout the code. Having lambdas 
 will allow us to reduce this clutter and simplify the code.
 This will require carefully documenting how to get Mesos compiling on various 
 systems to make this transition easy.
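
As a small illustration of the point about lambdas, here is a sketch in plain standard C++ (not libprocess; all names are made up for the example). In continuation style each step needs its own named function and state is passed explicitly, while a lambda keeps the same logic inline at the call site.

```cpp
#include <functional>
#include <string>

// Hypothetical asynchronous-style API: invokes `continuation` with a result.
inline void fetch(const std::function<void(int)>& continuation)
{
  continuation(42);
}

// Continuation style: the follow-up step is a separate function (typically
// far from the call site), and the state it needs is passed explicitly.
inline void reportContinuation(std::string* out, int value)
{
  *out = "value=" + std::to_string(value);
}

inline std::string reportWithContinuation()
{
  std::string out;
  fetch(std::bind(&reportContinuation, &out, std::placeholders::_1));
  return out;
}

// Lambda style: the same logic inline, capturing the state it needs.
inline std::string reportWithLambda()
{
  std::string out;
  fetch([&out](int value) { out = "value=" + std::to_string(value); });
  return out;
}
```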





Re: Review Request 16136: Fixed the python tests in the presence of multiple eggs.

2013-12-09 Thread Benjamin Hindman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16136/#review30078
---

Ship it!



src/examples/python/test-executor.in
https://reviews.apache.org/r/16136/#comment57630

Is there any precedent for wrapping lines like this?


- Benjamin Hindman


On Dec. 10, 2013, 1:04 a.m., Ben Mahler wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/16136/
 ---
 
 (Updated Dec. 10, 2013, 1:04 a.m.)
 
 
 Review request for mesos, Benjamin Hindman and Vinod Kone.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 The python framework tests fail when there are multiple eggs present. This is 
 because the scripts use overly general wildcards to match the egg files.
 
 Multiple eggs can be present due to multiple mesos versions, as well as 
 multiple python versions having been run through make check.
 
 
 Diffs
 -
 
   src/examples/python/test-executor.in 
 6f18682425b472b58fe4f42859d84cbb24da9f7c 
   src/examples/python/test-framework.in 
 d66cf6bd6f6ecdfc917d2ac004cf32e61ce150c5 
 
 Diff: https://reviews.apache.org/r/16136/diff/
 
 
 Testing
 ---
 
 make check on both OSX and CentOS 5.
 
 
 Thanks,
 
 Ben Mahler
 




[jira] [Commented] (MESOS-750) Require compilers that support c++11

2013-12-09 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843989#comment-13843989
 ] 

Benjamin Hindman commented on MESOS-750:


SGTM.

 Require compilers that support c++11
 

 Key: MESOS-750
 URL: https://issues.apache.org/jira/browse/MESOS-750
 Project: Mesos
  Issue Type: Improvement
Reporter: Benjamin Mahler
  Labels: technical_debt
 Fix For: 0.17.0


 Requiring C++11 support will provide substantial benefits to Mesos.
 Most notably, the lack of lambda support has resulted in a proliferation of 
 continuation style functions scattered throughout the code. Having lambdas 
 will allow us to reduce this clutter and simplify the code.
 This will require carefully documenting how to get Mesos compiling on various 
 systems to make this transition easy.





[jira] [Updated] (MESOS-749) Add support for multiple offers in launchTasks

2013-12-09 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-749:
-

Fix Version/s: 0.17.0

 Add support for multiple offers in launchTasks
 --

 Key: MESOS-749
 URL: https://issues.apache.org/jira/browse/MESOS-749
 Project: Mesos
  Issue Type: Improvement
Reporter: Niklas Quarfot Nielsen
Assignee: Niklas Quarfot Nielsen
 Fix For: 0.17.0


 Running tasks on more than one offer (which belong to a single slave) can be 
 useful in situations with multiple outstanding offers. Currently, only one 
 offer can be used per launch.
 Offer resources can be aggregated and used for traditional task launch.
 Feature involves:
 - Extending the scheduler API with launchTasks(offers, tasks, filters), which 
 takes a list of offers as opposed to a single offer.
 - Extending LaunchTasksMessage to carry offer list.
 - Extend the offer to offer list in call-path from scheduler to master.
 - Master applies offer visitors to validate and aggregate offers into a 
 single resource, before task validation and launch is carried out.
 Java and Python interfaces should support both the new and old launchTasks() 
 for backward compatibility.





Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

2013-12-09 Thread Benjamin Hindman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review29966
---



src/log/log.hpp
https://reviews.apache.org/r/15802/#comment57464

Why s/Timeout/Duration/?



src/log/log.cpp
https://reviews.apache.org/r/15802/#comment57466

Pass this into LogProcess::watch instead.



src/log/log.cpp
https://reviews.apache.org/r/15802/#comment57644

Rather than holding on to the LogProcess*, how about we make Log::recover() 
return a Future<Shared<Replica>> and that's how we get the replica. That way, 
instead of doing process->replica-> we can just have a Shared<Replica>. Then 
LogProcess::finalize doesn't need to wait for Readers as your comment 
suggests above (or writers, as we'll see below). And does LogProcess::finalize 
really need to wait for the Shared<Replica> to be owned?



src/log/log.cpp
https://reviews.apache.org/r/15802/#comment57645

How about we just grab 'quorum' and 'network' like above and get the 
Shared<Replica> from Log::recover(). Again, this removes our dependency on 
LogProcess*. I can imagine the LogReaderProcess and LogWriterProcess 
constructors initiate recovery on Log::recover() and save the returned future 
and use that to gate anything else (and obviously the future must have been 
satisfied to use 'replica' since that's how we get it).



src/log/log.cpp
https://reviews.apache.org/r/15802/#comment57465

s/(UPID)replica/(UPID) replica/



src/log/log.cpp
https://reviews.apache.org/r/15802/#comment57641

Some comments on why you need to wait for these would be great.



src/log/log.cpp
https://reviews.apache.org/r/15802/#comment57642

When would we want '!strict' with the log? I see you commented on why we 
might not want strict for the replica when writing tests, but if we are 
creating a log, won't we always want to recover?



src/log/log.cpp
https://reviews.apache.org/r/15802/#comment57467

How about a CHECK(replica.unique())? Eventually this is what release is 
for correct? If yes, maybe a TODO?



src/log/log.cpp
https://reviews.apache.org/r/15802/#comment57468

I'd prefer that these timeouts stay of type Timeout, not Duration (in all 
these methods).



src/log/log.cpp
https://reviews.apache.org/r/15802/#comment57469

You can just do 'future.await(timeout.remaining())' here.
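
The difference between the two types matters when one deadline spans several waits. Here is a sketch using only the standard library (std::chrono and std::future, not libprocess; the Timeout class is a hypothetical reduction). A Timeout fixes a deadline once and every wait uses whatever time remains, whereas passing a Duration to each step would restart the clock each time.

```cpp
#include <chrono>
#include <future>

// Hypothetical, minimal deadline type in the spirit of a Timeout.
class Timeout
{
public:
  typedef std::chrono::steady_clock Clock;

  static Timeout in(Clock::duration duration)
  {
    return Timeout(Clock::now() + duration);
  }

  // Time left until the deadline, never negative.
  Clock::duration remaining() const
  {
    Clock::duration left = deadline_ - Clock::now();
    return left > Clock::duration::zero() ? left : Clock::duration::zero();
  }

  bool expired() const { return remaining() == Clock::duration::zero(); }

private:
  explicit Timeout(Clock::time_point deadline) : deadline_(deadline) {}

  Clock::time_point deadline_;
};

// Waits for `future` for at most the time left on `timeout`.
template <typename T>
bool await(const std::future<T>& future, const Timeout& timeout)
{
  return future.wait_for(timeout.remaining()) == std::future_status::ready;
}
```

Several consecutive calls to await() with the same Timeout share one overall deadline, which is the behaviour the Timeout type is meant to preserve.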



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57488

For a future that we don't control we should transition our code to be more 
robust and move to a model where we don't expect them _not_ to be discarded. 
Instead, let's do something like:

CHECK(!future.isPending());

if (!future.isReady()) {
  promise.fail("Failed to ...: " +
               (future.isFailed() ? future.failure() : "future discarded"));
  ...;
}

See examples in master/master.cpp.



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57503

Let's add helpers for Metadata::Status in common/type_utils.hpp so you can 
just do:

LOG(INFO) << "Received ... " << response.status() << " status";

This will also enable us to do: stringify(response.status())



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57501

How about just std::min? I think it will read better:

if (lowestBeginPosition.isNone()) {
  lowestBeginPosition = response.begin();
}

lowestBeginPosition = std::min(lowestBeginPosition, response.begin());

Also, what about defining a helper for doing min with options:

template <typename T>
Option<T> min(const Option<T>& left, const Option<T>& right)
{
  if (left.isSome() && right.isSome()) {
    return std::min(left.get(), right.get());
  } else if (left.isSome()) {
    return left.get();
  } else if (right.isSome()) {
    return right.get();
  }
  return None();
}

This will make the code just:

lowestBeginPosition = min(lowestBeginPosition, response.begin());
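
For anyone who wants to try the helper in isolation, here is a self-contained variant written against std::optional (C++17) as a stand-in for stout's Option. The `sketch` namespace is an assumption, added only so a qualified call cannot be confused with std::min.

```cpp
#include <algorithm>
#include <optional>

namespace sketch {

// Minimum of two optional values. An unset side is ignored, and the
// result is unset only when both sides are unset.
template <typename T>
std::optional<T> min(
    const std::optional<T>& left,
    const std::optional<T>& right)
{
  if (left.has_value() && right.has_value()) {
    return std::min(left.value(), right.value());
  } else if (left.has_value()) {
    return left;
  } else if (right.has_value()) {
    return right;
  }
  return std::nullopt;
}

} // namespace sketch
```

With this, the first response seen needs no special case: an unset running minimum simply takes the incoming value.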



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57502

How about just std::max?



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57500

This switch needs a lot more explanation. In particular, how do we know 
we've covered all of the cases? For example, what if we get a quorum of voting 
and empty? Why don't we do anything in that case?



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57496

s/re-calculate/recalculate/



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57497

s/these/this/



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57492

s/re-gained/regained/



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57494

s/Trying to re-gain/Try to regain/



src/log/recover.cpp
https://reviews.apache.org/r/15802/#comment57495

Why the delay?



src/log/replica.cpp

Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME #1792

2013-12-09 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/1792/

--
[...truncated 262 lines...]
Started by an SCM change
Started by 

Re: Review Request 16111: Fixed the zookeeper client wrappers to use the negotiated session timeout value as their local reconnect timeout.

2013-12-09 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16111/
---

(Updated Dec. 9, 2013, 6:51 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, Raul Gutierrez Segales, 
and Vinod Kone.


Changes
---

reviewer +rgs


Bugs: MESOS-868
https://issues.apache.org/jira/browse/MESOS-868


Repository: mesos-git


Description
---

See summary.


Diffs
-

  src/jvm/org/apache/zookeeper.hpp dac14565a66397f153cdc059859286f8ac555919 
  src/state/zookeeper.hpp 90b660737f52df426759877feb979b14ac4b6811 
  src/state/zookeeper.cpp 09b63d44e9349cab2d73659c939de3d8e96fbcc5 
  src/tests/zookeeper.hpp 1bc38c291cef39a4d255fd9065428a26d86248cb 
  src/tests/zookeeper.cpp 8bb49012d8dc46ef9f5a64ead1654253b9df8c21 
  src/tests/zookeeper_test_server.hpp 97a8524600fe3a57cf084c0dea8e99e9a056c504 
  src/tests/zookeeper_test_server.cpp dc53d6a182a861544d2f9e7fa873be4c8c402856 
  src/tests/zookeeper_tests.cpp a5fe9e18fbaa88ea56662dc1b2e3d51fb0b50822 
  src/zookeeper/group.hpp facfb1fe31eeeb042c0e2b94d739101911620cdf 
  src/zookeeper/group.cpp 5c92c5f89d441b2555d928772fa40573660e3e5a 
  src/zookeeper/watcher.hpp 1db0386719c2a675d29b47b417dc856993062326 
  src/zookeeper/zookeeper.hpp 72435432e433fc0162f8b88e2045efcc42793a3a 
  src/zookeeper/zookeeper.cpp cc8a7caeedb2c109d4952a6520cc98565adaa700 

Diff: https://reviews.apache.org/r/16111/diff/


Testing
---

./bin/mesos-tests.sh --gtest_filter=GroupTest*:ZooKeeperTest*:ZooKeeperMasterContenderDetectorTest*:ZooKeeperStateTest* -j --gtest_repeat=100 --gtest_break_on_failure --gtest_shuffle


Thanks,

Jiang Yan Xu



Re: Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME #1519

2013-12-09 Thread Benjamin Mahler
Looks like a bug during tear down:

python_framework_test.sh terminated with signal 'Segmentation fault'

Will see if I can reproduce locally.


On Fri, Dec 6, 2013 at 11:11 PM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 See 
 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME/1519/changes
 

 Changes:

 [bmahler] Fixed the master to drop authentication when non-leading.

 [bmahler] Fixed the flaky FaultTolerance.SlaveReliableRegistration test.

 --
 [...truncated 23251 lines...]
 2013-12-07 07:11:49,159:22384(0x2b1f0c200700):ZOO_INFO@check_events@1632:
 session establishment complete on server [127.0.0.1:51092],
 sessionId=0x142cbe5c7d70009, negotiated timeout=1
 I1207 07:11:49.159778 22431 group.cpp:280] Group process ((1533)@
 140.211.11.27:50484) connected to ZooKeeper
 I1207 07:11:49.159797 22431 group.cpp:675] Syncing group operations: queue
 size (joins, cancels, datas) = (0, 0, 0)
 I1207 07:11:49.159806 22431 group.cpp:337] Trying to create path '/znode'
 in ZooKeeper
 I1207 07:11:49.161938 22417 contender.cpp:203] New candidate (id='4',
 data='master@140.211.11.27:50484') has entered the contest for leadership
 I1207 07:11:49.162163 22422 detector.cpp:130] Detected a new leader
 (id='4')
 I1207 07:11:49.162396 22432 group.cpp:562] Trying to get
 '/znode/04' in ZooKeeper
 I1207 07:11:49.162595 22411 detector.cpp:130] Detected a new leader
 (id='4')
 I1207 07:11:49.162675 22430 detector.cpp:130] Detected a new leader
 (id='4')
 I1207 07:11:49.162806 22423 group.cpp:562] Trying to get
 '/znode/04' in ZooKeeper
 I1207 07:11:49.162816 22421 group.cpp:562] Trying to get
 '/znode/04' in ZooKeeper
 I1207 07:11:49.163336 22419 detector.cpp:322] A new leading master (UPID=
 master@140.211.11.27:50484) is detected
 I1207 07:11:49.163419 22416 master.cpp:746] The newly elected leader is
 master@140.211.11.27:50484
 I1207 07:11:49.163442 22416 master.cpp:750] Elected as the leading master!
 I1207 07:11:49.163710 22416 detector.cpp:322] A new leading master (UPID=
 master@140.211.11.27:50484) is detected
 I1207 07:11:49.163846 22433 slave.cpp:497] New master detected at
 master@140.211.11.27:50484
 I1207 07:11:49.163951 22418 status_update_manager.cpp:160] New master
 detected at master@140.211.11.27:50484
 I1207 07:11:49.163983 22433 slave.cpp:524] Detecting new master
 I1207 07:11:49.164011 22424 detector.cpp:322] A new leading master (UPID=
 master@140.211.11.27:50484) is detected
 I1207 07:11:49.164017 22415 master.cpp:1366] Attempting to register slave
 on hemera.apache.org at slave(136)@140.211.11.27:50484
 I1207 07:11:49.164041 22415 master.cpp:2628] Adding slave
 201312070711-453759884-50484-22384-0 at hemera.apache.org with cpus(*):2;
 mem(*):1024; disk(*):127026; ports(*):[31000-32000]
 I1207 07:11:49.164105 22416 sched.cpp:207] New master detected at
 master@140.211.11.27:50484
 I1207 07:11:49.164139 22416 sched.cpp:260] Authenticating with master
 master@140.211.11.27:50484
 I1207 07:11:49.164165 22412 slave.cpp:542] Registered with master
 master@140.211.11.27:50484; given slave ID
 201312070711-453759884-50484-22384-0
 I1207 07:11:49.164340 22416 sched.cpp:229] Detecting new master
 I1207 07:11:49.164353 22420 authenticatee.hpp:124] Creating new client
 SASL connection
 I1207 07:11:49.164374 22427 hierarchical_allocator_process.hpp:445] Added
 slave 201312070711-453759884-50484-22384-0 (hemera.apache.org) with
 cpus(*):2; mem(*):1024; disk(*):127026; ports(*):[31000-32000] (and
 cpus(*):2; mem(*):1024; disk(*):127026; ports(*):[31000-32000] available)
 I1207 07:11:49.164453 22427 hierarchical_allocator_process.hpp:708]
 Performed allocation for slave 201312070711-453759884-50484-22384-0 in
 10308ns
 I1207 07:11:49.164623 22428 master.cpp:1849] Authenticating framework at
 scheduler(131)@140.211.11.27:50484
 I1207 07:11:49.164788 22424 authenticator.hpp:140] Creating new server
 SASL connection
 I1207 07:11:49.165066 22426 authenticatee.hpp:212] Received SASL
 authentication mechanisms: CRAM-MD5
 I1207 07:11:49.165088 22426 authenticatee.hpp:238] Attempting to
 authenticate with mechanism 'CRAM-MD5'
 I1207 07:11:49.165124 22431 authenticator.hpp:243] Received SASL
 authentication start
 I1207 07:11:49.165197 22431 authenticator.hpp:325] Authentication requires
 more steps
 I1207 07:11:49.165245 22429 authenticatee.hpp:258] Received SASL
 authentication step
 I1207 07:11:49.165319 22424 authenticator.hpp:271] Received SASL
 authentication step
 I1207 07:11:49.165348 22424 auxprop.cpp:81] Request to lookup properties
 for user: 'test-principal' realm: 'hemera.apache.org' server FQDN: '
 hemera.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
 SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
 I1207 07:11:49.165387 22424 auxprop.cpp:153] Looking up auxiliary property
 '*userPassword'
 I1207 07:11:49.165401 22424 auxprop.cpp:153] Looking up auxiliary property
 '*cmusaslsecretCRAM-MD5'
 

[jira] [Commented] (MESOS-672) Web UI redirection does not work for hosts whose ip addresses are not publicly accessible

2013-12-09 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843468#comment-13843468
 ] 

Benjamin Hindman commented on MESOS-672:


Hey [~xujyan], given the code today, doing something like (3) or (4) is going 
to be difficult (or hacky) to do cleanly while staying backwards compatible. 
Here's another suggestion for (3), which could be used for (4) as well: add some 
data semantics to the LeaderContender in zookeeper/contender.hpp rather 
than just passing arbitrary data. For example, you could have a 
LeaderContender constructor which takes a map from key to value and 
constructs a znode for each key. Then, to implement Brenden's approach 
you could pass a map with a 'hostname' key. The LeaderDetector and/or 
MasterDetector would likely still have some grossness to pull this data out, 
and the StandaloneMasterDetector might be a bit weird here.
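
A rough sketch of the key-to-znode idea, written in Python rather than the C++ of zookeeper/contender.hpp. The function name znodes_for, the path layout, and the example addresses are all hypothetical; none of this is the actual LeaderContender API.

```python
def znodes_for(base_path, member_id, data):
    """Return one (path, value) pair per key in `data`.

    Hypothetical layout: <base_path>/<member_id>/<key>.
    """
    pairs = []
    for key in sorted(data):
        if not key or "/" in key:
            raise ValueError("invalid znode key: %r" % key)
        path = "%s/%s/%s" % (base_path.rstrip("/"), member_id, key)
        pairs.append((path, data[key]))
    return pairs


nodes = znodes_for("/mesos", "0000000004", {
    "pid": "master@10.0.0.1:5050",        # made-up private address
    "hostname": "ec2-host.example.com",   # made-up public hostname
})
for path, value in nodes:
    print(path, "=", value)
```

A detector would then read only the keys it understands, which is what makes adding a 'hostname' key possible without touching the existing 'pid' data.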

 Web UI redirection does not work for hosts whose ip addresses are not 
 publicly accessible
 -

 Key: MESOS-672
 URL: https://issues.apache.org/jira/browse/MESOS-672
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Jie Yu
Assignee: Jie Yu
  Labels: twitter
 Fix For: 0.17.0


 Web UI redirection does not work for hosts where the local interface address 
 is not publicly accessible. For example, with EC2 the redirection will not 
 work.
 There are some possible solutions:
 (1) Add a new REST endpoint on the master called 'info'. When master A finds 
 out that master B is the leader it hits master B's '/master/info' endpoint to 
 get back information about that master including its (public) hostname. 
 - This also requires making sure that each master uses its public 
 hostname which may possibly require adding a --hostname flag (similar to what 
 we did on the slave). 
 - Alternatively, we could update os::hostname to special case EC2, thus 
 making Mesos work out of the box without requiring operators to explicitly 
 set it to the private hostname. 
 (2) Add a 'hostname' field to PID and make sure that stringification of the 
 PID uses the hostname. Then master redirection is done by getting the 
 hostname of the PID instead of the IP. Note this still requires detecting the 
 public hostname using mechanisms mentioned in (1). 
 (3) Store a separate ZNode for the public hostname. Patch from Brenden 
 Matthews: https://reviews.apache.org/r/11975/
 (4) Store a protobuf blob of 'MasterInfo' in ZooKeeper which includes the 
 hostname field (suggested by Vinod Kone in the above review). We have to deal 
 with issues with backwards compatibility. When old slaves read the new 
 master's data, they deserialize the protobuf blob as a PID; when new slaves 
 read the old master's data, they deserialize the PID as protobuf.
 This ticket intends to evaluate these potential solutions and solicit new 
 ideas.
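
To make the compatibility problem in (4) concrete, here is a minimal sketch of a reader that accepts both formats. JSON stands in for the MasterInfo protobuf (an assumption made only to keep the example self-contained), and parse_master_data and the field names are hypothetical.

```python
import json
import re

# Legacy format: a PID string such as "master@10.0.0.1:5050".
PID_PATTERN = re.compile(r"^[^@\s]+@\d{1,3}(?:\.\d{1,3}){3}:\d+$")


def parse_master_data(raw):
    """Decode znode contents written by either an old or a new master."""
    text = raw.decode("utf-8", errors="replace")
    if PID_PATTERN.match(text):
        # Old master: only the PID is known, so there is no hostname.
        return {"pid": text, "hostname": None}
    info = json.loads(text)  # stand-in for parsing a MasterInfo protobuf
    return {"pid": info["pid"], "hostname": info.get("hostname")}
```

This only covers new readers seeing old data. Old readers that expect a bare PID cannot be fixed this way, which is why the ticket treats backwards compatibility as the hard part.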



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


mesos and ec2

2013-12-09 Thread Benjamin Hindman
If you don't use Mesos on EC2 then you can likely stop reading this email.

When using Mesos on EC2 you might notice that the web UI seems broken. The
two things we know about are:

(1) You can't use '/master/redirect' on a master because it attempts to
send you to the private IP hostname of the leading master (which obviously
your browser can't connect to).

(2) You can't get information from the slaves (and thus, information about
your tasks such as file download/tailing) because your browser tries to
connect to the slave's private hostname.

We're trying to architect a clean fix for (1) in MESOS-672
(https://issues.apache.org/jira/browse/MESOS-672).
Feel free to follow along and please add any suggestions. In the meantime,
the best solution that we know of is to modify /etc/hosts *on the masters* to
resolve the public hostname from the private IP.

For (2) you can set the --hostname flag on the slave to the public
hostname. Within EC2 the public hostname will resolve to the private IP
address so you won't be paying extra traffic fees but your browser will
also be able to get the public IP address from the public hostname.

Of course, don't forget to set up your security groups to make sure the
master and slave port(s) are open (5050 and 5051 by default respectively).

We plan to make these fixes less of a manual process in the future and
would love to hear your suggestions!

Thanks,

Ben.


[jira] [Updated] (MESOS-672) Web UI redirection does not work for hosts whose ip addresses are not publicly accessible

2013-12-09 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-672:
-

Assignee: Yan Xu  (was: Jie Yu)

 Web UI redirection does not work for hosts whose ip addresses are not 
 publicly accessible
 -

 Key: MESOS-672
 URL: https://issues.apache.org/jira/browse/MESOS-672
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Jie Yu
Assignee: Yan Xu
  Labels: twitter
 Fix For: 0.17.0


 Web UI redirection does not work for hosts where the local interface address 
 is not publicly accessible. For example, with EC2 the redirection will not 
 work.
 There are some possible solutions:
 (1) Add a new REST endpoint on the master called 'info'. When master A finds 
 out that master B is the leader it hits master B's '/master/info' endpoint to 
 get back information about that master including its (public) hostname. 
 - This also requires making sure that each master uses its public 
 hostname which may possibly require adding a --hostname flag (similar to what 
 we did on the slave). 
 - Alternatively, we could update os::hostname to special case EC2, thus 
 making Mesos work out of the box without requiring operators to explicitly 
 set it to the private hostname. 
 (2) Add a 'hostname' field to PID and make sure that stringification of the 
 PID uses the hostname. Then master redirection is done by getting the 
 hostname of the PID instead of the IP. Note this still requires detecting the 
 public hostname using mechanisms mentioned in (1). 
 (3) Store a separate ZNode for the public hostname. Patch from Brenden 
 Matthews: https://reviews.apache.org/r/11975/
 (4) Store a protobuf blob of 'MasterInfo' in ZooKeeper which includes the 
 hostname field (suggested by Vinod Kone in the above review). We have to deal 
 with issues with backwards compatibility. When old slaves read the new 
 master's data, they deserialize the protobuf blob as a PID; when new slaves 
 read the old master's data, they deserialize the PID as protobuf.
 This ticket intends to evaluate these potential solutions and solicit new 
 ideas.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: mesos and ec2

2013-12-09 Thread Tim St Clair
inline 

- Original Message -

 From: Benjamin Hindman benjamin.hind...@gmail.com
 To: dev dev@mesos.apache.org, u...@mesos.apache.org
 Sent: Monday, December 9, 2013 2:06:36 PM
 Subject: mesos and ec2

 If you don't use Mesos on EC2 then you can likely stop reading this email.

 When using Mesos on EC2 you might notice that the web UI seems broken. The
 two things we know about are:

 (1) You can't use '/master/redirect' on a master because it attempts to send
 you to the private IP hostname of the leading master (which obviously your
 browser can't connect to).

Why not use a simple detection mechanism:

my_hostname=$(hostname -f)

# 169.254.169.254 is the EC2 instance metadata service; it only answers from inside EC2.
if curl --connect-timeout 3 --silent --output /dev/null http://169.254.169.254/2012-01-12; then
  my_hostname=$(curl --silent http://169.254.169.254/2012-01-12/meta-data/public-hostname)
  echo "Found EC2 public hostname: $my_hostname"
fi
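
The same detection could be sketched in Python, for anyone wiring it into a launcher script. The metadata URL is the one from the shell snippet; the function names fetch_metadata and public_hostname are made up for illustration, and the fetcher is passed in as a parameter so the logic can be exercised off EC2.

```python
import socket
import urllib.request

METADATA_URL = "http://169.254.169.254/2012-01-12/meta-data/public-hostname"


def fetch_metadata(url, timeout=3):
    """Return the body at `url`, or None if the metadata service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace").strip()
    except OSError:  # covers URLError, timeouts and connection errors
        return None


def public_hostname(fetch=fetch_metadata, fallback=socket.getfqdn):
    """Prefer the EC2 public hostname; otherwise fall back to the local FQDN."""
    name = fetch(METADATA_URL)
    return name if name else fallback()
```

The result is what one would hand to the --hostname flag mentioned in the original mail.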

 (2) You can't get information from the slaves (and thus, information about
 your tasks such as file download/tailing) because your browser tries to
 connect to the slave's private hostname.

same details as above. 

 We're trying to architect a clean fix for (1) in MESOS-672 . Feel free to
 follow along and please add any suggestions. In the mean time, the best
 solution that we know of is to modify /etc/hosts on the masters to resolve
 the public hostname from the private IP.

 For (2) you can set the --hostname flag on the slave to the public hostname.
 Within EC2 the public hostname will resolve to the private IP address so you
 won't be paying extra traffic fees but your browser will also be able to get
 the public IP address from the public hostname.

 Of course, don't forget to set up your security groups to make sure the
 master and slave port(s) are open (5050 and 5051 by default respectively).

 We plan to make these fixes less of a manual process in the future and would
 love to hear your suggestions!

 Thanks,

 Ben.

-- 
Cheers, 
Tim 


Re: mesos and ec2

2013-12-09 Thread Tim St Clair
full ref: 

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AESDG-chapter-instancedata.html
 

- Original Message -

 From: Tim St Clair tstcl...@redhat.com
 To: u...@mesos.apache.org
 Cc: dev dev@mesos.apache.org
 Sent: Monday, December 9, 2013 2:46:46 PM
 Subject: Re: mesos and ec2

 inline

 - Original Message -

  From: Benjamin Hindman benjamin.hind...@gmail.com
 
  To: dev dev@mesos.apache.org, u...@mesos.apache.org
 
  Sent: Monday, December 9, 2013 2:06:36 PM
 
  Subject: mesos and ec2
 

  If you don't use Mesos on EC2 then you can likely stop reading this email.
 

  When using Mesos on EC2 you might notice that the web UI seems broken. The
  two things we know about are:
 

  (1) You can't use '/master/redirect' on a master because it attempts to
  send
  you to the private IP hostname of the leading master (which obviously your
  browser can't connect to).
 

 Why not use a simple detection mechanics:

 my_hostname=$(hostname -f)

 curl --connect-timeout 3 --silent http://169.254.169.254/2012-01-12
 if [ $? -eq 0 ]; then
 my_hostname=$(curl http://169.254.169.254/2012-01-12/meta-data/public-hostname)
 echo Found EC2 public hostname: $my_hostname
 fi

  (2) You can't get information from the slaves (and thus, information about
  your tasks such as file download/tailing) because your browser tries to
  connect to the slave's private hostname.
 

 same details as above.

  We're trying to architect a clean fix for (1) in MESOS-672 . Feel free to
  follow along and please add any suggestions. In the mean time, the best
  solution that we know of is to modify /etc/hosts on the masters to resolve
  the public hostname from the private IP.
 

  For (2) you can set the --hostname flag on the slave to the public
  hostname.
  Within EC2 the public hostname will resolve to the private IP address so
  you
  won't be paying extra traffic fees but your browser will also be able to
  get
  the public IP address from the public hostname.
 

  Of course, don't forget to set up your security groups to make sure the
  master and slave port(s) are open (5050 and 5051 by default respectively).
 

  We plan to make these fixes less of a manual process in the future and
  would
  love to hear your suggestions!
 

  Thanks,
 

  Ben.
 

 --
 Cheers,
 Tim

-- 
Cheers, 
Tim 


Re: Review Request 15764: Minor fix for script-without-shebang MESOS-831

2013-12-09 Thread Ben Mahler


 On Nov. 22, 2013, 9:51 p.m., Ben Mahler wrote:
  Interesting, these files are not meant to be scripts (they are python 
  libraries), which tool reported the script-without-shebang issue?
  
  (Can you kill the trailing whitespace in the Apache header?)
 
 Timothy St. Clair wrote:
 rpmlint generated on recent update 
 (https://bugzilla.redhat.com/show_bug.cgi?id=1010512#c6).  It checks the 
 contents of the install targets.
 
 Timothy St. Clair wrote:
 Updated patch to remove minor trailing spaces.  There were only a couple.

Did you forget to update the patch?


- Ben


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15764/#review29316
---


On Nov. 21, 2013, 7:07 p.m., Timothy St. Clair wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15764/
 ---
 
 (Updated Nov. 21, 2013, 7:07 p.m.)
 
 
 Review request for mesos.
 
 
 Bugs: MESOS-831
 https://issues.apache.org/jira/browse/MESOS-831
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Minor modification to python scripts. 
 
 
 Diffs
 -
 
   src/cli/python/mesos/__init__.py e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 
   src/cli/python/mesos/cli.py 5c11d4664de4314674cb49cf534e334a0663b254 
   src/cli/python/mesos/futures.py be374cf0235de75f7201b3e8078af8cef620dc2f 
   src/cli/python/mesos/http.py 9db9e23c731c4306ff3866c07963a19c196f758c 
 
 Diff: https://reviews.apache.org/r/15764/diff/
 
 
 Testing
 ---
 
 n/a
 
 
 Thanks,
 
 Timothy St. Clair
 




Re: mesos and ec2

2013-12-09 Thread Brenden Matthews
My preferred solution is to:

   1. Remove the requirement that the forward and reverse DNS match
      (https://github.com/airbnb/mesos/commit/6953571758d287d159a289e62c1fd0d685b5e8e3),
      and
   2. Write the leading master's hostname to ZooKeeper for discovery
      (https://github.com/airbnb/mesos/commit/3f6854c40058d8542d38c9c9a67bb5118b008552).

To me this seems like the least troublesome approach.  Furthermore, it
matches the behaviour of several other projects such as Hadoop (which also
uses the machine's hostname from `gethostname(2)` or equivalent).
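
For readers unfamiliar with the forward/reverse requirement in item 1, this is roughly what such a check looks like. It is a sketch, not the Mesos code; forward_reverse_match is a hypothetical name and the resolver functions are parameters so the example does not depend on real DNS.

```python
import socket


def forward_reverse_match(hostname,
                          resolve=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return True if hostname -> IP -> hostname round-trips to the same name."""
    try:
        ip = resolve(hostname)
        reverse_name = reverse(ip)[0]  # (name, aliases, addresses)
    except OSError:  # socket.gaierror and socket.herror are subclasses
        return False
    return reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
```

On EC2 the public hostname typically resolves (from inside) to a private IP whose reverse record is an internal name, so a check like this fails even though the host is perfectly reachable.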


On Mon, Dec 9, 2013 at 12:50 PM, Tim St Clair tstcl...@redhat.com wrote:

 full ref:


 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AESDG-chapter-instancedata.html

 - Original Message -

  From: Tim St Clair tstcl...@redhat.com
  To: u...@mesos.apache.org
  Cc: dev dev@mesos.apache.org
  Sent: Monday, December 9, 2013 2:46:46 PM
  Subject: Re: mesos and ec2

  inline

  - Original Message -

   From: Benjamin Hindman benjamin.hind...@gmail.com
 
   To: dev dev@mesos.apache.org, u...@mesos.apache.org
 
   Sent: Monday, December 9, 2013 2:06:36 PM
 
   Subject: mesos and ec2
 

   If you don't use Mesos on EC2 then you can likely stop reading this
 email.
 

   When using Mesos on EC2 you might notice that the web UI seems broken.
 The
   two things we know about are:
 

   (1) You can't use '/master/redirect' on a master because it attempts to
   send
   you to the private IP hostname of the leading master (which obviously
 your
   browser can't connect to).
 

  Why not use a simple detection mechanics:

  my_hostname=$(hostname -f)

  curl --connect-timeout 3 --silent http://169.254.169.254/2012-01-12
  if [ $? -eq 0 ]; then
  my_hostname=$(curl http://169.254.169.254/2012-01-12/meta-data/public-hostname)
  echo Found EC2 public hostname: $my_hostname
  fi

   (2) You can't get information from the slaves (and thus, information
 about
   your tasks such as file download/tailing) because your browser tries to
   connect to the slave's private hostname.
 

  same details as above.

   We're trying to architect a clean fix for (1) in MESOS-672 . Feel free
 to
   follow along and please add any suggestions. In the mean time, the best
   solution that we know of is to modify /etc/hosts on the masters to
 resolve
   the public hostname from the private IP.
 

   For (2) you can set the --hostname flag on the slave to the public
   hostname.
   Within EC2 the public hostname will resolve to the private IP address
 so
   you
   won't be paying extra traffic fees but your browser will also be able
 to
   get
   the public IP address from the public hostname.
 

   Of course, don't forget to set up your security groups to make sure the
   master and slave port(s) are open (5050 and 5051 by default
 respectively).
 

   We plan to make these fixes less of a manual process in the future and
   would
   love to hear your suggestions!
 

   Thanks,
 

   Ben.
 

  --
  Cheers,
  Tim

 --
 Cheers,
 Tim



Review Request 16136: Fixed the python tests in the presence of multiple eggs.

2013-12-09 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16136/
---

Review request for mesos, Benjamin Hindman and Vinod Kone.


Repository: mesos-git


Description
---

The python framework tests fail when there are multiple eggs present. This is 
because the scripts use overly general wildcards to match the egg files.

Multiple eggs can be present when several mesos versions, or several 
python versions, have been run through make check.
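
A small illustration of the wildcard problem, assuming eggs follow the usual setuptools naming of name-version-pyX.Y-platform.egg. The file names and version numbers below are invented for the example, and matching_eggs is a hypothetical helper, not part of the test scripts.

```python
import fnmatch

# Hypothetical contents of the build directory after several `make check` runs.
eggs = [
    "mesos-0.15.0-py2.6-linux-x86_64.egg",
    "mesos-0.16.0-py2.6-linux-x86_64.egg",
    "mesos-0.16.0-py2.7-linux-x86_64.egg",
]


def matching_eggs(names, mesos_version="*", python_version="*"):
    """Filter egg file names by Mesos version and Python version."""
    pattern = "mesos-%s-py%s-*.egg" % (mesos_version, python_version)
    return fnmatch.filter(names, pattern)


print(matching_eggs(eggs))                   # all three eggs: ambiguous
print(matching_eggs(eggs, "0.16.0", "2.7"))  # only the 0.16.0 / py2.7 egg
```

Pinning both versions in the pattern is what removes the ambiguity.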


Diffs
-

  src/examples/python/test-executor.in 6f18682425b472b58fe4f42859d84cbb24da9f7c 
  src/examples/python/test-framework.in d66cf6bd6f6ecdfc917d2ac004cf32e61ce150c5 

Diff: https://reviews.apache.org/r/16136/diff/


Testing
---

make check on both OSX and CentOS 5.


Thanks,

Ben Mahler



Re: Review Request 15764: Minor fix for script-without-shebang MESOS-831

2013-12-09 Thread Timothy St. Clair


 On Nov. 22, 2013, 9:51 p.m., Ben Mahler wrote:
  Interesting, these files are not meant to be scripts (they are python 
  libraries), which tool reported the script-without-shebang issue?
  
  (Can you kill the trailing whitespace in the Apache header?)
 
 Timothy St. Clair wrote:
 rpmlint generated on recent update 
 (https://bugzilla.redhat.com/show_bug.cgi?id=1010512#c6).  It checks the 
 contents of the install targets.
 
 Timothy St. Clair wrote:
 Updated patch to remove minor trailing spaces.  There were only a couple.
 
 Ben Mahler wrote:
 Did you forget to update the patch?

This is my first time using Review Board. I used Update Diff to update the 
patch, and when I download the diff it appears correctly updated, with no extra 
lines or spaces. Is there a different workflow that I should be following? If 
so, could you send a URL? 


- Timothy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15764/#review29316
---


On Nov. 21, 2013, 7:07 p.m., Timothy St. Clair wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15764/
 ---
 
 (Updated Nov. 21, 2013, 7:07 p.m.)
 
 
 Review request for mesos.
 
 
 Bugs: MESOS-831
 https://issues.apache.org/jira/browse/MESOS-831
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Minor modification to python scripts. 
 
 
 Diffs
 -
 
   src/cli/python/mesos/__init__.py e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 
   src/cli/python/mesos/cli.py 5c11d4664de4314674cb49cf534e334a0663b254 
   src/cli/python/mesos/futures.py be374cf0235de75f7201b3e8078af8cef620dc2f 
   src/cli/python/mesos/http.py 9db9e23c731c4306ff3866c07963a19c196f758c 
 
 Diff: https://reviews.apache.org/r/15764/diff/
 
 
 Testing
 ---
 
 n/a
 
 
 Thanks,
 
 Timothy St. Clair
 




Re: Review Request 16136: Fixed the python tests in the presence of multiple eggs.

2013-12-09 Thread Benjamin Hindman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16136/#review30053
---



src/examples/python/test-executor.in
https://reviews.apache.org/r/16136/#comment57567

Can we set PYTHON_VERSION in our automake variables and then use that 
instead of EGG_PYTHON_VERSION?


- Benjamin Hindman


On Dec. 9, 2013, 9:49 p.m., Ben Mahler wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/16136/
 ---
 
 (Updated Dec. 9, 2013, 9:49 p.m.)
 
 
 Review request for mesos, Benjamin Hindman and Vinod Kone.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 The python framework tests fail when there are multiple eggs present. This is 
 because the scripts use overly general wildcards to match the egg files.
 
 Multiple eggs can be present when several mesos versions, or several 
 python versions, have been run through make check.
 
 
 Diffs
 -
 
   src/examples/python/test-executor.in 
 6f18682425b472b58fe4f42859d84cbb24da9f7c 
   src/examples/python/test-framework.in 
 d66cf6bd6f6ecdfc917d2ac004cf32e61ce150c5 
 
 Diff: https://reviews.apache.org/r/16136/diff/
 
 
 Testing
 ---
 
 make check on both OSX and CentOS 5.
 
 
 Thanks,
 
 Ben Mahler
 




Re: Review Request 15764: Minor fix for script-without-shebang MESOS-831

2013-12-09 Thread Benjamin Mahler
Sorry for the trouble! I could submit this and clean up the whitespace for
you, but I think this will be a good exercise in using reviewboard. :)

As far as I know, updating results in a new revision on the diff page,
but there is only one diff revision:
https://reviews.apache.org/r/15764/diff/#index_header

That single revision still has the trailing whitespace, and so does the raw
diff, so it doesn't appear that the diff has been updated:
https://reviews.apache.org/r/15764/diff/raw/

When you publish a new diff successfully, you will see a 'Review request
changed' update on the review, see https://reviews.apache.org/r/15708/ for
an example (after Vinod's review I uploaded a new diff).

On Mon, Dec 9, 2013 at 2:05 PM, Timothy St. Clair tstcl...@redhat.com wrote:



  On Nov. 22, 2013, 9:51 p.m., Ben Mahler wrote:
   Interesting, these files are not meant to be scripts (they are python
 libraries), which tool reported the script-without-shebang issue?
  
   (Can you kill the trailing whitespace in the Apache header?)
 
  Timothy St. Clair wrote:
  rpmlint generated on recent update (
 https://bugzilla.redhat.com/show_bug.cgi?id=1010512#c6).  It checks the
 contents of the install targets.
 
  Timothy St. Clair wrote:
  Updated patch to remove minor trailing spaces.  There were only a
 couple.
 
  Ben Mahler wrote:
  Did you forget to update the patch?

 This is my 1st time using the review board, I used Update Diff to update
 the patch.  When I download the diff it appears correct updated, with no
 extra lines or spaces.  Is there a different workflow that I should be
 following?  If so url please?


 - Timothy


 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15764/#review29316
 ---


 On Nov. 21, 2013, 7:07 p.m., Timothy St. Clair wrote:
 
  ---
  This is an automatically generated e-mail. To reply, visit:
  https://reviews.apache.org/r/15764/
  ---
 
  (Updated Nov. 21, 2013, 7:07 p.m.)
 
 
  Review request for mesos.
 
 
  Bugs: MESOS-831
  https://issues.apache.org/jira/browse/MESOS-831
 
 
  Repository: mesos-git
 
 
  Description
  ---
 
  Minor modification to python scripts.
 
 
  Diffs
  -
 
src/cli/python/mesos/__init__.py
 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
src/cli/python/mesos/cli.py 5c11d4664de4314674cb49cf534e334a0663b254
src/cli/python/mesos/futures.py
 be374cf0235de75f7201b3e8078af8cef620dc2f
src/cli/python/mesos/http.py 9db9e23c731c4306ff3866c07963a19c196f758c
 
  Diff: https://reviews.apache.org/r/15764/diff/
 
 
  Testing
  ---
 
  n/a
 
 
  Thanks,
 
  Timothy St. Clair
 
 




Re: Review Request 15764: Minor fix for script-without-shebang MESOS-831

2013-12-09 Thread Timothy St. Clair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15764/
---

(Updated Dec. 9, 2013, 10:40 p.m.)


Review request for mesos.


Changes
---

Publishing the second diff; apparently I did not publish the changes earlier.


Bugs: MESOS-831
https://issues.apache.org/jira/browse/MESOS-831


Repository: mesos-git


Description
---

Minor modification to python scripts. 


Diffs (updated)
-

  src/cli/python/mesos/__init__.py e69de29 
  src/cli/python/mesos/cli.py f32ba49 
  src/cli/python/mesos/futures.py be374cf 
  src/cli/python/mesos/http.py 9db9e23 

Diff: https://reviews.apache.org/r/15764/diff/


Testing
---

n/a


Thanks,

Timothy St. Clair



Re: Review Request 16111: Fixed the zookeeper client wrappers to use the negotiated session timeout value as their local reconnect timeout.

2013-12-09 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16111/#review30048
---



src/state/zookeeper.cpp
https://reviews.apache.org/r/16111/#comment57576

Ditto about the subsequent ZK connection using this timeout as well, can 
you add a note if this was what you intended?

Why not just say:

// Update the session timeout to the negotiated value.

Here and elsewhere.



src/tests/zookeeper.hpp
https://reviews.apache.org/r/16111/#comment57577

timeout is only applicable for session connected events, so maybe this 
makes more sense as an Option<Duration> with a comment that reflects when it is 
Some? (We could of course have different Events but let's keep this patch 
simple :)).

The code that assumes it to be some can do the appropriate CHECK.
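
A sketch of the optional-timeout idea in Python, where Optional[int] plays the role of an optional duration. The Event class and negotiated_timeout are hypothetical and only mirror the shape of the suggestion; the real test Event type lives in src/tests/zookeeper.hpp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str                                  # e.g. "connected", "expired"
    session_timeout_ms: Optional[int] = None   # set only for "connected"


def negotiated_timeout(event):
    """Return the timeout, refusing events that cannot carry one."""
    # Plays the role of the CHECK suggested above.
    if event.kind != "connected" or event.session_timeout_ms is None:
        raise ValueError("no session timeout on %r event" % event.kind)
    return event.session_timeout_ms
```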



src/tests/zookeeper_test_server.hpp
https://reviews.apache.org/r/16111/#comment57575

Should these be const?



src/tests/zookeeper_test_server.cpp
https://reviews.apache.org/r/16111/#comment57557

Perhaps a comment here as to how you knew that the int represented 
milliseconds in the ZK code?



src/tests/zookeeper_test_server.cpp
https://reviews.apache.org/r/16111/#comment57559

Is the cast needed on these two?



src/tests/zookeeper_tests.cpp
https://reviews.apache.org/r/16111/#comment57574

Thanks for the test!



src/zookeeper/group.cpp
https://reviews.apache.org/r/16111/#comment57573

This also results in us using the negotiated session timeout for the 
subsequent zookeeper connection in expired(), was that intended? If so, perhaps 
mention that here.
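
To illustrate the behaviour being asked about: a ZooKeeper server bounds the requested session timeout between a minimum and a maximum (by default 2 and 20 times its tickTime), and the client then keeps using the negotiated value. The sketch below is a toy model under that assumption; negotiate, Client, and the 2000 ms tick are illustrative and not the Mesos wrapper.

```python
def negotiate(requested_ms, tick_time_ms=2000, min_ms=None, max_ms=None):
    """Clamp a requested session timeout to the server's allowed range."""
    low = 2 * tick_time_ms if min_ms is None else min_ms
    high = 20 * tick_time_ms if max_ms is None else max_ms
    return max(low, min(high, requested_ms))


class Client:
    """Toy client that remembers the negotiated session timeout."""

    def __init__(self, requested_ms):
        self.session_timeout_ms = requested_ms

    def connected(self, negotiated_ms):
        # Update the session timeout to the negotiated value.
        self.session_timeout_ms = negotiated_ms

    def reconnect_deadline(self, disconnected_at_ms):
        # Local reconnect timeout equals the (negotiated) session timeout.
        return disconnected_at_ms + self.session_timeout_ms

    def next_session_request(self):
        # After expiry, the next connection asks for the stored value.
        return self.session_timeout_ms
```

With this shape, a session opened after expiry requests the previously negotiated value rather than the originally configured one, which is the side effect the comment asks to have documented.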



src/zookeeper/zookeeper.hpp
https://reviews.apache.org/r/16111/#comment57570

Maybe s/timeout/sessionTimeout/ here and elsewhere to be more clear.



src/zookeeper/zookeeper.cpp
https://reviews.apache.org/r/16111/#comment57571

Is this cast necessary? Looks like we're widening.


- Ben Mahler


On Dec. 9, 2013, 6:51 p.m., Jiang Yan Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/16111/
 ---
 
 (Updated Dec. 9, 2013, 6:51 p.m.)
 
 
 Review request for mesos, Benjamin Hindman, Ben Mahler, Raul Gutierrez 
 Segales, and Vinod Kone.
 
 
 Bugs: MESOS-868
 https://issues.apache.org/jira/browse/MESOS-868
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 See summary.
 
 
 Diffs
 -
 
   src/jvm/org/apache/zookeeper.hpp dac14565a66397f153cdc059859286f8ac555919 
   src/state/zookeeper.hpp 90b660737f52df426759877feb979b14ac4b6811 
   src/state/zookeeper.cpp 09b63d44e9349cab2d73659c939de3d8e96fbcc5 
   src/tests/zookeeper.hpp 1bc38c291cef39a4d255fd9065428a26d86248cb 
   src/tests/zookeeper.cpp 8bb49012d8dc46ef9f5a64ead1654253b9df8c21 
   src/tests/zookeeper_test_server.hpp 97a8524600fe3a57cf084c0dea8e99e9a056c504 
   src/tests/zookeeper_test_server.cpp dc53d6a182a861544d2f9e7fa873be4c8c402856 
   src/tests/zookeeper_tests.cpp a5fe9e18fbaa88ea56662dc1b2e3d51fb0b50822 
   src/zookeeper/group.hpp facfb1fe31eeeb042c0e2b94d739101911620cdf 
   src/zookeeper/group.cpp 5c92c5f89d441b2555d928772fa40573660e3e5a 
   src/zookeeper/watcher.hpp 1db0386719c2a675d29b47b417dc856993062326 
   src/zookeeper/zookeeper.hpp 72435432e433fc0162f8b88e2045efcc42793a3a 
   src/zookeeper/zookeeper.cpp cc8a7caeedb2c109d4952a6520cc98565adaa700 
 
 Diff: https://reviews.apache.org/r/16111/diff/
 
 
 Testing
 ---
 
 ./bin/mesos-tests.sh --gtest_filter=GroupTest*:ZooKeeperTest*:ZooKeeperMasterContenderDetectorTest*:ZooKeeperStateTest* -j --gtest_repeat=100 --gtest_break_on_failure --gtest_shuffle
 
 
 Thanks,
 
 Jiang Yan Xu
 




Re: Review Request 15764: Minor fix for script-without-shebang MESOS-831

2013-12-09 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15764/#review30055
---

Ship it!



- Ben Mahler


On Dec. 9, 2013, 10:40 p.m., Timothy St. Clair wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15764/
 ---
 
 (Updated Dec. 9, 2013, 10:40 p.m.)
 
 
 Review request for mesos.
 
 
 Bugs: MESOS-831
 https://issues.apache.org/jira/browse/MESOS-831
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Minor modification to Python scripts. 
 
 
 Diffs
 -
 
   src/cli/python/mesos/__init__.py e69de29 
   src/cli/python/mesos/cli.py f32ba49 
   src/cli/python/mesos/futures.py be374cf 
   src/cli/python/mesos/http.py 9db9e23 
 
 Diff: https://reviews.apache.org/r/15764/diff/
 
 
 Testing
 ---
 
 n/a
 
 
 Thanks,
 
 Timothy St. Clair
 




Re: Review Request 15764: Minor fix for script-without-shebang MESOS-831

2013-12-09 Thread Ben Mahler


 On Dec. 9, 2013, 10:47 p.m., Ben Mahler wrote:
  Ship It!

Thanks Tim, this is now committed, can you mark this review as submitted and 
update the JIRA ticket?


- Ben


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15764/#review30055
---

