> On 2012-03-24 23:51:02, Matei Zaharia wrote:
> > I really like the changes overall, but are you sure that removing code from 
> > the error callback is a good idea? If people want to respond to particular 
> > errors in particular ways, relying on a string to identify the error is 
> > flimsy, because we might change the wording in the string.
> > 
> > I guess it also depends on the semantics of error(). If we only use it for 
> > "fatal" errors, such as the framework giving a weird offer reply, where we 
> > immediately drop that framework, then it's fine to just have text. But if 
> > we eventually want to use it for non-fatal ones where we expect some 
> > frameworks to be able to recover, we might want to give them a numeric 
> > code. One other option might be to make the argument to error() a protobuf 
> > (ErrorInfo).

I don't think we want to get into the business of requiring people to deal with 
error conditions non-locally. This seems like a horrible idea, and very hard to 
reason about. Consider:

void resourceOffers(...) {
  ...;
  driver.launchTasks(...);
  ...;
}

...

void error(..., int code, ...) {
  if (code somehow means the launchTask thing I did some time ago up in 
resourceOffers) {
    do something;
  }
}

For this reason, I think error should only be for "fatal" errors. Note that 
with framework failover it's perfectly fine for a framework that was aborted 
due to an error to recreate another driver and try again.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4474/#review6338
-----------------------------------------------------------


On 2012-03-24 23:31:16, Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4474/
> -----------------------------------------------------------
> 
> (Updated 2012-03-24 23:31:16)
> 
> 
> Review request for mesos, Andy Konwinski, Charles Reiss, Matei Zaharia, John 
> Sirois, and Vinod Kone.
> 
> 
> Summary
> -------
> 
> Apologies for a rather large diff. I meant to do a few independent changes 
> but they all got mangled into one because of subtle dependencies between them.
> 
> * Changed MesosSchedulerDriver to take a FrameworkInfo instead of a
>   name and possibly FrameworkID. This required adding an optional
>   FrameworkID to FrameworkInfo. The hope is that with this change can
>   more easily update the FrameworkInfo semantics without requiring
>   recompilation. For example, as we add better support for auth
>   entication we should hopefully be able to do that directly in
>   FrameworkInfo. We also simplified the C++ MesosSchedulerDriver by
>   removing unused constructors (that also don't have versions in
>   either Java or Python).
> 
> * Renamed the 'url' parameter of MesosSchedulerDriver to 'master' and
>   updated the allowable schemes of 'master' to include a host:port
>   pair (we internally add 'mas ter@') and replaced zoo:// with zk://
>   and zoofile:// with file:// (but with different semantics, see the
>   documentation). We also changed the master webui to show t he
>   host:port and removed the PID and updated and added tests related to
>   creating MasterDetectors from the above schemes.
> 
> * Replaced the --url argument of mesos-master with --zk argument for
>   clarity.
> 
> * Removed 'code' from the Scheduler/Executor::error callback since it
>   was not being used (nor did it seem useful in the future).
> 
> * Made some naming consistency changes across the C++, Java, and
>   Python interfaces (sched -> scheduler, exec -> executor, etc).
> 
> 
> Diffs
> -----
> 
>   hadoop/hadoop-0.20.205.0.patch 03f3949 
>   include/mesos/executor.hpp ef7bdf9 
>   include/mesos/mesos.proto 23aad17 
>   include/mesos/scheduler.hpp 5bcf065 
>   src/Makefile.am 090c07a 
>   src/deploy/mesos-env.sh c3dfda2 
>   src/detector/detector.hpp 39c19e4 
>   src/detector/detector.cpp 421f035 
>   src/detector/url_processor.hpp 3be0c09 
>   src/detector/url_processor.cpp 1a83b26 
>   src/examples/java/TestExceptionFramework.java d15b132 
>   src/examples/java/TestExecutor.java f57cda4 
>   src/examples/java/TestFramework.java f86646b 
>   src/examples/java/TestMultipleExecutorsFramework.java cdbcc48 
>   src/examples/long_lived_executor.cpp 4b4abad 
>   src/examples/long_lived_framework.cpp 2775f5a 
>   src/examples/no_executor_framework.cpp fb31b25 
>   src/examples/python/test_framework.py 0d71831 
>   src/examples/test_executor.cpp ba54004 
>   src/examples/test_framework.cpp 622f6ac 
>   src/exec/exec.cpp 060707ad 
>   src/java/jni/construct.cpp 90b7e9d 
>   src/java/jni/org_apache_mesos_Log.cpp 4c9a4e1 
>   src/java/jni/org_apache_mesos_MesosExecutorDriver.cpp 579ebbe 
>   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 547d54b 
>   src/java/src/org/apache/mesos/Executor.java 3db3068 
>   src/java/src/org/apache/mesos/MesosExecutorDriver.java 6781dec 
>   src/java/src/org/apache/mesos/MesosSchedulerDriver.java 129a108 
>   src/java/src/org/apache/mesos/Scheduler.java b01468b 
>   src/launcher/executor.cpp da4203c 
>   src/master/main.cpp f31212b 
>   src/master/master.hpp 320d760 
>   src/master/master.cpp 6110c17 
>   src/mesos/main.cpp 9e012f8 
>   src/messages/messages.proto 7f9cffe 
>   src/python/native/mesos_scheduler_driver_impl.cpp 9e088ef 
>   src/python/native/proxy_executor.hpp 32df90b 
>   src/python/native/proxy_executor.cpp d453406 
>   src/python/native/proxy_scheduler.hpp d0e5dfd 
>   src/python/native/proxy_scheduler.cpp 9717f44 
>   src/python/src/mesos.py 25d762b 
>   src/sched/sched.cpp 9440381 
>   src/slave/main.cpp 6b4f37a 
>   src/slave/slave.cpp 28900aa 
>   src/tests/attributes_test.cpp 1e87d923 
>   src/tests/attributes_tests.cpp PRE-CREATION 
>   src/tests/exception_tests.cpp c14bacc 
>   src/tests/fault_tolerance_tests.cpp 9e7d181 
>   src/tests/master_detector_tests.cpp PRE-CREATION 
>   src/tests/master_tests.cpp 57db468 
>   src/tests/resource_offers_tests.cpp a405888 
>   src/tests/url_processor_tests.cpp 573524b 
>   src/tests/utils.hpp 9241567 
>   src/tests/zookeeper_url_tests.cpp PRE-CREATION 
>   src/webui/master/index.tpl bc0cd0b 
>   src/zookeeper/authentication.hpp 944d95e 
>   src/zookeeper/url.hpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/4474/diff
> 
> 
> Testing
> -------
> 
> make check && make hadoop
> 
> 
> Thanks,
> 
> Benjamin
> 
>

Reply via email to