[jira] [Commented] (MESOS-5114) empty quorum config causes masters to fail replica recovery and fail

Jie Yu (JIRA) Mon, 04 Apr 2016 16:29:33 -0700

    [ https://issues.apache.org/jira/browse/MESOS-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225259#comment-15225259 ]


Jie Yu commented on MESOS-5114:
-------------------------------

I wrote a simple flag test:
{noformat}
TEST(FlagsTest, LoadFromEnvironmentEmptyInteger)
{
  TestFlags flags;

  Option<int> name6;

  flags.add(&name6,
            "name6",
            "Optional name6");

  os::setenv("FLAGSTEST_name6", "");

  Try<Nothing> load = flags.load("FLAGSTEST_");
  EXPECT_SOME(load);

  EXPECT_NONE(name6);

  os::unsetenv("FLAGSTEST_name6");
}
{noformat}

And it fails: name6 is set even though the environment variable is empty:
{noformat}
[ RUN      ] FlagsTest.LoadFromEnvironmentEmptyInteger
/Users/jie/workspace/vagrant/trusty/mesos/3rdparty/libprocess/3rdparty/stout/tests/flags_tests.cpp:217: Failure
Value of: name6.isNone()
  Actual: false
Expected: true
[  FAILED  ] FlagsTest.LoadFromEnvironmentEmptyInteger (0 ms)
{noformat}
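
For illustration, here is a minimal standalone sketch of the behaviour the test above expects: an empty environment value leaves an optional integer flag unset. It is not stout's flag-loading code; loadOptionalInt is a hypothetical helper, and std::optional stands in for stout's Option.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of stout): reads an optional integer flag
// from the environment, treating an empty value the same as an unset one.
std::optional<int> loadOptionalInt(const std::string& name)
{
  assert(!name.empty());

  const char* value = std::getenv(name.c_str());

  // Unset or empty: the flag stays "none".
  if (value == nullptr || *value == '\0') {
    return std::nullopt;
  }

  // std::stoi throws std::invalid_argument on non-numeric input.
  std::size_t consumed = 0;
  const int parsed = std::stoi(value, &consumed);

  // Reject trailing characters such as "3x".
  if (consumed != std::string(value).size()) {
    throw std::invalid_argument("Bad integer for " + name + ": " + value);
  }

  return parsed;
}
```

Under this assumption, a setting such as FLAGSTEST_name6="" yields no value, so a caller can tell "not configured" apart from a real integer and report the missing setting explicitly.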

> empty quorum config causes masters to fail replica recovery and fail
> --------------------------------------------------------------------
>
>                 Key: MESOS-5114
>                 URL: https://issues.apache.org/jira/browse/MESOS-5114
>             Project: Mesos
>          Issue Type: Bug
>          Components: master, replicated log
>    Affects Versions: 0.28.0
>         Environment: CentOS 7.1
>            Reporter: Cosmin Lehene
>             Fix For: 0.28.1
>
>
> A missing default for the quorum size generated the following master config:
> {code}
> MESOS_WORK_DIR="/var/lib/mesos/master"
> MESOS_ZK="zk://zk1:2181,zk2:2181,zk3:2181/mesos"
> MESOS_QUORUM=
> MESOS_PORT=5050
> MESOS_CLUSTER="mesos"
> MESOS_LOG_DIR="/var/log/mesos"
> MESOS_LOGBUFSECS=1
> MESOS_LOGGING_LEVEL="INFO"
> {code}
> This was causing each elected leader to attempt replica recovery.
> E.g. {{group.cpp:700] Trying to get '/mesos/log_replicas/0000000012' in 
> ZooKeeper}}
> And eventually:
> {{master.cpp:1458] Recovery failed: Failed to recover registrar: Failed to 
> perform fetch within 1mins}}
> Full log on one of the masters 
> https://gist.github.com/clehene/09a9ddfe49b92a5deb4c1b421f63479e
> All masters and ZooKeeper nodes were reachable over the network.
> Also, once the quorum was configured, the master recovery protocol finished
> gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
