Benjamin Mahler created MESOS-828:
-------------------------------------
Summary: CgroupsIsolator BalloonFramework Test is broken.
Key: MESOS-828
URL: https://issues.apache.org/jira/browse/MESOS-828
Project: Mesos
Issue Type: Bug
Components: test
Reporter: Benjamin Mahler
Assignee: Vinod Kone
Priority: Blocker

This was broken by the following commit:

commit 82a0d329e112cc67e2916bda6e44cb00fe1a1236
Author: Vinod Kone <[email protected]>
Date:   Wed Nov 20 14:32:35 2013 -0800

    Fixed local cluster slaves to use temporary work directory.

[ RUN ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
Using temporary directory '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_BalloonFramework_KGmMtN'
Launched master at 13630
I1120 22:55:24.342715 13630 main.cpp:126] Build: 2013-10-29 19:20:17 by root
I1120 22:55:24.342897 13630 main.cpp:127] Starting Mesos master
I1120 22:55:24.343392 13673 master.cpp:285] Master started on 127.0.0.1:5432
I1120 22:55:24.343453 13673 master.cpp:299] Master ID: 201311202255-16777343-5432-13630
I1120 22:55:24.343464 13673 master.cpp:302] Master only allowing authenticated frameworks to register!
I1120 22:55:24.345286 13673 master.cpp:744] The newly elected leader is [email protected]:5432
I1120 22:55:24.345309 13673 master.cpp:748] Elected as the leading master!
Launched slave at 15663
Duplicate flag 'work_dir' on command line
Usage: lt-mesos-slave [...]
Supported options:
  --attributes=VALUE                      Attributes of machine
  --[no-]cgroups_enable_cfs               Cgroups feature flag to enable hard limits on CPU
                                          resources via the CFS bandwidth limiting subfeature.
                                          (default: false)
  --cgroups_hierarchy=VALUE               The path to the cgroups hierarchy root
                                          (default: /cgroup)
  --cgroups_root=VALUE                    Name of the root cgroup
                                          (default: mesos)
  --cgroups_subsystems=VALUE              List of subsystems to enable (e.g., 'cpu,freezer')
                                          (default: cpu,memory,freezer)
  --[no-]checkpoint                       Whether to checkpoint slave and frameworks information
                                          to disk. This enables a restarted slave to recover
                                          status updates and reconnect with (--recover=reconnect)
                                          or kill (--recover=kill) old executors (default: true)
  --default_role=VALUE                    Any resources in the --resources flag that
                                          omit a role, as well as any resources that
                                          are not present in --resources but that are
                                          automatically detected, will be assigned to
                                          this role. (default: *)
  --disk_watch_interval=VALUE             Periodic time interval (e.g., 10secs, 2mins, etc)
                                          to check the disk usage (default: 1mins)
  --executor_registration_timeout=VALUE   Amount of time to wait for an executor
                                          to register with the slave before considering it hung
                                          and shutting it down (e.g., 60secs, 3mins, etc)
                                          (default: 1mins)
  --executor_shutdown_grace_period=VALUE  Amount of time to wait for an executor
                                          to shut down (e.g., 60secs, 3mins, etc)
                                          (default: 5secs)
  --frameworks_home=VALUE                 Directory prepended to relative executor URIs
                                          (default: )
  --gc_delay=VALUE                        Maximum amount of time to wait before cleaning up
                                          executor directories (e.g., 3days, 2weeks, etc).
                                          Note that this delay may be shorter depending on
                                          the available disk usage. (default: 1weeks)
  --hadoop_home=VALUE                     Where to find Hadoop installed (for
                                          fetching framework executors from HDFS)
                                          (no default, look for HADOOP_HOME in
                                          environment or find hadoop on PATH) (default: )
  --[no-]help                             Prints this help message (default: false)
  --hostname=VALUE                        The hostname the slave should report.
                                          If left unset, system hostname will be used
                                          (recommended).
  --ip=VALUE                              IP address to listen on
  --isolation=VALUE                       Isolation mechanism, may be one of: process, cgroups
                                          (default: process)
  --launcher_dir=VALUE                    Location of Mesos binaries
                                          (default: /usr/local/libexec/mesos)
  --log_dir=VALUE                         Location to put log files (no default, nothing
                                          is written to disk unless specified;
                                          does not affect logging to stderr)
  --logbufsecs=VALUE                      How many seconds to buffer log messages for
                                          (default: 0)
  --master=VALUE                          May be one of:
                                            zk://host1:port1,host2:port2,.../path
                                            zk://username:password@host1:port1,host2:port2,.../path
                                            file://path/to/file (where file contains one of the
                                            above)
  --port=VALUE                            Port to listen on (default: 5051)
  --[no-]quiet                            Disable logging to stderr (default: false)
  --recover=VALUE                         Whether to recover status updates and reconnect with
                                          old executors.
                                          Valid values for 'recover' are
                                          reconnect: Reconnect with any old live executors.
                                          cleanup  : Kill any old live executors and exit.
                                                     Use this option when doing an incompatible
                                                     slave or executor upgrade!).
                                          NOTE: If checkpointed slave doesn't exist, no recovery
                                                is performed and the slave registers with the
                                                master as a new slave. (default: reconnect)
  --recovery_timeout=VALUE                Amount of time alloted for the slave to recover. If
                                          the slave takes longer than recovery_timeout to
                                          recover, any executors that are waiting to reconnect
                                          to the slave will self-terminate.
                                          NOTE: This flag is only applicable when checkpoint is
                                          enabled. (default: 15mins)
  --resource_monitoring_interval=VALUE    Periodic time interval for monitoring executor
                                          resource usage (e.g., 10secs, 1min, etc)
                                          (default: 1secs)
  --resources=VALUE                       Total consumable resources per slave, in the form
                                          'name(role):value;name(role):value...'.
  --[no-]strict                           If strict=true, any and all recovery errors are
                                          considered fatal.
                                          If strict=false, any expected errors (e.g., slave
                                          cannot recover information about an executor,
                                          because the slave died right before the executor
                                          registered.) during recovery are ignored and as much
                                          state as possible is recovered. (default: true)
  --[no-]switch_user                      Whether to run tasks as the user who
                                          submitted them rather than the user running
                                          the slave (requires setuid permission)
                                          (default: true)
  --work_dir=VALUE                        Where to place framework work directories
                                          (default: /tmp/mesos)
Slave crashed; failing test
W1120 22:55:24.345309 13630 logging.cpp:58] RAW: Received signal SIGTERM; exiting.
/home/bmahler/git/mesos/src/tests/balloon_framework_test.sh: line 38: kill: (15663) - No such process
find: /tmp/mesos_test_cgroup: No such file or directory
/home/bmahler/git/mesos/src/tests/balloon_framework_test.sh: line 30: 13630 Terminated ${MASTER} --ip=127.0.0.1 --port=5432
rmdir: /tmp/mesos_test_cgroup/mesos_test: No such file or directory
../../src/tests/script.cpp:78: Failure
Failed
balloon_framework_test.sh exited with status 2
[ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework (4072 ms)
[----------] 1 test from CgroupsIsolatorTest (4073 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4073 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
1 FAILED TEST
YOU HAVE 2 DISABLED TESTS
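
The slave exits at startup with "Duplicate flag 'work_dir' on command line", which suggests it was handed work_dir more than once after the commit above; the log does not show where the second value comes from. As an illustration only, here is a minimal Python sketch of a parser that rejects repeated flags. It is not the Mesos flag parser: parse_flags and the sample arguments are hypothetical, and only the error text is taken from the log.

```python
def parse_flags(args):
    """Parse --name=value arguments, rejecting any flag that appears twice.

    Hypothetical helper for illustration; not the Mesos implementation.
    """
    flags = {}
    for arg in args:
        if not arg.startswith("--"):
            continue  # ignore positional arguments in this sketch
        name, _, value = arg[2:].partition("=")
        if name in flags:
            # Same wording as the error seen in the slave output.
            raise ValueError(f"Duplicate flag '{name}' on command line")
        flags[name] = value
    return flags


if __name__ == "__main__":
    # Assumed example: work_dir is supplied twice, as the log suggests.
    sample = ["--work_dir=/tmp/a", "--isolation=cgroups", "--work_dir=/tmp/b"]
    try:
        parse_flags(sample)
    except ValueError as error:
        print(error)
```

If the test script and the local cluster setup both supply work_dir, dropping one of the two would avoid this error; that diagnosis is an assumption based on the log, not something confirmed here.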