[
https://issues.apache.org/jira/browse/MESOS-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Kone updated MESOS-2627:
------------------------------
Summary: ExamplesTest.PersistentVolumeFramework is flaky (was:
ExamplesTest.PersistentVolumeFramework is flaky on OS X)
Happened on CentOS too.
https://builds.apache.org/job/Mesos/COMPILER=clang,LABEL=docker%7C%7CHadoop,OS=ubuntu:14.10/154/consoleFull
{code}
[ RUN ] ExamplesTest.PersistentVolumeFramework
Using temporary directory '/tmp/ExamplesTest_PersistentVolumeFramework_Sfw1QM'
I0417 22:48:04.794379 24154 logging.cpp:177] Logging to STDERR
I0417 22:48:04.806035 24154 process.cpp:912] libprocess is initialized on
172.17.0.62:55041 for 16 cpus
I0417 22:48:04.814960 24154 leveldb.cpp:176] Opened db in 3.882231ms
I0417 22:48:04.816679 24154 leveldb.cpp:183] Compacted db in 1.689813ms
I0417 22:48:04.816825 24154 leveldb.cpp:198] Created db iterator in 105015ns
I0417 22:48:04.816859 24154 leveldb.cpp:204] Seeked to beginning of db in
14744ns
I0417 22:48:04.816872 24154 leveldb.cpp:273] Iterated through 0 keys in the db
in 8275ns
I0417 22:48:04.817123 24154 replica.cpp:744] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I0417 22:48:04.819927 24182 recover.cpp:449] Starting replica recovery
I0417 22:48:04.821547 24182 recover.cpp:475] Replica is in EMPTY status
I0417 22:48:04.824378 24191 master.cpp:361] Master
20150417-224804-1040191916-55041-24154 (7630717d513f) started on
172.17.0.62:55041
I0417 22:48:04.824446 24191 master.cpp:363] Flags at startup:
--acls="permissive: true
register_frameworks {
principals {
type: ANY
}
roles {
type: SOME
values: "test"
}
}
" --allocation_interval="1secs" --authenticate="false"
--authenticate_slaves="false" --authenticators="crammd5"
--credentials="/tmp/ExamplesTest_PersistentVolumeFramework_Sfw1QM/credentials"
--framework_sorter="drf" --initialize_driver_logging="true"
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO"
--quiet="false" --recovery_slave_removal_limit="100%"
--registry="replicated_log" --registry_fetch_timeout="1mins"
--registry_store_timeout="5secs" --registry_strict="false" --roles="test"
--root_submissions="true" --slave_reregister_timeout="10mins"
--user_sorter="drf" --version="false"
--webui_dir="/mesos/mesos-0.23.0/src/webui" --work_dir="/tmp/mesos-tmgVmD"
--zk_session_timeout="10secs"
I0417 22:48:04.825870 24191 master.cpp:410] Master allowing unauthenticated
frameworks to register
I0417 22:48:04.825896 24191 master.cpp:415] Master allowing unauthenticated
slaves to register
I0417 22:48:04.825912 24191 credentials.hpp:37] Loading credentials for
authentication from
'/tmp/ExamplesTest_PersistentVolumeFramework_Sfw1QM/credentials'
I0417 22:48:04.825928 24187 replica.cpp:641] Replica in EMPTY status received a
broadcasted recover request
W0417 22:48:04.826073 24191 credentials.hpp:52] Permissions on credentials file
'/tmp/ExamplesTest_PersistentVolumeFramework_Sfw1QM/credentials' are too open.
It is recommended that your credentials file is NOT accessible by others.
I0417 22:48:04.826552 24191 master.cpp:452] Using default 'crammd5'
authenticator
I0417 22:48:04.826997 24191 authenticator.cpp:511] Initializing server SASL
I0417 22:48:04.827725 24154 containerizer.cpp:113] Using isolation:
posix/cpu,posix/mem
I0417 22:48:04.828730 24192 recover.cpp:195] Received a recover response from a
replica in EMPTY status
I0417 22:48:04.829848 24179 recover.cpp:566] Updating replica status to STARTING
I0417 22:48:04.830972 24189 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 908555ns
I0417 22:48:04.831010 24191 auxprop.cpp:65] Initialized in-memory auxiliary
property plugin
I0417 22:48:04.831030 24189 replica.cpp:323] Persisted replica status to
STARTING
I0417 22:48:04.831171 24191 master.cpp:489] Authorization enabled
I0417 22:48:04.831217 24180 slave.cpp:176] Slave started on 1)@172.17.0.62:55041
I0417 22:48:04.831496 24186 recover.cpp:475] Replica is in STARTING status
I0417 22:48:04.831269 24180 slave.cpp:177] Flags at startup:
--authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
--cgroups_limit_swap="false" --cgroups_root="mesos"
--container_disk_watch_interval="15secs" --containerizers="mesos"
--default_role="test" --disk_watch_interval="1mins" --docker="docker"
--docker_remove_delay="6hrs" --docker_sandbox_directory="/mnt/mesos/sandbox"
--docker_stop_timeout="0ns" --enforce_container_disk_quota="false"
--executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs" --frameworks_home=""
--gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home=""
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher_dir="/mesos/mesos-0.23.0/_build/src/.libs" --logbufsecs="0"
--logging_level="INFO" --perf_duration="10secs" --perf_interval="1mins"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs" --resource_monitoring_interval="1secs"
--strict="true" --switch_user="true" --version="false"
--work_dir="/tmp/mesos-tmgVmD/0"
I0417 22:48:04.832581 24180 slave.cpp:316] Slave resources: cpus(test):16;
mem(test):47270; disk(test):3.70122e+06; ports(test):[31000-32000]
I0417 22:48:04.832710 24154 containerizer.cpp:113] Using isolation:
posix/cpu,posix/mem
I0417 22:48:04.832805 24180 slave.cpp:346] Slave hostname: 7630717d513f
I0417 22:48:04.832826 24180 slave.cpp:351] Slave checkpoint: true
I0417 22:48:04.832828 24186 replica.cpp:641] Replica in STARTING status
received a broadcasted recover request
I0417 22:48:04.832908 24190 whitelist_watcher.cpp:79] No whitelist given
I0417 22:48:04.833047 24187 recover.cpp:195] Received a recover response from a
replica in STARTING status
I0417 22:48:04.833511 24181 recover.cpp:566] Updating replica status to VOTING
I0417 22:48:04.833739 24177 hierarchical.hpp:287] Initialized hierarchical
allocator process
I0417 22:48:04.834203 24182 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 550421ns
I0417 22:48:04.834228 24182 replica.cpp:323] Persisted replica status to VOTING
I0417 22:48:04.834250 24183 slave.cpp:176] Slave started on 2)@172.17.0.62:55041
I0417 22:48:04.834328 24181 recover.cpp:580] Successfully joined the Paxos group
I0417 22:48:04.834290 24183 slave.cpp:177] Flags at startup:
--authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
--cgroups_limit_swap="false" --cgroups_root="mesos"
--container_disk_watch_interval="15secs" --containerizers="mesos"
--default_role="test" --disk_watch_interval="1mins" --docker="docker"
--docker_remove_delay="6hrs" --docker_sandbox_directory="/mnt/mesos/sandbox"
--docker_stop_timeout="0ns" --enforce_container_disk_quota="false"
--executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs" --frameworks_home=""
--gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home=""
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher_dir="/mesos/mesos-0.23.0/_build/src/.libs" --logbufsecs="0"
--logging_level="INFO" --perf_duration="10secs" --perf_interval="1mins"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs" --resource_monitoring_interval="1secs"
--strict="true" --switch_user="true" --version="false"
--work_dir="/tmp/mesos-tmgVmD/1"
I0417 22:48:04.834744 24181 recover.cpp:464] Recover process terminated
I0417 22:48:04.835010 24183 slave.cpp:316] Slave resources: cpus(test):16;
mem(test):47270; disk(test):3.70122e+06; ports(test):[31000-32000]
I0417 22:48:04.835120 24183 slave.cpp:346] Slave hostname: 7630717d513f
I0417 22:48:04.835136 24183 slave.cpp:351] Slave checkpoint: true
I0417 22:48:04.836086 24154 containerizer.cpp:113] Using isolation:
posix/cpu,posix/mem
I0417 22:48:04.837262 24178 slave.cpp:176] Slave started on 3)@172.17.0.62:55041
I0417 22:48:04.837357 24178 slave.cpp:177] Flags at startup:
--authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
--cgroups_limit_swap="false" --cgroups_root="mesos"
--container_disk_watch_interval="15secs" --containerizers="mesos"
--default_role="test" --disk_watch_interval="1mins" --docker="docker"
--docker_remove_delay="6hrs" --docker_sandbox_directory="/mnt/mesos/sandbox"
--docker_stop_timeout="0ns" --enforce_container_disk_quota="false"
--executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs" --frameworks_home=""
--gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home=""
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher_dir="/mesos/mesos-0.23.0/_build/src/.libs" --logbufsecs="0"
--logging_level="INFO" --perf_duration="10secs" --perf_interval="1mins"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs" --resource_monitoring_interval="1secs"
--strict="true" --switch_user="true" --version="false"
--work_dir="/tmp/mesos-tmgVmD/2"
I0417 22:48:04.838080 24178 slave.cpp:316] Slave resources: cpus(test):16;
mem(test):47270; disk(test):3.70122e+06; ports(test):[31000-32000]
I0417 22:48:04.838181 24178 slave.cpp:346] Slave hostname: 7630717d513f
I0417 22:48:04.838196 24178 slave.cpp:351] Slave checkpoint: true
*** Error in
`/mesos/mesos-0.23.0/_build/src/.libs/lt-persistent-volume-framework': double
free or corruption (fasttop): 0x00007f90ec00d460 ***
*** Aborted at 1429310884 (unix time) try "date -d @1429310884" if you are
using GNU date ***
I0417 22:48:04.839388 24185 state.cpp:35] Recovering state from
'/tmp/mesos-tmgVmD/0/meta'
I0417 22:48:04.839416 24187 state.cpp:35] Recovering state from
'/tmp/mesos-tmgVmD/2/meta'
I0417 22:48:04.840162 24154 sched.cpp:157] Version: 0.23.0
I0417 22:48:04.840227 24186 status_update_manager.cpp:197] Recovering status
update manager
I0417 22:48:04.840237 24184 status_update_manager.cpp:197] Recovering status
update manager
PC: @ 0x7f9125dc6e37 (unknown)
I0417 22:48:04.840636 24177 containerizer.cpp:310] Recovering containerizer
I0417 22:48:04.840630 24184 containerizer.cpp:310] Recovering containerizer
I0417 22:48:04.840785 24188 master.cpp:1422] The newly elected leader is
[email protected]:55041 with id 20150417-224804-1040191916-55041-24154
I0417 22:48:04.840895 24188 master.cpp:1435] Elected as the leading master!
I0417 22:48:04.840919 24188 master.cpp:1207] Recovering from registrar
I0417 22:48:04.841101 24185 registrar.cpp:313] Recovering registrar
I0417 22:48:04.842108 24178 sched.cpp:254] New master detected at
[email protected]:55041
I0417 22:48:04.842408 24189 slave.cpp:3806] Finished recovery
I0417 22:48:04.842409 24190 slave.cpp:3806] Finished recovery
I0417 22:48:04.842586 24178 sched.cpp:264] No credentials provided. Attempting
to register without authentication
I0417 22:48:04.842603 24181 log.cpp:660] Attempting to start the writer
I0417 22:48:04.842613 24178 sched.cpp:521] Sending registration request to
[email protected]:55041
I0417 22:48:04.842722 24178 sched.cpp:554] Will retry registration in
32.354951ms if necessary
I0417 22:48:04.842785 24177 master.cpp:1034] Dropping
'mesos.internal.RegisterFrameworkMessage' message since not recovered yet
I0417 22:48:04.843227 24188 status_update_manager.cpp:171] Pausing sending
status updates
I0417 22:48:04.843228 24189 slave.cpp:627] New master detected at
[email protected]:55041
I0417 22:48:04.843364 24189 slave.cpp:652] No credentials provided. Attempting
to register without authentication
I0417 22:48:04.843446 24189 slave.cpp:663] Detecting new master
I0417 22:48:04.843808 24185 status_update_manager.cpp:171] Pausing sending
status updates
I0417 22:48:04.843821 24190 slave.cpp:627] New master detected at
[email protected]:55041
I0417 22:48:04.843879 24190 slave.cpp:652] No credentials provided. Attempting
to register without authentication
I0417 22:48:04.843911 24190 slave.cpp:663] Detecting new master
I0417 22:48:04.845938 24182 replica.cpp:477] Replica received implicit promise
request with proposal 1
I0417 22:48:04.846547 24182 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 591136ns
I0417 22:48:04.846586 24182 replica.cpp:345] Persisted promised to 1
I0417 22:48:04.847851 24177 coordinator.cpp:230] Coordinator attemping to fill
missing position
I0417 22:48:04.849622 24185 replica.cpp:378] Replica received explicit promise
request for position 0 with proposal 2
I0417 22:48:04.850231 24185 leveldb.cpp:343] Persisting action (8 bytes) to
leveldb took 568859ns
I0417 22:48:04.850258 24185 replica.cpp:679] Persisted action at 0
I0417 22:48:04.851675 24191 replica.cpp:511] Replica received write request for
position 0
I0417 22:48:04.851797 24191 leveldb.cpp:438] Reading position from leveldb took
96390ns
I0417 22:48:04.852370 24191 leveldb.cpp:343] Persisting action (14 bytes) to
leveldb took 518618ns
I0417 22:48:04.852393 24191 replica.cpp:679] Persisted action at 0
I0417 22:48:04.853102 24190 replica.cpp:658] Replica received learned notice
for position 0
I0417 22:48:04.853556 24190 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 442980ns
I0417 22:48:04.853580 24190 replica.cpp:679] Persisted action at 0
I0417 22:48:04.853600 24190 replica.cpp:664] Replica learned NOP action at
position 0
I0417 22:48:04.854176 24182 log.cpp:676] Writer started with ending position 0
I0417 22:48:04.857132 24185 leveldb.cpp:438] Reading position from leveldb took
82598ns
*** SIGABRT (@0x3e800005e5a) received by PID 24154 (TID 0x7f911c141700) from
PID 24154; stack trace: ***
I0417 22:48:04.861847 24190 registrar.cpp:346] Successfully fetched the
registry (0B) in 20.683264ms
I0417 22:48:04.862339 24190 registrar.cpp:445] Applied 1 operations in
190192ns; attempting to update the 'registry'
@ 0x7f9126163c90 (unknown)
@ 0x7f9125dc6e37 (unknown)
@ 0x7f9125dc8528 (unknown)
I0417 22:48:04.866438 24188 log.cpp:684] Attempting to append 130 bytes to the
log
I0417 22:48:04.866677 24184 coordinator.cpp:340] Coordinator attempting to
write APPEND action at position 1
@ 0x7f9125e07f74 (unknown)
I0417 22:48:04.867557 24182 replica.cpp:511] Replica received write request for
position 1
@ 0x7f9125e0f586 (unknown)
I0417 22:48:04.868192 24182 leveldb.cpp:343] Persisting action (149 bytes) to
leveldb took 604955ns
I0417 22:48:04.868221 24182 replica.cpp:679] Persisted action at 1
I0417 22:48:04.868923 24192 replica.cpp:658] Replica received learned notice
for position 1
I0417 22:48:04.869328 24192 leveldb.cpp:343] Persisting action (151 bytes) to
leveldb took 389604ns
I0417 22:48:04.869360 24192 replica.cpp:679] Persisted action at 1
I0417 22:48:04.869400 24192 replica.cpp:664] Replica learned APPEND action at
position 1
I0417 22:48:04.871562 24188 registrar.cpp:490] Successfully updated the
'registry' in 9.007872ms
I0417 22:48:04.871757 24188 registrar.cpp:376] Successfully recovered registrar
I0417 22:48:04.872179 24178 master.cpp:1234] Recovered 0 slaves from the
Registry (94B) ; allowing 10mins for slaves to re-register
I0417 22:48:04.872267 24180 log.cpp:703] Attempting to truncate the log to 1
I0417 22:48:04.872423 24188 coordinator.cpp:340] Coordinator attempting to
write TRUNCATE action at position 2
I0417 22:48:04.873039 24185 replica.cpp:511] Replica received write request for
position 2
I0417 22:48:04.873473 24185 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 416055ns
I0417 22:48:04.873505 24185 replica.cpp:679] Persisted action at 2
I0417 22:48:04.873989 24190 replica.cpp:658] Replica received learned notice
for position 2
I0417 22:48:04.874287 24190 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 282809ns
I0417 22:48:04.874344 24190 leveldb.cpp:401] Deleting ~1 keys from leveldb took
41834ns
I0417 22:48:04.874366 24190 replica.cpp:679] Persisted action at 2
I0417 22:48:04.874398 24190 replica.cpp:664] Replica learned TRUNCATE action at
position 2
@ 0x7f9128e10ffa
_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2EEEENS1_9_DeferredIDTclsr3stdE4bindadsr3std8functionIFvT0_T1_EEEclcvSt8functionISD_E_Efp1_fp2_EEEERKNS1_3PIDIT_EEMSJ_FvSB_SC_ET2_T3_EUliiE_E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE
I0417 22:48:04.876715 24187 sched.cpp:521] Sending registration request to
[email protected]:55041
I0417 22:48:04.876821 24187 sched.cpp:554] Will retry registration in
846.417151ms if necessary
I0417 22:48:04.876999 24182 master.cpp:1640] Received registration request for
framework 'Persistent Volume Framework (C++)' at
[email protected]:55041
I0417 22:48:04.877054 24182 master.cpp:1501] Authorizing framework principal
'test' to receive offers for role 'test'
I0417 22:48:04.878262 24184 master.cpp:1707] Registering framework
20150417-224804-1040191916-55041-24154-0000 (Persistent Volume Framework (C++))
at [email protected]:55041
I0417 22:48:04.879447 24184 sched.cpp:448] Framework registered with
20150417-224804-1040191916-55041-24154-0000
I0417 22:48:04.879449 24188 hierarchical.hpp:321] Added framework
20150417-224804-1040191916-55041-24154-0000
I0417 22:48:04.879508 24184 persistent_volume_framework.cpp:129] Registered
with master id: "20150417-224804-1040191916-55041-24154"
ip: 1040191916
port: 55041
pid: "[email protected]:55041"
hostname: "7630717d513f"
and got framework ID 20150417-224804-1040191916-55041-24154-0000
I0417 22:48:04.879643 24184 sched.cpp:462] Scheduler::registered took 148804ns
I0417 22:48:04.879751 24188 hierarchical.hpp:818] No resources available to
allocate!
I0417 22:48:04.879829 24188 hierarchical.hpp:725] Performed allocation for 0
slaves in 321247ns
@ 0x7f9128e10ec6
_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2EEEENS1_9_DeferredIDTclsr3stdE4bindadsr3std8functionIFvT0_T1_EEEclcvSt8functionISD_E_Efp1_fp2_EEEERKNS1_3PIDIT_EEMSJ_FvSB_SC_ET2_T3_EUliiE_E10_M_managerERSt9_Any_dataRKST_St18_Manager_operation
@ 0x465447 std::_Function_base::~_Function_base()
@ 0x7f9128dab1b5 std::function<>::~function()
@ 0x7f9128e0fec5 std::_Head_base<>::~_Head_base()
@ 0x7f9128e0fea5
_ZNSt11_Tuple_implILm0EJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2EEEED2Ev
@ 0x7f9128e0fe85
_ZNSt5tupleIJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2EEEED2Ev
@ 0x7f9128e0ecac std::_Bind<>::~_Bind()
@ 0x7f9128e10151 std::_Function_base::_Base_manager<>::_M_destroy()
@ 0x7f9128e10026 std::_Function_base::_Base_manager<>::_M_manager()
@ 0x465447 std::_Function_base::~_Function_base()
@ 0x7f9128dab1b5 std::function<>::~function()
@ 0x7f9128dae98e std::function<>::operator=()
@ 0x7f9128d77499 mesos::internal::slave::Slave::initialize()
@ 0x7f9128d864b2 mesos::internal::slave::Slave::initialize()
@ 0x7f912955de1f process::ProcessManager::resume()
@ 0x7f912955d9ee process::schedule()
@ 0x7f912615c0a5 start_thread
@ 0x7f9125e89cfd (unknown)
../../src/tests/script.cpp:83: Failure
Failed
persistent_volume_framework_test.sh terminated with signal Aborted
[ FAILED ] ExamplesTest.PersistentVolumeFramework (1007 ms)
{code}
> ExamplesTest.PersistentVolumeFramework is flaky
> -----------------------------------------------
>
> Key: MESOS-2627
> URL: https://issues.apache.org/jira/browse/MESOS-2627
> Project: Mesos
> Issue Type: Bug
> Environment: OS X Yosemite
> Reporter: Cody Maloney
> Labels: flaky, flaky-test
>
> This just failed for the first time on our OS X Bot (Far less frequent flaky
> than the other ExamplesTest, but still flaky) while compiling master at
> commit f6620f851f635b3346c6ebf878152f38b3932ad9. There weren't any commits
> which touched / changed anything in the test in the set.
> {code}
> [ RUN ] ExamplesTest.PersistentVolumeFramework
> ../../src/tests/script.cpp:83: Failure Failed
> persistent_volume_framework_test.sh terminated with signal Abort trap: 6
> [ FAILED ] ExamplesTest.PersistentVolumeFramework (7865 ms)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)