[
https://issues.apache.org/jira/browse/MESOS-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505927#comment-14505927
]
Vinod Kone commented on MESOS-1303:
-----------------------------------
Different error. Same test.
https://builds.apache.org/user/vinodkone/my-views/view/Mesos/job/Mesos/159/COMPILER=clang,LABEL=docker%7C%7CHadoop,OS=ubuntu:14.10/consoleFull
{code}
[ RUN ] ExamplesTest.TestFramework
Using temporary directory '/tmp/ExamplesTest_TestFramework_BxGYx5'
I0421 21:46:28.803078 23561 logging.cpp:177] Logging to STDERR
Enabling authentication for the framework
I0421 21:46:28.808887 23561 process.cpp:912] libprocess is initialized on
172.17.0.3:33642 for 16 cpus
I0421 21:46:28.823137 23561 leveldb.cpp:176] Opened db in 3.6493ms
I0421 21:46:28.825106 23561 leveldb.cpp:183] Compacted db in 1.940888ms
I0421 21:46:28.825326 23561 leveldb.cpp:198] Created db iterator in 127524ns
I0421 21:46:28.825361 23561 leveldb.cpp:204] Seeked to beginning of db in
13445ns
I0421 21:46:28.825373 23561 leveldb.cpp:273] Iterated through 0 keys in the db
in 9530ns
I0421 21:46:28.825688 23561 replica.cpp:744] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I0421 21:46:28.829357 23592 recover.cpp:449] Starting replica recovery
I0421 21:46:28.830732 23592 recover.cpp:475] Replica is in EMPTY status
I0421 21:46:28.833781 23594 master.cpp:362] Master
20150421-214628-50336172-33642-23561 (a709bc073343) started on 172.17.0.3:33642
I0421 21:46:28.834431 23593 replica.cpp:641] Replica in EMPTY status received a
broadcasted recover request
I0421 21:46:28.833869 23594 master.cpp:364] Flags at startup:
--acls="permissive: false
register_frameworks {
principals {
type: SOME
values: "test-principal"
}
roles {
type: SOME
values: "*"
}
}
run_tasks {
principals {
type: SOME
values: "test-principal"
}
users {
type: SOME
values: "mesos"
}
}
" --allocation_interval="1secs" --authenticate="true"
--authenticate_slaves="false" --authenticators="crammd5"
--credentials="/tmp/ExamplesTest_TestFramework_BxGYx5/credentials"
--framework_sorter="drf" --initialize_driver_logging="true"
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO"
--quiet="false" --recovery_slave_removal_limit="100%"
--registry="replicated_log" --registry_fetch_timeout="1mins"
--registry_store_timeout="5secs" --registry_strict="false"
--root_submissions="true" --slave_reregister_timeout="10mins"
--user_sorter="drf" --version="false"
--webui_dir="/mesos/mesos-0.23.0/src/webui" --work_dir="/tmp/mesos-eA50Yc"
--zk_session_timeout="10secs"
I0421 21:46:28.835439 23594 master.cpp:409] Master only allowing authenticated
frameworks to register
I0421 21:46:28.835464 23594 master.cpp:416] Master allowing unauthenticated
slaves to register
I0421 21:46:28.835481 23594 credentials.hpp:37] Loading credentials for
authentication from '/tmp/ExamplesTest_TestFramework_BxGYx5/credentials'
W0421 21:46:28.835682 23594 credentials.hpp:52] Permissions on credentials file
'/tmp/ExamplesTest_TestFramework_BxGYx5/credentials' are too open. It is
recommended that your credentials file is NOT accessible by others.
I0421 21:46:28.835932 23594 master.cpp:453] Using default 'crammd5'
authenticator
I0421 21:46:28.836459 23594 authenticator.cpp:511] Initializing server SASL
I0421 21:46:28.836539 23599 recover.cpp:195] Received a recover response from a
replica in EMPTY status
I0421 21:46:28.837509 23561 containerizer.cpp:113] Using isolation:
posix/cpu,posix/mem
I0421 21:46:28.837683 23597 recover.cpp:566] Updating replica status to STARTING
I0421 21:46:28.839818 23596 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 1.1172ms
I0421 21:46:28.839859 23596 replica.cpp:323] Persisted replica status to
STARTING
I0421 21:46:28.840085 23585 slave.cpp:176] Slave started on 1)@172.17.0.3:33642
I0421 21:46:28.840386 23594 auxprop.cpp:65] Initialized in-memory auxiliary
property plugin
I0421 21:46:28.840387 23589 recover.cpp:475] Replica is in STARTING status
I0421 21:46:28.840574 23594 master.cpp:490] Authorization enabled
I0421 21:46:28.840153 23585 slave.cpp:177] Flags at startup:
--authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
--cgroups_limit_swap="false" --cgroups_root="mesos"
--container_disk_watch_interval="15secs" --containerizers="mesos"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_remove_delay="6hrs" --docker_sandbox_directory="/mnt/mesos/sandbox"
--docker_stop_timeout="0ns" --enforce_container_disk_quota="false"
--executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs" --frameworks_home=""
--gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home=""
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher_dir="/mesos/mesos-0.23.0/_build/src" --logbufsecs="0"
--logging_level="INFO" --perf_duration="10secs" --perf_interval="1mins"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs" --resource_monitoring_interval="1secs"
--resources="cpus:2;mem:10240" --strict="true" --switch_user="true"
--version="false" --work_dir="/tmp/mesos-eA50Yc/0"
I0421 21:46:28.841423 23598 replica.cpp:641] Replica in STARTING status
received a broadcasted recover request
I0421 21:46:28.841403 23585 slave.cpp:316] Slave resources: cpus(*):2;
mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000]
I0421 21:46:28.841683 23585 slave.cpp:346] Slave hostname: a709bc073343
I0421 21:46:28.841714 23595 whitelist_watcher.cpp:79] No whitelist given
I0421 21:46:28.841717 23561 containerizer.cpp:113] Using isolation:
posix/cpu,posix/mem
I0421 21:46:28.841718 23585 slave.cpp:351] Slave checkpoint: true
I0421 21:46:28.841897 23590 recover.cpp:195] Received a recover response from a
replica in STARTING status
I0421 21:46:28.842223 23586 hierarchical.hpp:288] Initialized hierarchical
allocator process
I0421 21:46:28.842334 23591 recover.cpp:566] Updating replica status to VOTING
I0421 21:46:28.842967 23593 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 431466ns
I0421 21:46:28.843000 23593 replica.cpp:323] Persisted replica status to VOTING
I0421 21:46:28.843106 23584 recover.cpp:580] Successfully joined the Paxos group
I0421 21:46:28.843353 23597 slave.cpp:176] Slave started on 2)@172.17.0.3:33642
I0421 21:46:28.843439 23584 recover.cpp:464] Recover process terminated
I0421 21:46:28.843387 23597 slave.cpp:177] Flags at startup:
--authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
--cgroups_limit_swap="false" --cgroups_root="mesos"
--container_disk_watch_interval="15secs" --containerizers="mesos"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_remove_delay="6hrs" --docker_sandbox_directory="/mnt/mesos/sandbox"
--docker_stop_timeout="0ns" --enforce_container_disk_quota="false"
--executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs" --frameworks_home=""
--gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home=""
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher_dir="/mesos/mesos-0.23.0/_build/src" --logbufsecs="0"
--logging_level="INFO" --perf_duration="10secs" --perf_interval="1mins"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs" --resource_monitoring_interval="1secs"
--resources="cpus:2;mem:10240" --strict="true" --switch_user="true"
--version="false" --work_dir="/tmp/mesos-eA50Yc/1"
I0421 21:46:28.844106 23597 slave.cpp:316] Slave resources: cpus(*):2;
mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000]
I0421 21:46:28.844238 23597 slave.cpp:346] Slave hostname: a709bc073343
I0421 21:46:28.844259 23597 slave.cpp:351] Slave checkpoint: true
I0421 21:46:28.845682 23561 containerizer.cpp:113] Using isolation:
posix/cpu,posix/mem
I0421 21:46:28.847812 23589 slave.cpp:176] Slave started on 3)@172.17.0.3:33642
I0421 21:46:28.847856 23589 slave.cpp:177] Flags at startup:
--authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
--cgroups_limit_swap="false" --cgroups_root="mesos"
--container_disk_watch_interval="15secs" --containerizers="mesos"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_remove_delay="6hrs" --docker_sandbox_directory="/mnt/mesos/sandbox"
--docker_stop_timeout="0ns" --enforce_container_disk_quota="false"
--executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs" --frameworks_home=""
--gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home=""
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher_dir="/mesos/mesos-0.23.0/_build/src" --logbufsecs="0"
--logging_level="INFO" --perf_duration="10secs" --perf_interval="1mins"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs" --resource_monitoring_interval="1secs"
--resources="cpus:2;mem:10240" --strict="true" --switch_user="true"
--version="false" --work_dir="/tmp/mesos-eA50Yc/2"
I0421 21:46:28.848500 23589 slave.cpp:316] Slave resources: cpus(*):2;
mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000]
I0421 21:46:28.848625 23589 slave.cpp:346] Slave hostname: a709bc073343
I0421 21:46:28.848647 23589 slave.cpp:351] Slave checkpoint: true
I0421 21:46:28.851461 23561 sched.cpp:157] Version: 0.23.0
*** Error in `/mesos/mesos-0.23.0/_build/src/.libs': double free or corruption
(fasttop): 0x00007feb200026b0 ***
*** Aborted at 1429652788 (unix time) try "date -d @1429652788" if you are
using GNU date ***
I0421 21:46:28.851809 23598 state.cpp:35] Recovering state from
'/tmp/mesos-eA50Yc/2/meta'
I0421 21:46:28.851826 23587 state.cpp:35] Recovering state from
'/tmp/mesos-eA50Yc/0/meta'
I0421 21:46:28.852794 23599 status_update_manager.cpp:197] Recovering status
update manager
I0421 21:46:28.852795 23592 status_update_manager.cpp:197] Recovering status
update manager
PC: @ 0x7feb3b69ae37 (unknown)
*** SIGABRT (@0x3e800005c09) received by PID 23561 (TID 0x7feb2e20e700) from
PID 23561; stack trace: ***
I0421 21:46:28.853222 23584 master.cpp:1423] The newly elected leader is
[email protected]:33642 with id 20150421-214628-50336172-33642-23561
I0421 21:46:28.853250 23593 sched.cpp:254] New master detected at
[email protected]:33642
I0421 21:46:28.853273 23584 master.cpp:1436] Elected as the leading master!
I0421 21:46:28.853307 23584 master.cpp:1208] Recovering from registrar
I0421 21:46:28.853349 23595 containerizer.cpp:314] Recovering containerizer
I0421 21:46:28.853361 23588 containerizer.cpp:314] Recovering containerizer
I0421 21:46:28.853440 23593 sched.cpp:310] Authenticating with master
[email protected]:33642
I0421 21:46:28.853479 23593 sched.cpp:317] Using default CRAM-MD5 authenticatee
I0421 21:46:28.853551 23591 registrar.cpp:313] Recovering registrar
I0421 21:46:28.854161 23599 authenticatee.hpp:115] Initializing client SASL
@ 0x7feb3ba37c90 (unknown)
I0421 21:46:28.854426 23599 authenticatee.hpp:139] Creating new client SASL
connection
I0421 21:46:28.854840 23596 master.cpp:1035] Dropping
'mesos.internal.AuthenticateMessage' message since not recovered yet
I0421 21:46:28.855121 23589 log.cpp:660] Attempting to start the writer
I0421 21:46:28.855123 23587 slave.cpp:3810] Finished recovery
I0421 21:46:28.855128 23584 slave.cpp:3810] Finished recovery
@ 0x7feb3b69ae37 (unknown)
I0421 21:46:28.855942 23586 status_update_manager.cpp:171] Pausing sending
status updates
I0421 21:46:28.855986 23599 status_update_manager.cpp:171] Pausing sending
status updates
I0421 21:46:28.855994 23590 slave.cpp:627] New master detected at
[email protected]:33642
I0421 21:46:28.855944 23587 slave.cpp:627] New master detected at
[email protected]:33642
@ 0x7feb3b69c528 (unknown)
I0421 21:46:28.856114 23590 slave.cpp:652] No credentials provided. Attempting
to register without authentication
I0421 21:46:28.856153 23587 slave.cpp:652] No credentials provided. Attempting
to register without authentication
I0421 21:46:28.856287 23590 slave.cpp:663] Detecting new master
I0421 21:46:28.856305 23587 slave.cpp:663] Detecting new master
@ 0x7feb3b6dbf74 (unknown)
@ 0x7feb3b6e3586 (unknown)
I0421 21:46:28.858969 23596 replica.cpp:477] Replica received implicit promise
request with proposal 1
I0421 21:46:28.859593 23596 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 604936ns
I0421 21:46:28.859625 23596 replica.cpp:345] Persisted promised to 1
I0421 21:46:28.861143 23585 coordinator.cpp:230] Coordinator attemping to fill
missing position
I0421 21:46:28.863523 23584 replica.cpp:378] Replica received explicit promise
request for position 0 with proposal 2
@ 0x7feb3e70200a
_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2EEEENS1_9_DeferredIDTclsr3stdE4bindadsr3std8functionIFvT0_T1_EEEclcvSt8functionISD_E_Efp1_fp2_EEEERKNS1_3PIDIT_EEMSJ_FvSB_SC_ET2_T3_EUliiE_E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE
I0421 21:46:28.864094 23584 leveldb.cpp:343] Persisting action (8 bytes) to
leveldb took 520811ns
I0421 21:46:28.864131 23584 replica.cpp:679] Persisted action at 0
I0421 21:46:28.865658 23598 replica.cpp:511] Replica received write request for
position 0
I0421 21:46:28.865748 23598 leveldb.cpp:438] Reading position from leveldb took
69879ns
I0421 21:46:28.866178 23598 leveldb.cpp:343] Persisting action (14 bytes) to
leveldb took 381102ns
I0421 21:46:28.866201 23598 replica.cpp:679] Persisted action at 0
I0421 21:46:28.866885 23586 replica.cpp:658] Replica received learned notice
for position 0
I0421 21:46:28.867220 23586 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 322532ns
I0421 21:46:28.867243 23586 replica.cpp:679] Persisted action at 0
I0421 21:46:28.867261 23586 replica.cpp:664] Replica learned NOP action at
position 0
I0421 21:46:28.867934 23596 log.cpp:676] Writer started with ending position 0
@ 0x7feb3e701ed6
_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2EEEENS1_9_DeferredIDTclsr3stdE4bindadsr3std8functionIFvT0_T1_EEEclcvSt8functionISD_E_Efp1_fp2_EEEERKNS1_3PIDIT_EEMSJ_FvSB_SC_ET2_T3_EUliiE_E10_M_managerERSt9_Any_dataRKST_St18_Manager_operation
@ 0x42ef67 std::_Function_base::~_Function_base()
I0421 21:46:28.871150 23588 leveldb.cpp:438] Reading position from leveldb took
39294ns
I0421 21:46:28.874131 23585 master.cpp:1035] Dropping
'mesos.internal.RegisterSlaveMessage' message since not recovered yet
I0421 21:46:28.874137 23595 slave.cpp:1088] Will retry registration in
1.611577959secs if necessary
@ 0x7feb3e69c1c5 std::function<>::~function()
I0421 21:46:28.875905 23590 registrar.cpp:346] Successfully fetched the
registry (0B) in 22.283008ms
I0421 21:46:28.876456 23590 registrar.cpp:445] Applied 1 operations in
184265ns; attempting to update the 'registry'
@ 0x7feb3e700ed5 std::_Head_base<>::~_Head_base()
I0421 21:46:28.879930 23599 log.cpp:684] Attempting to append 126 bytes to the
log
I0421 21:46:28.880146 23585 coordinator.cpp:340] Coordinator attempting to
write APPEND action at position 1
I0421 21:46:28.880972 23595 replica.cpp:511] Replica received write request for
position 1
I0421 21:46:28.881410 23595 leveldb.cpp:343] Persisting action (144 bytes) to
leveldb took 416426ns
I0421 21:46:28.881434 23595 replica.cpp:679] Persisted action at 1
I0421 21:46:28.881979 23593 replica.cpp:658] Replica received learned notice
for position 1
@ 0x7feb3e700eb5
_ZNSt11_Tuple_implILm0EJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2EEEED2Ev
I0421 21:46:28.882297 23593 leveldb.cpp:343] Persisting action (146 bytes) to
leveldb took 306713ns
I0421 21:46:28.882362 23593 replica.cpp:679] Persisted action at 1
I0421 21:46:28.882388 23593 replica.cpp:664] Replica learned APPEND action at
position 1
I0421 21:46:28.884555 23585 registrar.cpp:490] Successfully updated the
'registry' in 7.894016ms
I0421 21:46:28.884716 23585 registrar.cpp:376] Successfully recovered registrar
I0421 21:46:28.885067 23591 master.cpp:1235] Recovered 0 slaves from the
Registry (90B) ; allowing 10mins for slaves to re-register
I0421 21:46:28.885143 23595 log.cpp:703] Attempting to truncate the log to 1
I0421 21:46:28.885265 23599 coordinator.cpp:340] Coordinator attempting to
write TRUNCATE action at position 2
I0421 21:46:28.885859 23589 replica.cpp:511] Replica received write request for
position 2
I0421 21:46:28.886220 23589 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 346129ns
I0421 21:46:28.886243 23589 replica.cpp:679] Persisted action at 2
@ 0x7feb3e700e95
_ZNSt5tupleIJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2EEEED2Ev
I0421 21:46:28.886718 23598 replica.cpp:658] Replica received learned notice
for position 2
I0421 21:46:28.887043 23598 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 286169ns
I0421 21:46:28.887096 23598 leveldb.cpp:401] Deleting ~1 keys from leveldb took
39063ns
I0421 21:46:28.887115 23598 replica.cpp:679] Persisted action at 2
I0421 21:46:28.887145 23598 replica.cpp:664] Replica learned TRUNCATE action at
position 2
@ 0x7feb3e6ffcbc std::_Bind<>::~_Bind()
@ 0x7feb3e701161 std::_Function_base::_Base_manager<>::_M_destroy()
@ 0x7feb3e701036 std::_Function_base::_Base_manager<>::_M_manager()
@ 0x42ef67 std::_Function_base::~_Function_base()
@ 0x7feb3e69c1c5 std::function<>::~function()
@ 0x7feb3e69f99e std::function<>::operator=()
@ 0x7feb3e668409 mesos::internal::slave::Slave::initialize()
@ 0x7feb3e677422 mesos::internal::slave::Slave::initialize()
@ 0x7feb3ee5879f process::ProcessManager::resume()
@ 0x7feb3ee5836e process::schedule()
@ 0x7feb3ba300a5 start_thread
@ 0x7feb3b75dcfd (unknown)
../../src/tests/script.cpp:83: Failure
Failed
test_framework_test.sh terminated with signal Aborted
[ FAILED ] ExamplesTest.TestFramework (894 ms)
{code}
> ExamplesTest.{TestFramework, NoExecutorFramework} flaky
> -------------------------------------------------------
>
> Key: MESOS-1303
> URL: https://issues.apache.org/jira/browse/MESOS-1303
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Ian Downes
> Labels: flaky
>
> I'm having trouble reproducing this but I did observe it once on my OSX
> system:
> {noformat}
> [==========] Running 2 tests from 1 test case.
> [----------] Global test environment set-up.
> [----------] 2 tests from ExamplesTest
> [ RUN ] ExamplesTest.TestFramework
> ../../src/tests/script.cpp:81: Failure
> Failed
> test_framework_test.sh terminated with signal 'Abort trap: 6'
> [ FAILED ] ExamplesTest.TestFramework (953 ms)
> [ RUN ] ExamplesTest.NoExecutorFramework
> [ OK ] ExamplesTest.NoExecutorFramework (10162 ms)
> [----------] 2 tests from ExamplesTest (11115 ms total)
> [----------] Global test environment tear-down
> [==========] 2 tests from 1 test case ran. (11121 ms total)
> [ PASSED ] 1 test.
> [ FAILED ] 1 test, listed below:
> [ FAILED ] ExamplesTest.TestFramework
> {noformat}
> when investigating a failed make check for https://reviews.apache.org/r/20971/
> {noformat}
> [----------] 6 tests from ExamplesTest
> [ RUN ] ExamplesTest.TestFramework
> [ OK ] ExamplesTest.TestFramework (8643 ms)
> [ RUN ] ExamplesTest.NoExecutorFramework
> tests/script.cpp:81: Failure
> Failed
> no_executor_framework_test.sh terminated with signal 'Aborted'
> [ FAILED ] ExamplesTest.NoExecutorFramework (7220 ms)
> [ RUN ] ExamplesTest.JavaFramework
> [ OK ] ExamplesTest.JavaFramework (11181 ms)
> [ RUN ] ExamplesTest.JavaException
> [ OK ] ExamplesTest.JavaException (5624 ms)
> [ RUN ] ExamplesTest.JavaLog
> [ OK ] ExamplesTest.JavaLog (6472 ms)
> [ RUN ] ExamplesTest.PythonFramework
> [ OK ] ExamplesTest.PythonFramework (14467 ms)
> [----------] 6 tests from ExamplesTest (53607 ms total)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)