Vinod Kone created MESOS-3275:
---------------------------------
Summary:
ContentType/HttpApiTest.UpdatePidToHttpSchedulerWithoutForce is flaky
Key: MESOS-3275
URL: https://issues.apache.org/jira/browse/MESOS-3275
Project: Mesos
Issue Type: Bug
Affects Versions: 0.24.0
Environment:
https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent,OS=centos:7,label_exp=docker%7C%7CHadoop/709/consoleFull
Reporter: Vinod Kone
Observed this on ASF CI.
This looks like the HTTP scheduler received an event (heartbeat?) after the
master disallowed it to upgrade from PID to HTTP.
{code}
[ RUN ] ContentType/HttpApiTest.UpdatePidToHttpSchedulerWithoutForce/1
Using temporary directory
'/tmp/ContentType_HttpApiTest_UpdatePidToHttpSchedulerWithoutForce_1_DTcsOS'
I0816 23:17:25.201861 28791 leveldb.cpp:176] Opened db in 2.031526ms
I0816 23:17:25.202529 28791 leveldb.cpp:183] Compacted db in 623251ns
I0816 23:17:25.202581 28791 leveldb.cpp:198] Created db iterator in 25756ns
I0816 23:17:25.202605 28791 leveldb.cpp:204] Seeked to beginning of db in 2310ns
I0816 23:17:25.202620 28791 leveldb.cpp:273] Iterated through 0 keys in the db
in 478ns
I0816 23:17:25.202680 28791 replica.cpp:744] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I0816 23:17:25.203255 28814 recover.cpp:449] Starting replica recovery
I0816 23:17:25.203511 28814 recover.cpp:475] Replica is in EMPTY status
I0816 23:17:25.204787 28819 replica.cpp:641] Replica in EMPTY status received a
broadcasted recover request
I0816 23:17:25.205131 28823 recover.cpp:195] Received a recover response from a
replica in EMPTY status
I0816 23:17:25.205700 28825 recover.cpp:566] Updating replica status to STARTING
I0816 23:17:25.206635 28811 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 724632ns
I0816 23:17:25.206665 28811 replica.cpp:323] Persisted replica status to
STARTING
I0816 23:17:25.206836 28811 recover.cpp:475] Replica is in STARTING status
I0816 23:17:25.207716 28810 replica.cpp:641] Replica in STARTING status
received a broadcasted recover request
I0816 23:17:25.208051 28812 recover.cpp:195] Received a recover response from a
replica in STARTING status
I0816 23:17:25.209333 28817 master.cpp:378] Master
20150816-231725-3170963884-43501-28791 (f3a9451c4cf3) started on
172.17.1.189:43501
I0816 23:17:25.209509 28817 master.cpp:380] Flags at startup: --acls=""
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate="false" --authenticate_slaves="true" --authenticators="crammd5"
--authorizers="local"
--credentials="/tmp/ContentType_HttpApiTest_UpdatePidToHttpSchedulerWithoutForce_1_DTcsOS/credentials"
--framework_sorter="drf" --help="false" --initialize_driver_logging="true"
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO"
--max_slave_ping_timeouts="5" --quiet="false"
--recovery_slave_removal_limit="100%" --registry="replicated_log"
--registry_fetch_timeout="1mins" --registry_store_timeout="25secs"
--registry_strict="true" --root_submissions="true"
--slave_ping_timeout="15secs" --slave_reregister_timeout="10mins"
--user_sorter="drf" --version="false"
--webui_dir="/mesos/mesos-0.24.0/_inst/share/mesos/webui"
--work_dir="/tmp/ContentType_HttpApiTest_UpdatePidToHttpSchedulerWithoutForce_1_DTcsOS/master"
--zk_session_timeout="10secs"
I0816 23:17:25.209997 28817 master.cpp:427] Master allowing unauthenticated
frameworks to register
I0816 23:17:25.210186 28817 master.cpp:430] Master only allowing authenticated
slaves to register
I0816 23:17:25.210292 28817 credentials.hpp:37] Loading credentials for
authentication from
'/tmp/ContentType_HttpApiTest_UpdatePidToHttpSchedulerWithoutForce_1_DTcsOS/credentials'
I0816 23:17:25.210752 28817 master.cpp:469] Using default 'crammd5'
authenticator
I0816 23:17:25.210754 28825 recover.cpp:566] Updating replica status to VOTING
I0816 23:17:25.211364 28817 master.cpp:506] Authorization enabled
I0816 23:17:25.211706 28813 hierarchical.hpp:346] Initialized hierarchical
allocator process
I0816 23:17:25.211778 28811 whitelist_watcher.cpp:79] No whitelist given
I0816 23:17:25.213102 28817 master.cpp:1525] The newly elected leader is
[email protected]:43501 with id 20150816-231725-3170963884-43501-28791
I0816 23:17:25.213229 28817 master.cpp:1538] Elected as the leading master!
I0816 23:17:25.213354 28817 master.cpp:1308] Recovering from registrar
I0816 23:17:25.213632 28825 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 1.357147ms
I0816 23:17:25.213762 28825 replica.cpp:323] Persisted replica status to VOTING
I0816 23:17:25.213724 28817 registrar.cpp:311] Recovering registrar
I0816 23:17:25.213999 28825 recover.cpp:580] Successfully joined the Paxos group
I0816 23:17:25.215277 28825 recover.cpp:464] Recover process terminated
I0816 23:17:25.215857 28813 log.cpp:661] Attempting to start the writer
I0816 23:17:25.217447 28814 replica.cpp:477] Replica received implicit promise
request with proposal 1
I0816 23:17:25.218040 28814 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 424309ns
I0816 23:17:25.218194 28814 replica.cpp:345] Persisted promised to 1
I0816 23:17:25.218904 28814 coordinator.cpp:231] Coordinator attemping to fill
missing position
I0816 23:17:25.220862 28819 replica.cpp:378] Replica received explicit promise
request for position 0 with proposal 2
I0816 23:17:25.222211 28819 leveldb.cpp:343] Persisting action (8 bytes) to
leveldb took 1.302196ms
I0816 23:17:25.222260 28819 replica.cpp:679] Persisted action at 0
I0816 23:17:25.223625 28823 replica.cpp:511] Replica received write request for
position 0
I0816 23:17:25.223696 28823 leveldb.cpp:438] Reading position from leveldb took
35551ns
I0816 23:17:25.224108 28823 leveldb.cpp:343] Persisting action (14 bytes) to
leveldb took 358020ns
I0816 23:17:25.224138 28823 replica.cpp:679] Persisted action at 0
I0816 23:17:25.224994 28816 replica.cpp:658] Replica received learned notice
for position 0
I0816 23:17:25.225673 28816 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 363713ns
I0816 23:17:25.225883 28816 replica.cpp:679] Persisted action at 0
I0816 23:17:25.226079 28816 replica.cpp:664] Replica learned NOP action at
position 0
I0816 23:17:25.226887 28816 log.cpp:677] Writer started with ending position 0
I0816 23:17:25.228317 28810 leveldb.cpp:438] Reading position from leveldb took
40494ns
I0816 23:17:25.231591 28810 registrar.cpp:344] Successfully fetched the
registry (0B) in 17.540864ms
I0816 23:17:25.231739 28810 registrar.cpp:443] Applied 1 operations in 30654ns;
attempting to update the 'registry'
I0816 23:17:25.234084 28818 log.cpp:685] Attempting to append 178 bytes to the
log
I0816 23:17:25.234246 28815 coordinator.cpp:341] Coordinator attempting to
write APPEND action at position 1
I0816 23:17:25.235080 28821 replica.cpp:511] Replica received write request for
position 1
I0816 23:17:25.235671 28821 leveldb.cpp:343] Persisting action (197 bytes) to
leveldb took 556874ns
I0816 23:17:25.235700 28821 replica.cpp:679] Persisted action at 1
I0816 23:17:25.236559 28821 replica.cpp:658] Replica received learned notice
for position 1
I0816 23:17:25.236899 28821 leveldb.cpp:343] Persisting action (199 bytes) to
leveldb took 317171ns
I0816 23:17:25.236922 28821 replica.cpp:679] Persisted action at 1
I0816 23:17:25.236938 28821 replica.cpp:664] Replica learned APPEND action at
position 1
I0816 23:17:25.238258 28821 log.cpp:704] Attempting to truncate the log to 1
I0816 23:17:25.238286 28818 registrar.cpp:488] Successfully updated the
'registry' in 6.47808ms
I0816 23:17:25.238657 28818 registrar.cpp:374] Successfully recovered registrar
I0816 23:17:25.238965 28821 coordinator.cpp:341] Coordinator attempting to
write TRUNCATE action at position 2
I0816 23:17:25.239836 28823 replica.cpp:511] Replica received write request for
position 2
I0816 23:17:25.240257 28823 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 384807ns
I0816 23:17:25.240303 28823 replica.cpp:679] Persisted action at 2
I0816 23:17:25.241461 28823 replica.cpp:658] Replica received learned notice
for position 2
I0816 23:17:25.241971 28813 master.cpp:1335] Recovered 0 slaves from the
Registry (139B) ; allowing 10mins for slaves to re-register
I0816 23:17:25.242049 28823 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 555462ns
I0816 23:17:25.242127 28823 leveldb.cpp:401] Deleting ~1 keys from leveldb took
48784ns
I0816 23:17:25.242152 28823 replica.cpp:679] Persisted action at 2
I0816 23:17:25.242205 28823 replica.cpp:664] Replica learned TRUNCATE action at
position 2
I0816 23:17:25.257192 28791 sched.cpp:164] Version: 0.24.0
I0816 23:17:25.257940 28819 sched.cpp:262] New master detected at
[email protected]:43501
I0816 23:17:25.258066 28819 sched.cpp:318] Authenticating with master
[email protected]:43501
I0816 23:17:25.258091 28819 sched.cpp:325] Using default CRAM-MD5 authenticatee
I0816 23:17:25.258472 28819 authenticatee.cpp:115] Creating new client SASL
connection
I0816 23:17:25.258849 28819 master.cpp:4694] Authenticating
[email protected]:43501
I0816 23:17:25.259053 28819 authenticator.cpp:407] Starting authentication
session for crammd5_authenticatee(694)@172.17.1.189:43501
I0816 23:17:25.259399 28812 authenticator.cpp:92] Creating new server SASL
connection
I0816 23:17:25.259770 28819 authenticatee.cpp:206] Received SASL authentication
mechanisms: CRAM-MD5
I0816 23:17:25.259820 28819 authenticatee.cpp:232] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0816 23:17:25.259961 28819 authenticator.cpp:197] Received SASL authentication
start
I0816 23:17:25.260041 28819 authenticator.cpp:319] Authentication requires more
steps
I0816 23:17:25.260155 28819 authenticatee.cpp:252] Received SASL authentication
step
I0816 23:17:25.260319 28812 authenticator.cpp:225] Received SASL authentication
step
I0816 23:17:25.260354 28812 auxprop.cpp:102] Request to lookup properties for
user: 'test-principal' realm: 'f3a9451c4cf3' server FQDN: 'f3a9451c4cf3'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: false
I0816 23:17:25.260370 28812 auxprop.cpp:174] Looking up auxiliary property
'*userPassword'
I0816 23:17:25.260426 28812 auxprop.cpp:174] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I0816 23:17:25.260464 28812 auxprop.cpp:102] Request to lookup properties for
user: 'test-principal' realm: 'f3a9451c4cf3' server FQDN: 'f3a9451c4cf3'
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false
SASL_AUXPROP_AUTHZID: true
I0816 23:17:25.260483 28812 auxprop.cpp:124] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0816 23:17:25.260494 28812 auxprop.cpp:124] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0816 23:17:25.260515 28812 authenticator.cpp:311] Authentication success
I0816 23:17:25.260612 28819 authenticatee.cpp:292] Authentication success
I0816 23:17:25.260768 28811 authenticator.cpp:425] Authentication session
cleanup for crammd5_authenticatee(694)@172.17.1.189:43501
I0816 23:17:25.260694 28812 master.cpp:4724] Successfully authenticated
principal 'test-principal' at
[email protected]:43501
I0816 23:17:25.261353 28819 sched.cpp:407] Successfully authenticated with
master [email protected]:43501
I0816 23:17:25.261376 28819 sched.cpp:713] Sending SUBSCRIBE call to
[email protected]:43501
I0816 23:17:25.261497 28819 sched.cpp:746] Will retry registration in
1.206717027secs if necessary
I0816 23:17:25.261682 28825 master.cpp:2094] Received SUBSCRIBE call for
framework 'default' at
[email protected]:43501
I0816 23:17:25.261885 28825 master.cpp:1564] Authorizing framework principal
'test-principal' to receive offers for role '*'
I0816 23:17:25.262307 28825 master.cpp:2164] Subscribing framework default with
checkpointing disabled and capabilities [ ]
I0816 23:17:25.262744 28814 hierarchical.hpp:391] Added framework
20150816-231725-3170963884-43501-28791-0000
I0816 23:17:25.262915 28814 hierarchical.hpp:1008] No resources available to
allocate!
I0816 23:17:25.263098 28814 hierarchical.hpp:908] Performed allocation for 0
slaves in 192184ns
I0816 23:17:25.263025 28825 sched.cpp:640] Framework registered with
20150816-231725-3170963884-43501-28791-0000
I0816 23:17:25.263273 28825 sched.cpp:654] Scheduler::registered took 123101ns
I0816 23:17:25.265447 28823 process.cpp:3043] Handling HTTP event for process
'master' with path: '/master/api/v1/scheduler'
I0816 23:17:25.265732 28823 http.cpp:319] HTTP POST for
/master/api/v1/scheduler from 172.17.1.189:41579
I0816 23:17:25.266278 28823 master.cpp:1783] Received subscription request for
HTTP framework 'default'
I0816 23:17:25.266372 28823 master.cpp:1564] Authorizing framework principal
'test-principal' to receive offers for role '*'
I0816 23:17:25.266746 28822 master.cpp:1875] Subscribing framework 'default'
with checkpointing disabled and capabilities [ ]
I0816 23:17:25.266922 28822 master.cpp:1916] Updating info for framework
20150816-231725-3170963884-43501-28791-0000
E0816 23:17:25.267024 28822 master.cpp:1931] Disallowing subscription attempt
of framework 20150816-231725-3170963884-43501-28791-0000 (default) at
[email protected]:43501 because it is
already connected
I0816 23:17:25.269484 28820 hierarchical.hpp:1008] No resources available to
allocate!
I0816 23:17:25.269604 28820 hierarchical.hpp:908] Performed allocation for 0
slaves in 137770ns
../../src/tests/http_api_tests.cpp:598: Failure
Value of: event.isPending()
Actual: false
Expected: true
I0816 23:17:25.285747 28823 master.cpp:1051] Framework
20150816-231725-3170963884-43501-28791-0000 (default) at
[email protected]:43501 disconnected
I0816 23:17:25.285939 28823 master.cpp:2370] Disconnecting framework
20150816-231725-3170963884-43501-28791-0000 (default) at
[email protected]:43501
I0816 23:17:25.286100 28823 master.cpp:2394] Deactivating framework
20150816-231725-3170963884-43501-28791-0000 (default) at
[email protected]:43501
I0816 23:17:25.286303 28814 hierarchical.hpp:474] Deactivated framework
20150816-231725-3170963884-43501-28791-0000
I0816 23:17:25.286293 28823 master.cpp:1075] Giving framework
20150816-231725-3170963884-43501-28791-0000 (default) at
[email protected]:43501 2weeks to
failover
I0816 23:17:25.286866 28812 master.cpp:860] Master terminating
I0816 23:17:25.287035 28817 hierarchical.hpp:428] Removed framework
20150816-231725-3170963884-43501-28791-0000
[ FAILED ] ContentType/HttpApiTest.UpdatePidToHttpSchedulerWithoutForce/1,
where GetParam() = "application/json" (94 ms)
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)