Vinod Kone created MESOS-2050:
---------------------------------
Summary: InMemoryAuxProp plugin used by Authenticators results in
SEGFAULT
Key: MESOS-2050
URL: https://issues.apache.org/jira/browse/MESOS-2050
Project: Mesos
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Vinod Kone
Observed this on ASF CI:
Basically, as part of the recent Auth refactor for modules, the loading of
secrets is being done once per Authenticator Process instead of once in the
Master. Since, InMemoryAuxProp plugin manipulates static variables (e.g,
'properties') it results in SEGFAULT when one Authenticator (e.g., for slave)
does load() while another Authenticator (e.g., for framework) does lookup(), as
both these methods manipulate static 'properties'.
{code}
[ RUN ] MasterTest.LaunchDuplicateOfferTest
Using temporary directory '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp'
I1104 03:37:55.523553 28363 leveldb.cpp:176] Opened db in 2.270387ms
I1104 03:37:55.524250 28363 leveldb.cpp:183] Compacted db in 662527ns
I1104 03:37:55.524276 28363 leveldb.cpp:198] Created db iterator in 4964ns
I1104 03:37:55.524284 28363 leveldb.cpp:204] Seeked to beginning of db in 702ns
I1104 03:37:55.524291 28363 leveldb.cpp:273] Iterated through 0 keys in the db
in 450ns
I1104 03:37:55.524333 28363 replica.cpp:741] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I1104 03:37:55.524852 28384 recover.cpp:437] Starting replica recovery
I1104 03:37:55.525188 28384 recover.cpp:463] Replica is in EMPTY status
I1104 03:37:55.526577 28378 replica.cpp:638] Replica in EMPTY status received a
broadcasted recover request
I1104 03:37:55.527135 28378 master.cpp:318] Master
20141104-033755-3176252227-49988-28363 (proserpina.apache.org) started on
67.195.81.189:49988
I1104 03:37:55.527180 28378 master.cpp:364] Master only allowing authenticated
frameworks to register
I1104 03:37:55.527191 28378 master.cpp:369] Master only allowing authenticated
slaves to register
I1104 03:37:55.527217 28378 credentials.hpp:36] Loading credentials for
authentication from
'/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp/credentials'
I1104 03:37:55.527451 28378 master.cpp:408] Authorization enabled
I1104 03:37:55.528081 28384 master.cpp:126] No whitelist given. Advertising
offers for all slaves
I1104 03:37:55.528548 28383 recover.cpp:188] Received a recover response from a
replica in EMPTY status
I1104 03:37:55.528645 28388 hierarchical_allocator_process.hpp:299]
Initializing hierarchical allocator process with master :
[email protected]:49988
I1104 03:37:55.529233 28388 master.cpp:1258] The newly elected leader is
[email protected]:49988 with id 20141104-033755-3176252227-49988-28363
I1104 03:37:55.529266 28388 master.cpp:1271] Elected as the leading master!
I1104 03:37:55.529289 28388 master.cpp:1089] Recovering from registrar
I1104 03:37:55.529311 28385 recover.cpp:554] Updating replica status to STARTING
I1104 03:37:55.529500 28384 registrar.cpp:313] Recovering registrar
I1104 03:37:55.530037 28383 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 497965ns
I1104 03:37:55.530083 28383 replica.cpp:320] Persisted replica status to
STARTING
I1104 03:37:55.530335 28387 recover.cpp:463] Replica is in STARTING status
I1104 03:37:55.531343 28381 replica.cpp:638] Replica in STARTING status
received a broadcasted recover request
I1104 03:37:55.531739 28384 recover.cpp:188] Received a recover response from a
replica in STARTING status
I1104 03:37:55.532168 28379 recover.cpp:554] Updating replica status to VOTING
I1104 03:37:55.532572 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 293974ns
I1104 03:37:55.532594 28381 replica.cpp:320] Persisted replica status to VOTING
I1104 03:37:55.532790 28390 recover.cpp:568] Successfully joined the Paxos group
I1104 03:37:55.533107 28390 recover.cpp:452] Recover process terminated
I1104 03:37:55.533604 28382 log.cpp:656] Attempting to start the writer
I1104 03:37:55.534840 28381 replica.cpp:474] Replica received implicit promise
request with proposal 1
I1104 03:37:55.535188 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to
leveldb took 321021ns
I1104 03:37:55.535212 28381 replica.cpp:342] Persisted promised to 1
I1104 03:37:55.535893 28378 coordinator.cpp:230] Coordinator attemping to fill
missing position
I1104 03:37:55.537318 28392 replica.cpp:375] Replica received explicit promise
request for position 0 with proposal 2
I1104 03:37:55.537719 28392 leveldb.cpp:343] Persisting action (8 bytes) to
leveldb took 366858ns
I1104 03:37:55.537746 28392 replica.cpp:676] Persisted action at 0
I1104 03:37:55.538872 28391 replica.cpp:508] Replica received write request for
position 0
I1104 03:37:55.538952 28391 leveldb.cpp:438] Reading position from leveldb took
36647ns
I1104 03:37:55.539482 28391 leveldb.cpp:343] Persisting action (14 bytes) to
leveldb took 483191ns
I1104 03:37:55.539510 28391 replica.cpp:676] Persisted action at 0
I1104 03:37:55.540045 28379 replica.cpp:655] Replica received learned notice
for position 0
I1104 03:37:55.540402 28379 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 299913ns
I1104 03:37:55.540429 28379 replica.cpp:676] Persisted action at 0
I1104 03:37:55.540455 28379 replica.cpp:661] Replica learned NOP action at
position 0
I1104 03:37:55.541205 28385 log.cpp:672] Writer started with ending position 0
I1104 03:37:55.542579 28378 leveldb.cpp:438] Reading position from leveldb took
33358ns
I1104 03:37:55.545608 28377 registrar.cpp:346] Successfully fetched the
registry (0B) in 16.064768ms
I1104 03:37:55.545778 28377 registrar.cpp:445] Applied 1 operations in 31511ns;
attempting to update the 'registry'
I1104 03:37:55.548598 28382 log.cpp:680] Attempting to append 143 bytes to the
log
I1104 03:37:55.548821 28385 coordinator.cpp:340] Coordinator attempting to
write APPEND action at position 1
I1104 03:37:55.549759 28385 replica.cpp:508] Replica received write request for
position 1
I1104 03:37:55.550269 28385 leveldb.cpp:343] Persisting action (162 bytes) to
leveldb took 475014ns
I1104 03:37:55.550300 28385 replica.cpp:676] Persisted action at 1
I1104 03:37:55.550984 28382 replica.cpp:655] Replica received learned notice
for position 1
I1104 03:37:55.551311 28382 leveldb.cpp:343] Persisting action (164 bytes) to
leveldb took 304432ns
I1104 03:37:55.551332 28382 replica.cpp:676] Persisted action at 1
I1104 03:37:55.551349 28382 replica.cpp:661] Replica learned APPEND action at
position 1
I1104 03:37:55.552196 28392 registrar.cpp:490] Successfully updated the
'registry' in 6.35904ms
I1104 03:37:55.552338 28392 registrar.cpp:376] Successfully recovered registrar
I1104 03:37:55.552423 28385 log.cpp:699] Attempting to truncate the log to 1
I1104 03:37:55.552577 28382 master.cpp:1116] Recovered 0 slaves from the
Registry (105B) ; allowing 10mins for slaves to re-register
I1104 03:37:55.552647 28378 coordinator.cpp:340] Coordinator attempting to
write TRUNCATE action at position 2
I1104 03:37:55.553383 28391 replica.cpp:508] Replica received write request for
position 2
I1104 03:37:55.553772 28391 leveldb.cpp:343] Persisting action (16 bytes) to
leveldb took 357114ns
I1104 03:37:55.553797 28391 replica.cpp:676] Persisted action at 2
I1104 03:37:55.554234 28379 replica.cpp:655] Replica received learned notice
for position 2
I1104 03:37:55.554529 28379 leveldb.cpp:343] Persisting action (18 bytes) to
leveldb took 273827ns
I1104 03:37:55.554565 28379 leveldb.cpp:401] Deleting ~1 keys from leveldb took
16455ns
I1104 03:37:55.554580 28379 replica.cpp:676] Persisted action at 2
I1104 03:37:55.554596 28379 replica.cpp:661] Replica learned TRUNCATE action at
position 2
I1104 03:37:55.566700 28385 slave.cpp:169] Slave started on
2)@67.195.81.189:49988
I1104 03:37:55.566766 28385 credentials.hpp:84] Loading credential for
authentication from '/tmp/MasterTest_LaunchDuplicateOfferTest_He08c1/credential'
I1104 03:37:55.566946 28385 slave.cpp:276] Slave using credential for:
test-principal
I1104 03:37:55.567327 28385 slave.cpp:289] Slave resources: cpus(*):2;
mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000]
I1104 03:37:55.567436 28385 slave.cpp:318] Slave hostname: proserpina.apache.org
I1104 03:37:55.567458 28385 slave.cpp:319] Slave checkpoint: false
W1104 03:37:55.567471 28385 slave.cpp:321] Disabling checkpointing is
deprecated and the --checkpoint flag will be removed in a future release.
Please avoid using this flag
I1104 03:37:55.568488 28392 state.cpp:33] Recovering state from
'/tmp/MasterTest_LaunchDuplicateOfferTest_He08c1/meta'
I1104 03:37:55.568739 28380 status_update_manager.cpp:197] Recovering status
update manager
I1104 03:37:55.569128 28387 slave.cpp:3478] Finished recovery
I1104 03:37:55.569718 28388 status_update_manager.cpp:171] Pausing sending
status updates
I1104 03:37:55.569773 28384 slave.cpp:602] New master detected at
[email protected]:49988
I1104 03:37:55.569820 28384 slave.cpp:665] Authenticating with master
[email protected]:49988
I1104 03:37:55.570000 28386 authenticatee.hpp:133] Creating new client SASL
connection
I1104 03:37:55.570137 28384 slave.cpp:638] Detecting new master
I1104 03:37:55.570248 28385 master.cpp:3869] Authenticating
slave(2)@67.195.81.189:49988
I1104 03:37:55.570291 28385 master.cpp:3880] Using default CRAM-MD5
authenticator
I1104 03:37:55.570592 28377 authenticator.hpp:169] Creating new server SASL
connection
I1104 03:37:55.570808 28377 authenticatee.hpp:224] Received SASL authentication
mechanisms: CRAM-MD5
I1104 03:37:55.570826 28363 sched.cpp:137] Version: 0.21.0
I1104 03:37:55.570837 28377 authenticatee.hpp:250] Attempting to authenticate
with mechanism 'CRAM-MD5'
I1104 03:37:55.571281 28386 authenticator.hpp:275] Received SASL authentication
start
I1104 03:37:55.571321 28386 authenticator.hpp:397] Authentication requires more
steps
I1104 03:37:55.571599 28391 sched.cpp:233] New master detected at
[email protected]:49988
I1104 03:37:55.571635 28391 sched.cpp:283] Authenticating with master
[email protected]:49988
I1104 03:37:55.571771 28378 authenticatee.hpp:133] Creating new client SASL
connection
I1104 03:37:55.571974 28377 authenticatee.hpp:270] Received SASL authentication
step
I1104 03:37:55.572016 28380 master.cpp:3869] Authenticating
[email protected]:49988
I1104 03:37:55.572082 28377 authenticator.hpp:303] Received SASL authentication
step
I1104 03:37:55.572103 28380 master.cpp:3880] Using default CRAM-MD5
authenticator
I1104 03:37:55.572108 28377 auxprop.cpp:99] Request to lookup properties for
user: 'test-principal' realm: 'proserpina.apache.org' server FQDN:
'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
I1104 03:37:55.572119 28377 auxprop.cpp:171] Looking up auxiliary property
'*userPassword'
*** Aborted at 1415072275 (unix time) try "date -d @1415072275" if you are
using GNU date ***
I1104 03:37:55.572304 28387 authenticator.hpp:169] Creating new server SASL
connection
PC: @ 0x2aae9cc945cb (unknown)
I1104 03:37:55.572639 28380 authenticatee.hpp:224] Received SASL authentication
mechanisms: CRAM-MD5
I1104 03:37:55.572667 28380 authenticatee.hpp:250] Attempting to authenticate
with mechanism 'CRAM-MD5'
I1104 03:37:55.572756 28391 authenticator.hpp:275] Received SASL authentication
start
I1104 03:37:55.572796 28391 authenticator.hpp:397] Authentication requires more
steps
I1104 03:37:55.572860 28391 authenticatee.hpp:270] Received SASL authentication
step
I1104 03:37:55.572939 28387 authenticator.hpp:303] Received SASL authentication
step
I1104 03:37:55.572958 28387 auxprop.cpp:99] Request to lookup properties for
user: 'test-principal' realm: 'proserpina.apache.org' server FQDN:
'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
I1104 03:37:55.572967 28387 auxprop.cpp:171] Looking up auxiliary property
'*userPassword'
I1104 03:37:55.572999 28387 auxprop.cpp:171] Looking up auxiliary property
'*cmusaslsecretCRAM-MD5'
I1104 03:37:55.573015 28387 auxprop.cpp:99] Request to lookup properties for
user: 'test-principal' realm: 'proserpina.apache.org' server FQDN:
'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
I1104 03:37:55.573027 28387 auxprop.cpp:121] Skipping auxiliary property
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I1104 03:37:55.573034 28387 auxprop.cpp:121] Skipping auxiliary property
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I1104 03:37:55.573045 28387 authenticator.hpp:389] Authentication success
I1104 03:37:55.573132 28390 authenticatee.hpp:310] Authentication success
I1104 03:37:55.573168 28388 master.cpp:3930] Successfully authenticated
principal 'test-principal' at
[email protected]:49988
I1104 03:37:55.573325 28390 sched.cpp:357] Successfully authenticated with
master [email protected]:49988
I1104 03:37:55.573348 28390 sched.cpp:476] Sending registration request to
[email protected]:49988
I1104 03:37:55.573606 28388 master.cpp:1378] Received registration request for
framework 'default' at
[email protected]:49988
I1104 03:37:55.573689 28388 master.cpp:1337] Authorizing framework principal
'test-principal' to receive offers for role '*'
I1104 03:37:55.574188 28388 master.cpp:1442] Registering framework
20141104-033755-3176252227-49988-28363-0000 (default) at
[email protected]:49988
I1104 03:37:55.574515 28386 hierarchical_allocator_process.hpp:329] Added
framework 20141104-033755-3176252227-49988-28363-0000
I1104 03:37:55.574547 28386 hierarchical_allocator_process.hpp:697] No
resources available to allocate!
I1104 03:37:55.574560 28386 hierarchical_allocator_process.hpp:659] Performed
allocation for 0 slaves in 19409ns
I1104 03:37:55.574631 28388 sched.cpp:407] Framework registered with
20141104-033755-3176252227-49988-28363-0000
I1104 03:37:55.574676 28388 sched.cpp:421] Scheduler::registered took 20198ns
*** SIGSEGV (@0xfffffffffffffff8) received by PID 28363 (TID 0x2aae9f399700)
from PID 18446744073709551608; stack trace: ***
@ 0x2aae9d409340 (unknown)
@ 0x2aae9cc945cb (unknown)
@ 0x98e4f0 _ZNSt10_List_nodeISsEC2IJRKSsEEEDpOT_
@ 0x9788c2
_ZN9__gnu_cxx13new_allocatorISt10_List_nodeISsEE9constructIS2_JRKSsEEEvPT_DpOT0_
@ 0x956696 std::list<>::_M_create_node<>()
@ 0x9376a4 std::list<>::_M_insert<>()
@ 0x978829 _ZNSt4listISsSaISsEE12emplace_backIJRKSsEEEvDpOT_
@ 0x956534 std::list<>::_M_initialize_dispatch<>()
@ 0x9374e1 std::list<>::list()
@ 0xb0f80f mesos::internal::cram_md5::Property::Property()
@ 0x2aae9a6779f8 std::_List_node<>::_List_node<>()
@ 0x2aae9a67664e
_ZN9__gnu_cxx13new_allocatorISt10_List_nodeIN5mesos8internal8cram_md58PropertyEEE9constructIS6_JRKS5_EEEvPT_DpOT0_
@ 0x2aae9a674876
_ZNSt4listIN5mesos8internal8cram_md58PropertyESaIS3_EE14_M_create_nodeIJRKS3_EEEPSt10_List_nodeIS3_EDpOT_
@ 0x2aae9a6721c6
_ZNSt4listIN5mesos8internal8cram_md58PropertyESaIS3_EE9_M_insertIJRKS3_EEEvSt14_List_iteratorIS3_EDpOT_
@ 0x2aae9a670326 std::list<>::push_back()
@ 0x2aae9a66e704 Multimap<>::get()
@ 0x2aae9a66b561
mesos::internal::cram_md5::InMemoryAuxiliaryPropertyPlugin::lookup()
@ 0x2aae9a66aa3d
mesos::internal::cram_md5::InMemoryAuxiliaryPropertyPlugin::lookup()
@ 0x2aae9c7b9e29 (unknown)
@ 0x2aae9c7babe5 (unknown)
@ 0x2aaea2b8252e (unknown)
@ 0x2aaea2b82929 (unknown)
@ 0x2aae9c7c6628 (unknown)
@ 0xb0c482
mesos::internal::cram_md5::CRAMMD5AuthenticatorProcess::step()
@ 0xb135e9 ProtobufProcess<>::_handler1<>()
@ 0xb25229
_ZNSt5_BindIFPFvPN5mesos8internal8cram_md527CRAMMD5AuthenticatorProcessEMS3_FvRKSsEMNS1_25AuthenticationStepMessageEKFS6_vERKN7process4UPIDES6_ES4_S8_SB_St12_PlaceholderILi1EESI_ILi2EEEE6__callIvJSF_S6_EJLm0ELm1ELm2ELm3ELm4EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
@ 0xb2078c
_ZNSt5_BindIFPFvPN5mesos8internal8cram_md527CRAMMD5AuthenticatorProcessEMS3_FvRKSsEMNS1_25AuthenticationStepMessageEKFS6_vERKN7process4UPIDES6_ES4_S8_SB_St12_PlaceholderILi1EESI_ILi2EEEEclIJSF_S6_EvEET0_DpOT_
@ 0xb1c33b std::_Function_handler<>::_M_invoke()
@ 0xb2bcc5 std::function<>::operator()()
@ 0xb2ba8e ProtobufProcess<>::visit()
@ 0x2aae9a84120a process::MessageEvent::visit()
@ 0x904766 process::ProcessBase::serve()
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)