[
https://issues.apache.org/jira/browse/MESOS-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adam B updated MESOS-2050:
--------------------------
Assignee: Till Toenshoff (was: Adam B)
> InMemoryAuxProp plugin used by Authenticators results in SEGFAULT
> -----------------------------------------------------------------
>
> Key: MESOS-2050
> URL: https://issues.apache.org/jira/browse/MESOS-2050
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.21.0
> Reporter: Vinod Kone
> Assignee: Till Toenshoff
>
> Observed this on ASF CI:
> Basically, as part of the recent Auth refactor for modules, the loading of
> secrets is being done once per Authenticator Process instead of once in the
> Master. Since, InMemoryAuxProp plugin manipulates static variables (e.g,
> 'properties') it results in SEGFAULT when one Authenticator (e.g., for slave)
> does load() while another Authenticator (e.g., for framework) does lookup(),
> as both these methods manipulate static 'properties'.
> {code}
> [ RUN ] MasterTest.LaunchDuplicateOfferTest
> Using temporary directory '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp'
> I1104 03:37:55.523553 28363 leveldb.cpp:176] Opened db in 2.270387ms
> I1104 03:37:55.524250 28363 leveldb.cpp:183] Compacted db in 662527ns
> I1104 03:37:55.524276 28363 leveldb.cpp:198] Created db iterator in 4964ns
> I1104 03:37:55.524284 28363 leveldb.cpp:204] Seeked to beginning of db in
> 702ns
> I1104 03:37:55.524291 28363 leveldb.cpp:273] Iterated through 0 keys in the
> db in 450ns
> I1104 03:37:55.524333 28363 replica.cpp:741] Replica recovered with log
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1104 03:37:55.524852 28384 recover.cpp:437] Starting replica recovery
> I1104 03:37:55.525188 28384 recover.cpp:463] Replica is in EMPTY status
> I1104 03:37:55.526577 28378 replica.cpp:638] Replica in EMPTY status received
> a broadcasted recover request
> I1104 03:37:55.527135 28378 master.cpp:318] Master
> 20141104-033755-3176252227-49988-28363 (proserpina.apache.org) started on
> 67.195.81.189:49988
> I1104 03:37:55.527180 28378 master.cpp:364] Master only allowing
> authenticated frameworks to register
> I1104 03:37:55.527191 28378 master.cpp:369] Master only allowing
> authenticated slaves to register
> I1104 03:37:55.527217 28378 credentials.hpp:36] Loading credentials for
> authentication from
> '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp/credentials'
> I1104 03:37:55.527451 28378 master.cpp:408] Authorization enabled
> I1104 03:37:55.528081 28384 master.cpp:126] No whitelist given. Advertising
> offers for all slaves
> I1104 03:37:55.528548 28383 recover.cpp:188] Received a recover response from
> a replica in EMPTY status
> I1104 03:37:55.528645 28388 hierarchical_allocator_process.hpp:299]
> Initializing hierarchical allocator process with master :
> [email protected]:49988
> I1104 03:37:55.529233 28388 master.cpp:1258] The newly elected leader is
> [email protected]:49988 with id 20141104-033755-3176252227-49988-28363
> I1104 03:37:55.529266 28388 master.cpp:1271] Elected as the leading master!
> I1104 03:37:55.529289 28388 master.cpp:1089] Recovering from registrar
> I1104 03:37:55.529311 28385 recover.cpp:554] Updating replica status to
> STARTING
> I1104 03:37:55.529500 28384 registrar.cpp:313] Recovering registrar
> I1104 03:37:55.530037 28383 leveldb.cpp:306] Persisting metadata (8 bytes) to
> leveldb took 497965ns
> I1104 03:37:55.530083 28383 replica.cpp:320] Persisted replica status to
> STARTING
> I1104 03:37:55.530335 28387 recover.cpp:463] Replica is in STARTING status
> I1104 03:37:55.531343 28381 replica.cpp:638] Replica in STARTING status
> received a broadcasted recover request
> I1104 03:37:55.531739 28384 recover.cpp:188] Received a recover response from
> a replica in STARTING status
> I1104 03:37:55.532168 28379 recover.cpp:554] Updating replica status to VOTING
> I1104 03:37:55.532572 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to
> leveldb took 293974ns
> I1104 03:37:55.532594 28381 replica.cpp:320] Persisted replica status to
> VOTING
> I1104 03:37:55.532790 28390 recover.cpp:568] Successfully joined the Paxos
> group
> I1104 03:37:55.533107 28390 recover.cpp:452] Recover process terminated
> I1104 03:37:55.533604 28382 log.cpp:656] Attempting to start the writer
> I1104 03:37:55.534840 28381 replica.cpp:474] Replica received implicit
> promise request with proposal 1
> I1104 03:37:55.535188 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to
> leveldb took 321021ns
> I1104 03:37:55.535212 28381 replica.cpp:342] Persisted promised to 1
> I1104 03:37:55.535893 28378 coordinator.cpp:230] Coordinator attemping to
> fill missing position
> I1104 03:37:55.537318 28392 replica.cpp:375] Replica received explicit
> promise request for position 0 with proposal 2
> I1104 03:37:55.537719 28392 leveldb.cpp:343] Persisting action (8 bytes) to
> leveldb took 366858ns
> I1104 03:37:55.537746 28392 replica.cpp:676] Persisted action at 0
> I1104 03:37:55.538872 28391 replica.cpp:508] Replica received write request
> for position 0
> I1104 03:37:55.538952 28391 leveldb.cpp:438] Reading position from leveldb
> took 36647ns
> I1104 03:37:55.539482 28391 leveldb.cpp:343] Persisting action (14 bytes) to
> leveldb took 483191ns
> I1104 03:37:55.539510 28391 replica.cpp:676] Persisted action at 0
> I1104 03:37:55.540045 28379 replica.cpp:655] Replica received learned notice
> for position 0
> I1104 03:37:55.540402 28379 leveldb.cpp:343] Persisting action (16 bytes) to
> leveldb took 299913ns
> I1104 03:37:55.540429 28379 replica.cpp:676] Persisted action at 0
> I1104 03:37:55.540455 28379 replica.cpp:661] Replica learned NOP action at
> position 0
> I1104 03:37:55.541205 28385 log.cpp:672] Writer started with ending position 0
> I1104 03:37:55.542579 28378 leveldb.cpp:438] Reading position from leveldb
> took 33358ns
> I1104 03:37:55.545608 28377 registrar.cpp:346] Successfully fetched the
> registry (0B) in 16.064768ms
> I1104 03:37:55.545778 28377 registrar.cpp:445] Applied 1 operations in
> 31511ns; attempting to update the 'registry'
> I1104 03:37:55.548598 28382 log.cpp:680] Attempting to append 143 bytes to
> the log
> I1104 03:37:55.548821 28385 coordinator.cpp:340] Coordinator attempting to
> write APPEND action at position 1
> I1104 03:37:55.549759 28385 replica.cpp:508] Replica received write request
> for position 1
> I1104 03:37:55.550269 28385 leveldb.cpp:343] Persisting action (162 bytes) to
> leveldb took 475014ns
> I1104 03:37:55.550300 28385 replica.cpp:676] Persisted action at 1
> I1104 03:37:55.550984 28382 replica.cpp:655] Replica received learned notice
> for position 1
> I1104 03:37:55.551311 28382 leveldb.cpp:343] Persisting action (164 bytes) to
> leveldb took 304432ns
> I1104 03:37:55.551332 28382 replica.cpp:676] Persisted action at 1
> I1104 03:37:55.551349 28382 replica.cpp:661] Replica learned APPEND action at
> position 1
> I1104 03:37:55.552196 28392 registrar.cpp:490] Successfully updated the
> 'registry' in 6.35904ms
> I1104 03:37:55.552338 28392 registrar.cpp:376] Successfully recovered
> registrar
> I1104 03:37:55.552423 28385 log.cpp:699] Attempting to truncate the log to 1
> I1104 03:37:55.552577 28382 master.cpp:1116] Recovered 0 slaves from the
> Registry (105B) ; allowing 10mins for slaves to re-register
> I1104 03:37:55.552647 28378 coordinator.cpp:340] Coordinator attempting to
> write TRUNCATE action at position 2
> I1104 03:37:55.553383 28391 replica.cpp:508] Replica received write request
> for position 2
> I1104 03:37:55.553772 28391 leveldb.cpp:343] Persisting action (16 bytes) to
> leveldb took 357114ns
> I1104 03:37:55.553797 28391 replica.cpp:676] Persisted action at 2
> I1104 03:37:55.554234 28379 replica.cpp:655] Replica received learned notice
> for position 2
> I1104 03:37:55.554529 28379 leveldb.cpp:343] Persisting action (18 bytes) to
> leveldb took 273827ns
> I1104 03:37:55.554565 28379 leveldb.cpp:401] Deleting ~1 keys from leveldb
> took 16455ns
> I1104 03:37:55.554580 28379 replica.cpp:676] Persisted action at 2
> I1104 03:37:55.554596 28379 replica.cpp:661] Replica learned TRUNCATE action
> at position 2
> I1104 03:37:55.566700 28385 slave.cpp:169] Slave started on
> 2)@67.195.81.189:49988
> I1104 03:37:55.566766 28385 credentials.hpp:84] Loading credential for
> authentication from
> '/tmp/MasterTest_LaunchDuplicateOfferTest_He08c1/credential'
> I1104 03:37:55.566946 28385 slave.cpp:276] Slave using credential for:
> test-principal
> I1104 03:37:55.567327 28385 slave.cpp:289] Slave resources: cpus(*):2;
> mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000]
> I1104 03:37:55.567436 28385 slave.cpp:318] Slave hostname:
> proserpina.apache.org
> I1104 03:37:55.567458 28385 slave.cpp:319] Slave checkpoint: false
> W1104 03:37:55.567471 28385 slave.cpp:321] Disabling checkpointing is
> deprecated and the --checkpoint flag will be removed in a future release.
> Please avoid using this flag
> I1104 03:37:55.568488 28392 state.cpp:33] Recovering state from
> '/tmp/MasterTest_LaunchDuplicateOfferTest_He08c1/meta'
> I1104 03:37:55.568739 28380 status_update_manager.cpp:197] Recovering status
> update manager
> I1104 03:37:55.569128 28387 slave.cpp:3478] Finished recovery
> I1104 03:37:55.569718 28388 status_update_manager.cpp:171] Pausing sending
> status updates
> I1104 03:37:55.569773 28384 slave.cpp:602] New master detected at
> [email protected]:49988
> I1104 03:37:55.569820 28384 slave.cpp:665] Authenticating with master
> [email protected]:49988
> I1104 03:37:55.570000 28386 authenticatee.hpp:133] Creating new client SASL
> connection
> I1104 03:37:55.570137 28384 slave.cpp:638] Detecting new master
> I1104 03:37:55.570248 28385 master.cpp:3869] Authenticating
> slave(2)@67.195.81.189:49988
> I1104 03:37:55.570291 28385 master.cpp:3880] Using default CRAM-MD5
> authenticator
> I1104 03:37:55.570592 28377 authenticator.hpp:169] Creating new server SASL
> connection
> I1104 03:37:55.570808 28377 authenticatee.hpp:224] Received SASL
> authentication mechanisms: CRAM-MD5
> I1104 03:37:55.570826 28363 sched.cpp:137] Version: 0.21.0
> I1104 03:37:55.570837 28377 authenticatee.hpp:250] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I1104 03:37:55.571281 28386 authenticator.hpp:275] Received SASL
> authentication start
> I1104 03:37:55.571321 28386 authenticator.hpp:397] Authentication requires
> more steps
> I1104 03:37:55.571599 28391 sched.cpp:233] New master detected at
> [email protected]:49988
> I1104 03:37:55.571635 28391 sched.cpp:283] Authenticating with master
> [email protected]:49988
> I1104 03:37:55.571771 28378 authenticatee.hpp:133] Creating new client SASL
> connection
> I1104 03:37:55.571974 28377 authenticatee.hpp:270] Received SASL
> authentication step
> I1104 03:37:55.572016 28380 master.cpp:3869] Authenticating
> [email protected]:49988
> I1104 03:37:55.572082 28377 authenticator.hpp:303] Received SASL
> authentication step
> I1104 03:37:55.572103 28380 master.cpp:3880] Using default CRAM-MD5
> authenticator
> I1104 03:37:55.572108 28377 auxprop.cpp:99] Request to lookup properties for
> user: 'test-principal' realm: 'proserpina.apache.org' server FQDN:
> 'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
> I1104 03:37:55.572119 28377 auxprop.cpp:171] Looking up auxiliary property
> '*userPassword'
> *** Aborted at 1415072275 (unix time) try "date -d @1415072275" if you are
> using GNU date ***
> I1104 03:37:55.572304 28387 authenticator.hpp:169] Creating new server SASL
> connection
> PC: @ 0x2aae9cc945cb (unknown)
> I1104 03:37:55.572639 28380 authenticatee.hpp:224] Received SASL
> authentication mechanisms: CRAM-MD5
> I1104 03:37:55.572667 28380 authenticatee.hpp:250] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I1104 03:37:55.572756 28391 authenticator.hpp:275] Received SASL
> authentication start
> I1104 03:37:55.572796 28391 authenticator.hpp:397] Authentication requires
> more steps
> I1104 03:37:55.572860 28391 authenticatee.hpp:270] Received SASL
> authentication step
> I1104 03:37:55.572939 28387 authenticator.hpp:303] Received SASL
> authentication step
> I1104 03:37:55.572958 28387 auxprop.cpp:99] Request to lookup properties for
> user: 'test-principal' realm: 'proserpina.apache.org' server FQDN:
> 'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
> I1104 03:37:55.572967 28387 auxprop.cpp:171] Looking up auxiliary property
> '*userPassword'
> I1104 03:37:55.572999 28387 auxprop.cpp:171] Looking up auxiliary property
> '*cmusaslsecretCRAM-MD5'
> I1104 03:37:55.573015 28387 auxprop.cpp:99] Request to lookup properties for
> user: 'test-principal' realm: 'proserpina.apache.org' server FQDN:
> 'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
> I1104 03:37:55.573027 28387 auxprop.cpp:121] Skipping auxiliary property
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1104 03:37:55.573034 28387 auxprop.cpp:121] Skipping auxiliary property
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1104 03:37:55.573045 28387 authenticator.hpp:389] Authentication success
> I1104 03:37:55.573132 28390 authenticatee.hpp:310] Authentication success
> I1104 03:37:55.573168 28388 master.cpp:3930] Successfully authenticated
> principal 'test-principal' at
> [email protected]:49988
> I1104 03:37:55.573325 28390 sched.cpp:357] Successfully authenticated with
> master [email protected]:49988
> I1104 03:37:55.573348 28390 sched.cpp:476] Sending registration request to
> [email protected]:49988
> I1104 03:37:55.573606 28388 master.cpp:1378] Received registration request
> for framework 'default' at
> [email protected]:49988
> I1104 03:37:55.573689 28388 master.cpp:1337] Authorizing framework principal
> 'test-principal' to receive offers for role '*'
> I1104 03:37:55.574188 28388 master.cpp:1442] Registering framework
> 20141104-033755-3176252227-49988-28363-0000 (default) at
> [email protected]:49988
> I1104 03:37:55.574515 28386 hierarchical_allocator_process.hpp:329] Added
> framework 20141104-033755-3176252227-49988-28363-0000
> I1104 03:37:55.574547 28386 hierarchical_allocator_process.hpp:697] No
> resources available to allocate!
> I1104 03:37:55.574560 28386 hierarchical_allocator_process.hpp:659] Performed
> allocation for 0 slaves in 19409ns
> I1104 03:37:55.574631 28388 sched.cpp:407] Framework registered with
> 20141104-033755-3176252227-49988-28363-0000
> I1104 03:37:55.574676 28388 sched.cpp:421] Scheduler::registered took 20198ns
> *** SIGSEGV (@0xfffffffffffffff8) received by PID 28363 (TID 0x2aae9f399700)
> from PID 18446744073709551608; stack trace: ***
> @ 0x2aae9d409340 (unknown)
> @ 0x2aae9cc945cb (unknown)
> @ 0x98e4f0 _ZNSt10_List_nodeISsEC2IJRKSsEEEDpOT_
> @ 0x9788c2
> _ZN9__gnu_cxx13new_allocatorISt10_List_nodeISsEE9constructIS2_JRKSsEEEvPT_DpOT0_
> @ 0x956696 std::list<>::_M_create_node<>()
> @ 0x9376a4 std::list<>::_M_insert<>()
> @ 0x978829 _ZNSt4listISsSaISsEE12emplace_backIJRKSsEEEvDpOT_
> @ 0x956534 std::list<>::_M_initialize_dispatch<>()
> @ 0x9374e1 std::list<>::list()
> @ 0xb0f80f mesos::internal::cram_md5::Property::Property()
> @ 0x2aae9a6779f8 std::_List_node<>::_List_node<>()
> @ 0x2aae9a67664e
> _ZN9__gnu_cxx13new_allocatorISt10_List_nodeIN5mesos8internal8cram_md58PropertyEEE9constructIS6_JRKS5_EEEvPT_DpOT0_
> @ 0x2aae9a674876
> _ZNSt4listIN5mesos8internal8cram_md58PropertyESaIS3_EE14_M_create_nodeIJRKS3_EEEPSt10_List_nodeIS3_EDpOT_
> @ 0x2aae9a6721c6
> _ZNSt4listIN5mesos8internal8cram_md58PropertyESaIS3_EE9_M_insertIJRKS3_EEEvSt14_List_iteratorIS3_EDpOT_
> @ 0x2aae9a670326 std::list<>::push_back()
> @ 0x2aae9a66e704 Multimap<>::get()
> @ 0x2aae9a66b561
> mesos::internal::cram_md5::InMemoryAuxiliaryPropertyPlugin::lookup()
> @ 0x2aae9a66aa3d
> mesos::internal::cram_md5::InMemoryAuxiliaryPropertyPlugin::lookup()
> @ 0x2aae9c7b9e29 (unknown)
> @ 0x2aae9c7babe5 (unknown)
> @ 0x2aaea2b8252e (unknown)
> @ 0x2aaea2b82929 (unknown)
> @ 0x2aae9c7c6628 (unknown)
> @ 0xb0c482
> mesos::internal::cram_md5::CRAMMD5AuthenticatorProcess::step()
> @ 0xb135e9 ProtobufProcess<>::_handler1<>()
> @ 0xb25229
> _ZNSt5_BindIFPFvPN5mesos8internal8cram_md527CRAMMD5AuthenticatorProcessEMS3_FvRKSsEMNS1_25AuthenticationStepMessageEKFS6_vERKN7process4UPIDES6_ES4_S8_SB_St12_PlaceholderILi1EESI_ILi2EEEE6__callIvJSF_S6_EJLm0ELm1ELm2ELm3ELm4EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @ 0xb2078c
> _ZNSt5_BindIFPFvPN5mesos8internal8cram_md527CRAMMD5AuthenticatorProcessEMS3_FvRKSsEMNS1_25AuthenticationStepMessageEKFS6_vERKN7process4UPIDES6_ES4_S8_SB_St12_PlaceholderILi1EESI_ILi2EEEEclIJSF_S6_EvEET0_DpOT_
> @ 0xb1c33b std::_Function_handler<>::_M_invoke()
> @ 0xb2bcc5 std::function<>::operator()()
> @ 0xb2ba8e ProtobufProcess<>::visit()
> @ 0x2aae9a84120a process::MessageEvent::visit()
> @ 0x904766 process::ProcessBase::serve()
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)