[ https://issues.apache.org/jira/browse/MESOS-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203060#comment-14203060 ]
Till Toenshoff commented on MESOS-2050: --------------------------------------- commit ce82e81c12d7f91b158c73465c47725331626f32 Author: Till Toenshoff <toensh...@me.com> Date: Sat Nov 8 01:19:50 2014 +0100 Fixed Authenticator SASL auxiliary memory access. Review: https://reviews.apache.org/r/27741 > InMemoryAuxProp plugin used by Authenticators results in SEGFAULT > ----------------------------------------------------------------- > > Key: MESOS-2050 > URL: https://issues.apache.org/jira/browse/MESOS-2050 > Project: Mesos > Issue Type: Bug > Affects Versions: 0.21.0 > Reporter: Vinod Kone > Assignee: Till Toenshoff > > Observed this on ASF CI: > Basically, as part of the recent Auth refactor for modules, the loading of > secrets is being done once per Authenticator Process instead of once in the > Master. Since, InMemoryAuxProp plugin manipulates static variables (e.g, > 'properties') it results in SEGFAULT when one Authenticator (e.g., for slave) > does load() while another Authenticator (e.g., for framework) does lookup(), > as both these methods manipulate static 'properties'. > {code} > [ RUN ] MasterTest.LaunchDuplicateOfferTest > Using temporary directory '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp' > I1104 03:37:55.523553 28363 leveldb.cpp:176] Opened db in 2.270387ms > I1104 03:37:55.524250 28363 leveldb.cpp:183] Compacted db in 662527ns > I1104 03:37:55.524276 28363 leveldb.cpp:198] Created db iterator in 4964ns > I1104 03:37:55.524284 28363 leveldb.cpp:204] Seeked to beginning of db in > 702ns > I1104 03:37:55.524291 28363 leveldb.cpp:273] Iterated through 0 keys in the > db in 450ns > I1104 03:37:55.524333 28363 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1104 03:37:55.524852 28384 recover.cpp:437] Starting replica recovery > I1104 03:37:55.525188 28384 recover.cpp:463] Replica is in EMPTY status > I1104 03:37:55.526577 28378 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I1104 03:37:55.527135 28378 master.cpp:318] Master > 20141104-033755-3176252227-49988-28363 (proserpina.apache.org) started on > 67.195.81.189:49988 > I1104 03:37:55.527180 28378 master.cpp:364] Master only allowing > authenticated frameworks to register > I1104 03:37:55.527191 28378 master.cpp:369] Master only allowing > authenticated slaves to register > I1104 03:37:55.527217 28378 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp/credentials' > I1104 03:37:55.527451 28378 master.cpp:408] Authorization enabled > I1104 03:37:55.528081 28384 master.cpp:126] No whitelist given. Advertising > offers for all slaves > I1104 03:37:55.528548 28383 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I1104 03:37:55.528645 28388 hierarchical_allocator_process.hpp:299] > Initializing hierarchical allocator process with master : > master@67.195.81.189:49988 > I1104 03:37:55.529233 28388 master.cpp:1258] The newly elected leader is > master@67.195.81.189:49988 with id 20141104-033755-3176252227-49988-28363 > I1104 03:37:55.529266 28388 master.cpp:1271] Elected as the leading master! > I1104 03:37:55.529289 28388 master.cpp:1089] Recovering from registrar > I1104 03:37:55.529311 28385 recover.cpp:554] Updating replica status to > STARTING > I1104 03:37:55.529500 28384 registrar.cpp:313] Recovering registrar > I1104 03:37:55.530037 28383 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 497965ns > I1104 03:37:55.530083 28383 replica.cpp:320] Persisted replica status to > STARTING > I1104 03:37:55.530335 28387 recover.cpp:463] Replica is in STARTING status > I1104 03:37:55.531343 28381 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I1104 03:37:55.531739 28384 recover.cpp:188] Received a recover response from > a replica in STARTING status > I1104 03:37:55.532168 28379 recover.cpp:554] Updating replica status to VOTING > I1104 03:37:55.532572 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 293974ns > I1104 03:37:55.532594 28381 replica.cpp:320] Persisted replica status to > VOTING > I1104 03:37:55.532790 28390 recover.cpp:568] Successfully joined the Paxos > group > I1104 03:37:55.533107 28390 recover.cpp:452] Recover process terminated > I1104 03:37:55.533604 28382 log.cpp:656] Attempting to start the writer > I1104 03:37:55.534840 28381 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I1104 03:37:55.535188 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 321021ns > I1104 03:37:55.535212 28381 replica.cpp:342] Persisted promised to 1 > I1104 03:37:55.535893 28378 coordinator.cpp:230] Coordinator attemping to > fill missing position > I1104 03:37:55.537318 28392 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I1104 03:37:55.537719 28392 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 366858ns > I1104 03:37:55.537746 28392 replica.cpp:676] Persisted action at 0 > I1104 03:37:55.538872 28391 replica.cpp:508] Replica received write request > for position 0 > I1104 03:37:55.538952 28391 leveldb.cpp:438] Reading position from leveldb > took 36647ns > I1104 03:37:55.539482 28391 leveldb.cpp:343] Persisting action (14 bytes) to > leveldb took 483191ns > I1104 03:37:55.539510 28391 replica.cpp:676] Persisted action at 0 > I1104 03:37:55.540045 28379 replica.cpp:655] Replica received learned notice > for position 0 > I1104 03:37:55.540402 28379 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 299913ns > I1104 03:37:55.540429 28379 replica.cpp:676] Persisted action at 0 > I1104 03:37:55.540455 28379 replica.cpp:661] Replica learned NOP action at > position 0 > I1104 03:37:55.541205 28385 log.cpp:672] Writer started with ending position 0 > I1104 03:37:55.542579 28378 leveldb.cpp:438] Reading position from leveldb > took 33358ns > I1104 03:37:55.545608 28377 registrar.cpp:346] Successfully fetched the > registry (0B) in 16.064768ms > I1104 03:37:55.545778 28377 registrar.cpp:445] Applied 1 operations in > 31511ns; attempting to update the 'registry' > I1104 03:37:55.548598 28382 log.cpp:680] Attempting to append 143 bytes to > the log > I1104 03:37:55.548821 28385 coordinator.cpp:340] Coordinator attempting to > write APPEND action at position 1 > I1104 03:37:55.549759 28385 replica.cpp:508] Replica received write request > for position 1 > I1104 03:37:55.550269 28385 leveldb.cpp:343] Persisting action (162 bytes) to > leveldb took 475014ns > I1104 03:37:55.550300 28385 replica.cpp:676] Persisted action at 1 > I1104 03:37:55.550984 28382 replica.cpp:655] Replica received learned notice > for position 1 > I1104 03:37:55.551311 28382 leveldb.cpp:343] Persisting action (164 bytes) to > leveldb took 304432ns > I1104 03:37:55.551332 28382 replica.cpp:676] Persisted action at 1 > I1104 03:37:55.551349 28382 replica.cpp:661] Replica learned APPEND action at > position 1 > I1104 03:37:55.552196 28392 registrar.cpp:490] Successfully updated the > 'registry' in 6.35904ms > I1104 03:37:55.552338 28392 registrar.cpp:376] Successfully recovered > registrar > I1104 03:37:55.552423 28385 log.cpp:699] Attempting to truncate the log to 1 > I1104 03:37:55.552577 28382 master.cpp:1116] Recovered 0 slaves from the > Registry (105B) ; allowing 10mins for slaves to re-register > I1104 03:37:55.552647 28378 coordinator.cpp:340] Coordinator attempting to > write TRUNCATE action at position 2 > I1104 03:37:55.553383 28391 replica.cpp:508] Replica received write request > for position 2 > I1104 03:37:55.553772 28391 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 357114ns > I1104 03:37:55.553797 28391 replica.cpp:676] Persisted action at 2 > I1104 03:37:55.554234 28379 replica.cpp:655] Replica received learned notice > for position 2 > I1104 03:37:55.554529 28379 leveldb.cpp:343] Persisting action (18 bytes) to > leveldb took 273827ns > I1104 03:37:55.554565 28379 leveldb.cpp:401] Deleting ~1 keys from leveldb > took 16455ns > I1104 03:37:55.554580 28379 replica.cpp:676] Persisted action at 2 > I1104 03:37:55.554596 28379 replica.cpp:661] Replica learned TRUNCATE action > at position 2 > I1104 03:37:55.566700 28385 slave.cpp:169] Slave started on > 2)@67.195.81.189:49988 > I1104 03:37:55.566766 28385 credentials.hpp:84] Loading credential for > authentication from > '/tmp/MasterTest_LaunchDuplicateOfferTest_He08c1/credential' > I1104 03:37:55.566946 28385 slave.cpp:276] Slave using credential for: > test-principal > I1104 03:37:55.567327 28385 slave.cpp:289] Slave resources: cpus(*):2; > mem(*):1024; disk(*):3.70122e+06; ports(*):[31000-32000] > I1104 03:37:55.567436 28385 slave.cpp:318] Slave hostname: > proserpina.apache.org > I1104 03:37:55.567458 28385 slave.cpp:319] Slave checkpoint: false > W1104 03:37:55.567471 28385 slave.cpp:321] Disabling checkpointing is > deprecated and the --checkpoint flag will be removed in a future release. > Please avoid using this flag > I1104 03:37:55.568488 28392 state.cpp:33] Recovering state from > '/tmp/MasterTest_LaunchDuplicateOfferTest_He08c1/meta' > I1104 03:37:55.568739 28380 status_update_manager.cpp:197] Recovering status > update manager > I1104 03:37:55.569128 28387 slave.cpp:3478] Finished recovery > I1104 03:37:55.569718 28388 status_update_manager.cpp:171] Pausing sending > status updates > I1104 03:37:55.569773 28384 slave.cpp:602] New master detected at > master@67.195.81.189:49988 > I1104 03:37:55.569820 28384 slave.cpp:665] Authenticating with master > master@67.195.81.189:49988 > I1104 03:37:55.570000 28386 authenticatee.hpp:133] Creating new client SASL > connection > I1104 03:37:55.570137 28384 slave.cpp:638] Detecting new master > I1104 03:37:55.570248 28385 master.cpp:3869] Authenticating > slave(2)@67.195.81.189:49988 > I1104 03:37:55.570291 28385 master.cpp:3880] Using default CRAM-MD5 > authenticator > I1104 03:37:55.570592 28377 authenticator.hpp:169] Creating new server SASL > connection > I1104 03:37:55.570808 28377 authenticatee.hpp:224] Received SASL > authentication mechanisms: CRAM-MD5 > I1104 03:37:55.570826 28363 sched.cpp:137] Version: 0.21.0 > I1104 03:37:55.570837 28377 authenticatee.hpp:250] Attempting to authenticate > with mechanism 'CRAM-MD5' > I1104 03:37:55.571281 28386 authenticator.hpp:275] Received SASL > authentication start > I1104 03:37:55.571321 28386 authenticator.hpp:397] Authentication requires > more steps > I1104 03:37:55.571599 28391 sched.cpp:233] New master detected at > master@67.195.81.189:49988 > I1104 03:37:55.571635 28391 sched.cpp:283] Authenticating with master > master@67.195.81.189:49988 > I1104 03:37:55.571771 28378 authenticatee.hpp:133] Creating new client SASL > connection > I1104 03:37:55.571974 28377 authenticatee.hpp:270] Received SASL > authentication step > I1104 03:37:55.572016 28380 master.cpp:3869] Authenticating > scheduler-3dc2b4d5-b421-4e73-9088-32ed103e743d@67.195.81.189:49988 > I1104 03:37:55.572082 28377 authenticator.hpp:303] Received SASL > authentication step > I1104 03:37:55.572103 28380 master.cpp:3880] Using default CRAM-MD5 > authenticator > I1104 03:37:55.572108 28377 auxprop.cpp:99] Request to lookup properties for > user: 'test-principal' realm: 'proserpina.apache.org' server FQDN: > 'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false > SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false > I1104 03:37:55.572119 28377 auxprop.cpp:171] Looking up auxiliary property > '*userPassword' > *** Aborted at 1415072275 (unix time) try "date -d @1415072275" if you are > using GNU date *** > I1104 03:37:55.572304 28387 authenticator.hpp:169] Creating new server SASL > connection > PC: @ 0x2aae9cc945cb (unknown) > I1104 03:37:55.572639 28380 authenticatee.hpp:224] Received SASL > authentication mechanisms: CRAM-MD5 > I1104 03:37:55.572667 28380 authenticatee.hpp:250] Attempting to authenticate > with mechanism 'CRAM-MD5' > I1104 03:37:55.572756 28391 authenticator.hpp:275] Received SASL > authentication start > I1104 03:37:55.572796 28391 authenticator.hpp:397] Authentication requires > more steps > I1104 03:37:55.572860 28391 authenticatee.hpp:270] Received SASL > authentication step > I1104 03:37:55.572939 28387 authenticator.hpp:303] Received SASL > authentication step > I1104 03:37:55.572958 28387 auxprop.cpp:99] Request to lookup properties for > user: 'test-principal' realm: 'proserpina.apache.org' server FQDN: > 'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false > SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false > I1104 03:37:55.572967 28387 auxprop.cpp:171] Looking up auxiliary property > '*userPassword' > I1104 03:37:55.572999 28387 auxprop.cpp:171] Looking up auxiliary property > '*cmusaslsecretCRAM-MD5' > I1104 03:37:55.573015 28387 auxprop.cpp:99] Request to lookup properties for > user: 'test-principal' realm: 'proserpina.apache.org' server FQDN: > 'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false > SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true > I1104 03:37:55.573027 28387 auxprop.cpp:121] Skipping auxiliary property > '*userPassword' since SASL_AUXPROP_AUTHZID == true > I1104 03:37:55.573034 28387 auxprop.cpp:121] Skipping auxiliary property > '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true > I1104 03:37:55.573045 28387 authenticator.hpp:389] Authentication success > I1104 03:37:55.573132 28390 authenticatee.hpp:310] Authentication success > I1104 03:37:55.573168 28388 master.cpp:3930] Successfully authenticated > principal 'test-principal' at > scheduler-3dc2b4d5-b421-4e73-9088-32ed103e743d@67.195.81.189:49988 > I1104 03:37:55.573325 28390 sched.cpp:357] Successfully authenticated with > master master@67.195.81.189:49988 > I1104 03:37:55.573348 28390 sched.cpp:476] Sending registration request to > master@67.195.81.189:49988 > I1104 03:37:55.573606 28388 master.cpp:1378] Received registration request > for framework 'default' at > scheduler-3dc2b4d5-b421-4e73-9088-32ed103e743d@67.195.81.189:49988 > I1104 03:37:55.573689 28388 master.cpp:1337] Authorizing framework principal > 'test-principal' to receive offers for role '*' > I1104 03:37:55.574188 28388 master.cpp:1442] Registering framework > 20141104-033755-3176252227-49988-28363-0000 (default) at > scheduler-3dc2b4d5-b421-4e73-9088-32ed103e743d@67.195.81.189:49988 > I1104 03:37:55.574515 28386 hierarchical_allocator_process.hpp:329] Added > framework 20141104-033755-3176252227-49988-28363-0000 > I1104 03:37:55.574547 28386 hierarchical_allocator_process.hpp:697] No > resources available to allocate! > I1104 03:37:55.574560 28386 hierarchical_allocator_process.hpp:659] Performed > allocation for 0 slaves in 19409ns > I1104 03:37:55.574631 28388 sched.cpp:407] Framework registered with > 20141104-033755-3176252227-49988-28363-0000 > I1104 03:37:55.574676 28388 sched.cpp:421] Scheduler::registered took 20198ns > *** SIGSEGV (@0xfffffffffffffff8) received by PID 28363 (TID 0x2aae9f399700) > from PID 18446744073709551608; stack trace: *** > @ 0x2aae9d409340 (unknown) > @ 0x2aae9cc945cb (unknown) > @ 0x98e4f0 _ZNSt10_List_nodeISsEC2IJRKSsEEEDpOT_ > @ 0x9788c2 > _ZN9__gnu_cxx13new_allocatorISt10_List_nodeISsEE9constructIS2_JRKSsEEEvPT_DpOT0_ > @ 0x956696 std::list<>::_M_create_node<>() > @ 0x9376a4 std::list<>::_M_insert<>() > @ 0x978829 _ZNSt4listISsSaISsEE12emplace_backIJRKSsEEEvDpOT_ > @ 0x956534 std::list<>::_M_initialize_dispatch<>() > @ 0x9374e1 std::list<>::list() > @ 0xb0f80f mesos::internal::cram_md5::Property::Property() > @ 0x2aae9a6779f8 std::_List_node<>::_List_node<>() > @ 0x2aae9a67664e > _ZN9__gnu_cxx13new_allocatorISt10_List_nodeIN5mesos8internal8cram_md58PropertyEEE9constructIS6_JRKS5_EEEvPT_DpOT0_ > @ 0x2aae9a674876 > _ZNSt4listIN5mesos8internal8cram_md58PropertyESaIS3_EE14_M_create_nodeIJRKS3_EEEPSt10_List_nodeIS3_EDpOT_ > @ 0x2aae9a6721c6 > _ZNSt4listIN5mesos8internal8cram_md58PropertyESaIS3_EE9_M_insertIJRKS3_EEEvSt14_List_iteratorIS3_EDpOT_ > @ 0x2aae9a670326 std::list<>::push_back() > @ 0x2aae9a66e704 Multimap<>::get() > @ 0x2aae9a66b561 > mesos::internal::cram_md5::InMemoryAuxiliaryPropertyPlugin::lookup() > @ 0x2aae9a66aa3d > mesos::internal::cram_md5::InMemoryAuxiliaryPropertyPlugin::lookup() > @ 0x2aae9c7b9e29 (unknown) > @ 0x2aae9c7babe5 (unknown) > @ 0x2aaea2b8252e (unknown) > @ 0x2aaea2b82929 (unknown) > @ 0x2aae9c7c6628 (unknown) > @ 0xb0c482 > mesos::internal::cram_md5::CRAMMD5AuthenticatorProcess::step() > @ 0xb135e9 ProtobufProcess<>::_handler1<>() > @ 0xb25229 > _ZNSt5_BindIFPFvPN5mesos8internal8cram_md527CRAMMD5AuthenticatorProcessEMS3_FvRKSsEMNS1_25AuthenticationStepMessageEKFS6_vERKN7process4UPIDES6_ES4_S8_SB_St12_PlaceholderILi1EESI_ILi2EEEE6__callIvJSF_S6_EJLm0ELm1ELm2ELm3ELm4EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 0xb2078c > _ZNSt5_BindIFPFvPN5mesos8internal8cram_md527CRAMMD5AuthenticatorProcessEMS3_FvRKSsEMNS1_25AuthenticationStepMessageEKFS6_vERKN7process4UPIDES6_ES4_S8_SB_St12_PlaceholderILi1EESI_ILi2EEEEclIJSF_S6_EvEET0_DpOT_ > @ 0xb1c33b std::_Function_handler<>::_M_invoke() > @ 0xb2bcc5 std::function<>::operator()() > @ 0xb2ba8e ProtobufProcess<>::visit() > @ 0x2aae9a84120a process::MessageEvent::visit() > @ 0x904766 process::ProcessBase::serve() > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)