[ https://issues.apache.org/jira/browse/MESOS-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849881#comment-13849881 ]
Nicholaus E Halecky edited comment on MESOS-787 at 12/17/13 5:27 AM:
---------------------------------------------------------------------
My apologies for the delay in replying.
I updated HEAD to origin/master (commit {{9cbb81}}), ran the tests again with
{{MESOS_VERBOSE=1}}, captured the output to a file, and hit the same error:
{noformat}
export MESOS_VERBOSE=1
make check > make.output
{noformat}
There wasn't nearly as much chatter as I would have expected, but the final
lines of output may contain a few more clues:
{noformat}
make check-local
make[3]: Entering directory `/root/code/mesos/src'
./mesos-tests
Source directory: /root/code/mesos
Build directory: /root/code/mesos
Note: Google Test filter = *-
[==========] Running 288 tests from 51 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from AllocatorZooKeeperTest/0, where TypeParam = mesos::internal::master::allocator::HierarchicalAllocatorProcess<mesos::internal::master::allocator::DRFSorter, mesos::internal::master::allocator::DRFSorter>
[ RUN ] AllocatorZooKeeperTest/0.FrameworkReregistersFirst
tests/allocator_zookeeper_tests.cpp:147: Failure
Failed to wait 10secs for statu
.
./tests/isolator.hpp:192: ERROR: this mock object (used in test AllocatorZooKeeperTest/0.FrameworkReregistersFirst) should be deleted but never is. Its address is @0x1eff4f0.
tests/allocator_zookeeper_tests.cpp:128: ERROR: this mock object (used in test AllocatorZooKeeperTest/0.FrameworkReregistersFirst) should be deleted but never is. Its address is @0x7fffa097b2a0.
tests/allocator_zookeeper_tests.cpp:136: ERROR: this mock object (used in test AllocatorZooKeeperTest/0.FrameworkReregistersFirst) should be deleted but never is. Its address is @0x7fffa097b810.
ERROR: 3 leaked mock objects found at program exit.
{noformat}
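One possible reason for the sparse output: {{make check > make.output}} redirects only stdout, so anything the tests log to stderr would not end up in the file. Whether the verbose Mesos logging actually goes to stderr in this build is an assumption here, not something confirmed above. A minimal sketch of the difference, using a hypothetical {{emit}} function as a stand-in for {{make check}}:

```shell
# Hypothetical stand-in for `make check`: one line on stdout, one on stderr.
emit() {
  echo "stdout line"
  echo "stderr line" >&2
}

# Scratch directory so the sketch does not touch the build tree.
outdir=$(mktemp -d)

# Redirecting only stdout: the stderr line never reaches the file.
# (stderr is discarded here only to keep the terminal quiet.)
emit > "$outdir/stdout_only.output" 2>/dev/null

# Redirecting stdout and pointing stderr at the same place captures both.
emit > "$outdir/both.output" 2>&1
```

If that assumption holds, rerunning with {{make check > make.output 2>&1}} should capture both streams in {{make.output}}.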
Let me know if this is what you were looking for and if not, what else I can do
to help further diagnose the issue.
Thanks again!
> Authenticatee process deadlocks
> -------------------------------
>
> Key: MESOS-787
> URL: https://issues.apache.org/jira/browse/MESOS-787
> Project: Mesos
> Issue Type: Bug
> Reporter: Vinod Kone
> Assignee: Vinod Kone
> Fix For: 0.15.0
>
>
> This happened on Jenkins CI.
> [ RUN ] AllocatorTest/0.WhitelistSlave
> I1030 12:36:08.279250 26962 master.cpp:293] Master started on 127.0.0.1:42146
> I1030 12:36:08.279301 26962 master.cpp:308] Master ID:
> 201310301236-16777343-42146-26943
> I1030 12:36:08.279310 26962 master.cpp:311] Master only allowing
> authenticated frameworks to register!
> I1030 12:36:08.279672 26962 master.cpp:706] Elected as master!
> I1030 12:36:08.279724 26962 slave.cpp:109] Slave started on
> 23)@127.0.0.1:42146
> I1030 12:36:08.279839 26962 slave.cpp:209] Slave resources: cpus(*):2;
> mem(*):1024; disk(*):497; ports(*):[31000-32000]
> I1030 12:36:08.282474 26962 slave.cpp:481] New master detected at
> master@127.0.0.1:42146
> I1030 12:36:08.282510 26962 slave.cpp:496] Postponing registration until
> recovery is complete
> I1030 12:36:08.282758 26962 status_update_manager.cpp:158] New master
> detected at master@127.0.0.1:42146
> I1030 12:36:08.282785 26962 state.cpp:33] Recovering state from
> '/tmp/AllocatorTest_0_WhitelistSlave_kHuF2F/meta'
> I1030 12:36:08.282877 26962 hierarchical_allocator_process.hpp:302]
> Initializing hierarchical allocator process with master :
> master@127.0.0.1:42146
> I1030 12:36:08.282989 26962 status_update_manager.cpp:180] Recovering status
> update manager
> I1030 12:36:08.283159 26962 slave.cpp:2737] Finished recovery
> I1030 12:36:08.283895 26965 hierarchical_allocator_process.hpp:512] Updated
> slave white list: { dummy-slave }
> I1030 12:36:08.286021 26965 hierarchical_allocator_process.hpp:726] No
> resources available to allocate!
> I1030 12:36:08.286036 26965 hierarchical_allocator_process.hpp:688] Performed
> allocation for 0 slaves in 20946ns
> I1030 12:36:08.284494 26964 sched.cpp:195] New master at
> master@127.0.0.1:42146
> I1030 12:36:08.286718 26964 sched.cpp:281] Authenticating with master
> master@127.0.0.1:42146
> I1030 12:36:08.287446 26965 master.cpp:1232] Attempting to register slave on
> localhost.localdomain at slave(23)@127.0.0.1:42146
> I1030 12:36:08.287471 26965 master.cpp:2474] Adding slave
> 201310301236-16777343-42146-26943-0 at localhost.localdomain with cpus(*):2;
> mem(*):1024; disk(*):497; ports(*):[31000-32000]
> I1030 12:36:08.288630 26964 authenticatee.hpp:124] Creating new client SASL
> connection
> I1030 12:36:08.288699 26964 master.cpp:1695] Authenticating framework at
> scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.288835 26964 authenticator.hpp:140] Creating new server SASL
> connection
> I1030 12:36:08.288905 26964 authenticatee.hpp:212] Received SASL
> authentication mechanisms: CRAM-MD5
> I1030 12:36:08.288923 26964 authenticatee.hpp:238] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I1030 12:36:08.288947 26964 authenticator.hpp:243] Received SASL
> authentication start
> I1030 12:36:08.288996 26964 authenticator.hpp:325] Authentication requires
> more steps
> I1030 12:36:08.289018 26964 authenticatee.hpp:258] Received SASL
> authentication step
> I1030 12:36:08.289049 26964 authenticator.hpp:271] Received SASL
> authentication step
> I1030 12:36:08.289068 26964 auxprop.cpp:81] Request to lookup properties for
> user: 'test-principal' realm: 'localhost.localdomain' server FQDN:
> 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID:
> false
> I1030 12:36:08.289077 26964 auxprop.cpp:153] Looking up auxiliary property
> '*userPassword'
> I1030 12:36:08.289088 26964 auxprop.cpp:153] Looking up auxiliary property
> '*cmusaslsecretCRAM-MD5'
> I1030 12:36:08.289099 26964 auxprop.cpp:81] Request to lookup properties for
> user: 'test-principal' realm: 'localhost.localdomain' server FQDN:
> 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID:
> true
> I1030 12:36:08.289106 26964 auxprop.cpp:103] Skipping auxiliary property
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1030 12:36:08.289113 26964 auxprop.cpp:103] Skipping auxiliary property
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1030 12:36:08.289124 26964 authenticator.hpp:317] Authentication success
> I1030 12:36:08.289150 26964 authenticatee.hpp:298] Authentication success
> I1030 12:36:08.289356 26963 master.cpp:1735] Successfully authenticated
> framework at scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.289441 26963 sched.cpp:326] Successfully authenticated with
> master master@127.0.0.1:42146
> I1030 12:36:08.289576 26965 master.cpp:764] Received registration request
> from scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.289649 26965 master.cpp:782] Registering framework
> 201310301236-16777343-42146-26943-0000 at scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.289751 26965 hierarchical_allocator_process.hpp:445] Added
> slave 201310301236-16777343-42146-26943-0 (localhost.localdomain) with
> cpus(*):2; mem(*):1024; disk(*):497; ports(*):[31000-32000] (and cpus(*):2;
> mem(*):1024; disk(*):497; ports(*):[31000-32000] available)
> I1030 12:36:08.289791 26965 hierarchical_allocator_process.hpp:708] Performed
> allocation for slave 201310301236-16777343-42146-26943-0 in 7625ns
> I1030 12:36:08.289846 26965 hierarchical_allocator_process.hpp:332] Added
> framework 201310301236-16777343-42146-26943-0000
> I1030 12:36:08.289883 26965 hierarchical_allocator_process.hpp:688] Performed
> allocation for 1 slaves in 24124ns
> I1030 12:36:08.289948 26963 sched.cpp:365] Framework registered with
> 201310301236-16777343-42146-26943-0000
> I1030 12:36:08.289985 26963 sched.cpp:379] Scheduler::registered took 12965ns
> I1030 12:36:08.290005 26963 master.cpp:764] Received registration request
> from scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.290017 26963 master.cpp:769] Framework
> 201310301236-16777343-42146-26943-0000 (scheduler(22)@127.0.0.1:42146)
> already registered, resending acknowledgement
> I1030 12:36:08.290047 26963 sched.cpp:360] Ignoring framework registered
> message because the driver is already connected!
> I1030 12:36:08.290124 26962 slave.cpp:547] Registered with master
> master@127.0.0.1:42146; given slave ID 201310301236-16777343-42146-26943-0
> I1030 12:36:08.290160 26962 master.cpp:1220] Slave
> 201310301236-16777343-42146-26943-0 (localhost.localdomain) already
> registered, resending acknowledgement
> **** DEADLOCK DETECTED! ****
> You are waiting on process authenticatee(22)@127.0.0.1:42146 that it is
> currently executing.