[ 
https://issues.apache.org/jira/browse/MESOS-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845054#comment-13845054
 ] 

Nicholaus E Halecky edited comment on MESOS-787 at 12/17/13 5:26 AM:
---------------------------------------------------------------------

Hey all, this is my first time posting here. I'm new to Mesos and BDAS, so 
apologies in advance if this comment is premature. :) 

On Mesos master, commit: {{473dd4fb3af51fb19c42828bcbba5ca7b2f1d54c}}

I am attempting to install Mesos on a local cluster. After bootstrapping, 
configuring, and building, I ran {{make check}} and am still hitting this 
error (the same one reported in 
[https://issues.apache.org/jira/browse/MESOS-788]):
{noformat}
Note: Google Test filter = *-
[==========] Running 287 tests from 51 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from AllocatorZooKeeperTest/0, where TypeParam = 
mesos::internal::master::allocator::HierarchicalAllocatorProcess<mesos::internal::master::allocator::DRFSorter,
 mesos::internal::master::allocator::DRFSorter>
[ RUN      ] AllocatorZooKeeperTest/0.FrameworkReregistersFirst
tests/allocator_zookeeper_tests.cpp:147: Failure
Failed to wait 10secs for status
{noformat}

I understand that this was resolved by commit 
{{8556d4ce97653b1e4cf3d0f02323abd556f5b912}}, which appears in my git log. 
However, this test still fails. What am I missing?
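
One way to double-check that the fix commit is really part of the checked-out history (rather than only visible on another branch) is to ask git directly. The sketch below is a minimal illustration, assuming git is on the PATH and that the Mesos checkout lives in a directory you pass in; the function names are hypothetical and not part of Mesos. It relies on {{git merge-base --is-ancestor}}, which exits with 0 when the first commit is an ancestor of the second and 1 when it is not.

```python
import subprocess


def ancestor_check_command(fix_commit, head="HEAD"):
    """Build the git command that tests whether fix_commit is an ancestor of head."""
    return ["git", "merge-base", "--is-ancestor", fix_commit, head]


def contains_commit(repo_dir, fix_commit, head="HEAD"):
    """Return True if fix_commit is reachable from head in the repo at repo_dir.

    Exit status 0 means "is an ancestor", 1 means "is not"; anything else
    is treated as an error (for example, an unknown commit hash).
    """
    result = subprocess.run(
        ancestor_check_command(fix_commit, head),
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(result.stderr.decode(errors="replace").strip())


# Hypothetical usage; the checkout path is an assumption:
# contains_commit("/path/to/mesos", "8556d4ce97653b1e4cf3d0f02323abd556f5b912")
```

If this returns True and the test still fails, the failure is probably not explained by a missing fix commit.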

Thank you!


was (Author: nehalecky):
Hey all, this is my first time posting here—I'm new to Mesos and BDAS, so 
apologies in advance if this comment is premature. :) 

On Mesos master, commit: 473dd4fb3af51fb19c42828bcbba5ca7b2f1d54c

I am attempting to install Mesos on a local cluster and after bootstrapping, 
configuring and making, I test using `make check` and am still hitting this 
error (exact same as reported in 
https://issues.apache.org/jira/browse/MESOS-788):
```
Note: Google Test filter = *-
[==========] Running 287 tests from 51 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from AllocatorZooKeeperTest/0, where TypeParam = 
mesos::internal::master::allocator::HierarchicalAllocatorProcess<mesos::internal::master::allocator::DRFSorter,
 mesos::internal::master::allocator::DRFSorter>
[ RUN      ] AllocatorZooKeeperTest/0.FrameworkReregistersFirst
tests/allocator_zookeeper_tests.cpp:147: Failure
Failed to wait 10secs for status
```

I understand that this was resolved via commit 
8556d4ce97653b1e4cf3d0f02323abd556f5b912, which I show in my git log, however, 
this test still fails. What am I missing?

Thank you!

> Authenticatee process deadlocks
> -------------------------------
>
>                 Key: MESOS-787
>                 URL: https://issues.apache.org/jira/browse/MESOS-787
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Assignee: Vinod Kone
>             Fix For: 0.15.0
>
>
> This happened on Jenkins CI.
> [ RUN      ] AllocatorTest/0.WhitelistSlave
> I1030 12:36:08.279250 26962 master.cpp:293] Master started on 127.0.0.1:42146
> I1030 12:36:08.279301 26962 master.cpp:308] Master ID: 
> 201310301236-16777343-42146-26943
> I1030 12:36:08.279310 26962 master.cpp:311] Master only allowing 
> authenticated frameworks to register!
> I1030 12:36:08.279672 26962 master.cpp:706] Elected as master!
> I1030 12:36:08.279724 26962 slave.cpp:109] Slave started on 
> 23)@127.0.0.1:42146
> I1030 12:36:08.279839 26962 slave.cpp:209] Slave resources: cpus(*):2; 
> mem(*):1024; disk(*):497; ports(*):[31000-32000]
> I1030 12:36:08.282474 26962 slave.cpp:481] New master detected at 
> master@127.0.0.1:42146
> I1030 12:36:08.282510 26962 slave.cpp:496] Postponing registration until 
> recovery is complete
> I1030 12:36:08.282758 26962 status_update_manager.cpp:158] New master 
> detected at master@127.0.0.1:42146
> I1030 12:36:08.282785 26962 state.cpp:33] Recovering state from 
> '/tmp/AllocatorTest_0_WhitelistSlave_kHuF2F/meta'
> I1030 12:36:08.282877 26962 hierarchical_allocator_process.hpp:302] 
> Initializing hierarchical allocator process with master : 
> master@127.0.0.1:42146
> I1030 12:36:08.282989 26962 status_update_manager.cpp:180] Recovering status 
> update manager
> I1030 12:36:08.283159 26962 slave.cpp:2737] Finished recovery
> I1030 12:36:08.283895 26965 hierarchical_allocator_process.hpp:512] Updated 
> slave white list: { dummy-slave }
> I1030 12:36:08.286021 26965 hierarchical_allocator_process.hpp:726] No 
> resources available to allocate!
> I1030 12:36:08.286036 26965 hierarchical_allocator_process.hpp:688] Performed 
> allocation for 0 slaves in 20946ns
> I1030 12:36:08.284494 26964 sched.cpp:195] New master at 
> master@127.0.0.1:42146
> I1030 12:36:08.286718 26964 sched.cpp:281] Authenticating with master 
> master@127.0.0.1:42146
> I1030 12:36:08.287446 26965 master.cpp:1232] Attempting to register slave on 
> localhost.localdomain at slave(23)@127.0.0.1:42146
> I1030 12:36:08.287471 26965 master.cpp:2474] Adding slave 
> 201310301236-16777343-42146-26943-0 at localhost.localdomain with cpus(*):2; 
> mem(*):1024; disk(*):497; ports(*):[31000-32000]
> I1030 12:36:08.288630 26964 authenticatee.hpp:124] Creating new client SASL 
> connection
> I1030 12:36:08.288699 26964 master.cpp:1695] Authenticating framework at 
> scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.288835 26964 authenticator.hpp:140] Creating new server SASL 
> connection
> I1030 12:36:08.288905 26964 authenticatee.hpp:212] Received SASL 
> authentication mechanisms: CRAM-MD5
> I1030 12:36:08.288923 26964 authenticatee.hpp:238] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I1030 12:36:08.288947 26964 authenticator.hpp:243] Received SASL 
> authentication start
> I1030 12:36:08.288996 26964 authenticator.hpp:325] Authentication requires 
> more steps
> I1030 12:36:08.289018 26964 authenticatee.hpp:258] Received SASL 
> authentication step
> I1030 12:36:08.289049 26964 authenticator.hpp:271] Received SASL 
> authentication step
> I1030 12:36:08.289068 26964 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 
> 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: 
> false 
> I1030 12:36:08.289077 26964 auxprop.cpp:153] Looking up auxiliary property 
> '*userPassword'
> I1030 12:36:08.289088 26964 auxprop.cpp:153] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I1030 12:36:08.289099 26964 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 
> 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: 
> true 
> I1030 12:36:08.289106 26964 auxprop.cpp:103] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1030 12:36:08.289113 26964 auxprop.cpp:103] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1030 12:36:08.289124 26964 authenticator.hpp:317] Authentication success
> I1030 12:36:08.289150 26964 authenticatee.hpp:298] Authentication success
> I1030 12:36:08.289356 26963 master.cpp:1735] Successfully authenticated 
> framework at scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.289441 26963 sched.cpp:326] Successfully authenticated with 
> master master@127.0.0.1:42146
> I1030 12:36:08.289576 26965 master.cpp:764] Received registration request 
> from scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.289649 26965 master.cpp:782] Registering framework 
> 201310301236-16777343-42146-26943-0000 at scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.289751 26965 hierarchical_allocator_process.hpp:445] Added 
> slave 201310301236-16777343-42146-26943-0 (localhost.localdomain) with 
> cpus(*):2; mem(*):1024; disk(*):497; ports(*):[31000-32000] (and cpus(*):2; 
> mem(*):1024; disk(*):497; ports(*):[31000-32000] available)
> I1030 12:36:08.289791 26965 hierarchical_allocator_process.hpp:708] Performed 
> allocation for slave 201310301236-16777343-42146-26943-0 in 7625ns
> I1030 12:36:08.289846 26965 hierarchical_allocator_process.hpp:332] Added 
> framework 201310301236-16777343-42146-26943-0000
> I1030 12:36:08.289883 26965 hierarchical_allocator_process.hpp:688] Performed 
> allocation for 1 slaves in 24124ns
> I1030 12:36:08.289948 26963 sched.cpp:365] Framework registered with 
> 201310301236-16777343-42146-26943-0000
> I1030 12:36:08.289985 26963 sched.cpp:379] Scheduler::registered took 12965ns
> I1030 12:36:08.290005 26963 master.cpp:764] Received registration request 
> from scheduler(22)@127.0.0.1:42146
> I1030 12:36:08.290017 26963 master.cpp:769] Framework 
> 201310301236-16777343-42146-26943-0000 (scheduler(22)@127.0.0.1:42146) 
> already registered, resending acknowledgement
> I1030 12:36:08.290047 26963 sched.cpp:360] Ignoring framework registered 
> message because the driver is already connected!
> I1030 12:36:08.290124 26962 slave.cpp:547] Registered with master 
> master@127.0.0.1:42146; given slave ID 201310301236-16777343-42146-26943-0
> I1030 12:36:08.290160 26962 master.cpp:1220] Slave 
> 201310301236-16777343-42146-26943-0 (localhost.localdomain) already 
> registered, resending acknowledgement
> **** DEADLOCK DETECTED! ****
> You are waiting on process authenticatee(22)@127.0.0.1:42146 that it is 
> currently executing.
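
The final message says that code running inside the authenticatee process tried to wait on that same process, which can never finish while it is blocked. As a loose analogy only (this is plain Python threading, not libprocess or Mesos code, and the function name is hypothetical), the sketch below shows a thread attempting to join itself; CPython refuses with a RuntimeError instead of hanging, which is similar in spirit to the deadlock detection reported above.

```python
import threading


def wait_on_self():
    """Start a thread that tries to wait on itself; return the resulting error text."""
    outcome = {}

    def body():
        try:
            # Waiting on the thread that is currently executing can never
            # complete, so the runtime raises instead of blocking forever.
            threading.current_thread().join()
        except RuntimeError as error:
            outcome["error"] = str(error)

    worker = threading.Thread(target=body)
    worker.start()
    worker.join()  # Safe: the main thread is waiting on a different thread.
    return outcome.get("error")


if __name__ == "__main__":
    print(wait_on_self())
```

The usual way out of this kind of self-wait is to avoid blocking inside the running process and to chain the follow-up work asynchronously; whether that matches the actual fix for this issue is not stated here.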



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
