(I've posted a bit about this before, but I want to revisit it because it's 
frustrating as I try to optimize my playbooks.)

I have a playbook where I build servers from VMware templates using 
vmware_guest and I join the domain using that module.  Once the servers are 
built I have an extremely long "wait_for_connection":

  - name: Wait until server becomes available to connect
    wait_for_connection:
      delay: 900 #Wait 15 minutes (900 seconds) before trying
      sleep: 30 #After the initial delay, try every 30 seconds
      timeout: 1200 #Maximum amount of time to wait

After this wait, I start running tasks on the new hosts.  Initially, those 
tasks will run fine, but one-by-one, randomly, the servers will start 
failing with Kerberos errors.  During this time I can confirm I'm able to 
log in to these servers using the same credentials, so the authentication 
doesn't seem to be failing outside of Ansible, but it fails within Ansible 
for some reason.

The longer I wait after building the servers, the less likely this issue 
occurs.  It just seems insane that I have to keep adding more wait time.  
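
For comparison, here is a sketch of the same task leaning on polling instead 
of a long fixed delay. The values are illustrative guesses rather than 
anything I have tested, and connect_timeout is a standard wait_for_connection 
option that my task above does not set:

```yaml
# Hypothetical variant of the wait task above; every value is an illustrative assumption.
- name: Wait until server becomes available to connect
  wait_for_connection:
    delay: 300           # seconds to wait before the first attempt
    sleep: 30            # seconds between attempts
    connect_timeout: 10  # seconds allowed for each connection attempt
    timeout: 1800        # maximum seconds to keep polling
```

Note that this only checks that a connection succeeds once, so it would not 
by itself explain or prevent Kerberos failures that show up later in the run.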

Here's me running the playbook against 4 servers.  Each task runs against 
all four servers, but the red highlighted lines show the Kerberos failures 
and the eventual collapse of the playbook as every host drops out with 
Kerberos errors:

TASK [Registry fix to enable solution for CVE-2017-8529 Part 1] 
****************
Monday 08 June 2020  16:32:22 +0000 (0:00:09.368)       0:33:29.081 
*********** 
changed: [server4.fqdn] => {"changed": true, "data_changed": false, 
"data_type_changed": false}
changed: [server1.fqdn] => {"changed": true, "data_changed": false, 
"data_type_changed": false}
changed: [server3.fqdn] => {"changed": true, "data_changed": false, 
"data_type_changed": false}
changed: [server2.fqdn] => {"changed": true, "data_changed": false, 
"data_type_changed": false}
TASK [Registry fix to enable solution for CVE-2017-8529 Part 2] 
****************
Monday 08 June 2020  16:32:25 +0000 (0:00:03.635)       0:33:32.717 
*********** 
changed: [server1.fqdn] => {"changed": true, "data_changed": false, 
"data_type_changed": false}
changed: [server4.fqdn] => {"changed": true, "data_changed": false, 
"data_type_changed": false}
changed: [server2.fqdn] => {"changed": true, "data_changed": false, 
"data_type_changed": false}
changed: [server3.fqdn] => {"changed": true, "data_changed": false, 
"data_type_changed": false}
TASK [Configure UAC] 
*************************************************************
Monday 08 June 2020  16:32:29 +0000 (0:00:03.388)       0:33:36.105 
*********** 
fatal: [server3.fqdn]: UNREACHABLE! => {"changed": false, "msg": "kerberos: 
the specified credentials were rejected by the server", "unreachable": true}
changed: [server1.fqdn] => {"changed": true, "data_changed": true, 
"data_type_changed": false}
changed: [server2.fqdn] => {"changed": true, "data_changed": true, 
"data_type_changed": false}
changed: [server4.fqdn] => {"changed": true, "data_changed": true, 
"data_type_changed": false}
TASK [Initialize Disk 1] 
*******************************************************
Monday 08 June 2020  16:32:32 +0000 (0:00:03.335)       0:33:39.440 
*********** 
changed: [server4.fqdn] => {"changed": true, "cmd": "Initialize-Disk 
-Number 1", "delta": "0:00:04.105311", "end": "2020-06-08 04:32:39.137372", 
"rc": 0, "start": "2020-06-08 04:32:35.032060", "stderr": "", 
"stderr_lines": [], "stdout": "", "stdout_lines": []}
changed: [server1.fqdn] => {"changed": true, "cmd": "Initialize-Disk 
-Number 1", "delta": "0:00:03.903042", "end": "2020-06-08 04:32:39.527549", 
"rc": 0, "start": "2020-06-08 04:32:35.624506", "stderr": "", 
"stderr_lines": [], "stdout": "", "stdout_lines": []}
changed: [server2.fqdn] => {"changed": true, "cmd": "Initialize-Disk 
-Number 1", "delta": "0:00:05.007749", "end": "2020-06-08 04:32:40.903429", 
"rc": 0, "start": "2020-06-08 04:32:35.895680", "stderr": "", 
"stderr_lines": [], "stdout": "", "stdout_lines": []}
TASK [Wait 15 seconds for disk initilization] 
**********************************
Monday 08 June 2020  16:32:41 +0000 (0:00:08.457)       0:33:47.898 
*********** 
Pausing for 15 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [server1.fqdn] => {"changed": false, "delta": 15, "echo": true, "rc": 
0, "start": "2020-06-08 16:32:41.126472", "stderr": "", "stdout": "Paused 
for 15.0 seconds", "stop": "2020-06-08 16:32:56.126843", "user_input": ""}
TASK [Partition Disk 1] 
********************************************************
Monday 08 June 2020  16:32:56 +0000 (0:00:15.051)       0:34:02.949 
*********** 
changed: [server4.fqdn] => {"changed": true}
changed: [server1.fqdn] => {"changed": true}
changed: [server2.fqdn] => {"changed": true}
TASK [Format Disk 1 as E drive] 
************************************************
Monday 08 June 2020  16:33:03 +0000 (0:00:06.888)       0:34:09.838 
*********** 
changed: [server4.fqdn] => {"changed": true}
changed: [server1.fqdn] => {"changed": true}
changed: [server2.fqdn] => {"changed": true}
TASK [Stage AV Setup Binaries to e:\admin\binaries\] ******************
Monday 08 June 2020  16:33:39 +0000 (0:00:24.463)       0:34:46.237 
*********** 
changed: [server4.fqdn] => {"changed": true, "dest": 
"e:\\admin\\binaries\\AVAgent\\", "operation": "folder_copy", "size": 
27713762, "src": "\\\\reposerver\\Applications\\Production\\AV"}
changed: [server1.fqdn] => {"changed": true, "dest": 
"e:\\admin\\binaries\\AVAgent\\", "operation": "folder_copy", "size": 
27713762, "src": "\\\\reposerver\\Applications\\Production\\AV"}
changed: [server2.fqdn] => {"changed": true, "dest": 
"e:\\admin\\binaries\\AVAgent\\", "operation": "folder_copy", "size": 
27713762, "src": "\\\\reposerver\\Applications\\Production\\AV"}
TASK [Stage SecScan Setup Binaries to e:\admin\binaries\] 
***********************
Monday 08 June 2020  16:33:42 +0000 (0:00:03.402)       0:34:49.639 
*********** 
changed: [server1.fqdn] => {"changed": true, "dest": 
"e:\\admin\\binaries\\SecScan\\64bit", "operation": "folder_copy", "size": 
23530139, "src": "\\\\reposerver\\Applications\\Production\\SecScan"}
changed: [server4.fqdn] => {"changed": true, "dest": 
"e:\\admin\\binaries\\SecScan\\64bit", "operation": "folder_copy", "size": 
23530139, "src": "\\\\reposerver\\Applications\\Production\\SecScan"}
changed: [server2.fqdn] => {"changed": true, "dest": 
"e:\\admin\\binaries\\SecScan\\64bit", "operation": "folder_copy", "size": 
23530139, "src": "\\\\reposerver\\Applications\\Production\\SecScan"}
TASK [Stage LAPS Setup Binaries to e:\admin\binaries\] 
*************************
Monday 08 June 2020  16:33:46 +0000 (0:00:03.674)       0:34:53.314 
*********** 
fatal: [server1.fqdn]: UNREACHABLE! => {"changed": false, "msg": "kerberos: 
the specified credentials were rejected by the server", "unreachable": true}
changed: [server2.fqdn] => {"changed": true, "dest": 
"e:\\admin\\binaries\\LAPSAgent\\x64", "operation": "folder_copy", "size": 
1019904, "src": "\\\\reposerver\\Applications\\Production\\Microsoft\\LAPS"}
changed: [server4.fqdn] => {"changed": true, "dest": 
"e:\\admin\\binaries\\LAPSAgent\\x64", "operation": "folder_copy", "size": 
1019904, "src": "\\\\reposerver\\Applications\\Production\\Microsoft\\LAPS"}
TASK [Ensure LAPS is installed] 
************************************************
Monday 08 June 2020  16:33:49 +0000 (0:00:03.291)       0:34:56.606 
*********** 
changed: [server4.fqdn] => {"changed": true, "rc": 0, "reboot_required": 
false}
changed: [server2.fqdn] => {"changed": true, "rc": 0, "reboot_required": 
false}
TASK [Ensure Agent is installed] 
**********************************************
Monday 08 June 2020  16:33:54 +0000 (0:00:04.571)       0:35:01.177 
*********** 
fatal: [server2.fqdn]: UNREACHABLE! => {"changed": false, "msg": "kerberos: 
the specified credentials were rejected by the server", "unreachable": true}
changed: [server4.fqdn] => {"changed": true, "rc": 0, "reboot_required": 
false}
TASK [Ensure Agent is installed] 
************************************************
Monday 08 June 2020  16:34:03 +0000 (0:00:09.009)       0:35:10.187 
*********** 
changed: [server4.fqdn] => {"changed": true, "rc": 0, "reboot_required": 
false}
TASK [Ensure AV is installed] ******************************************
Monday 08 June 2020  16:34:08 +0000 (0:00:04.973)       0:35:15.161 
*********** 
fatal: [server4.fqdn]: UNREACHABLE! => {"changed": false, "msg": "kerberos: 
the specified credentials were rejected by the server", "unreachable": true}


I'm a bit new to the Linux world. Is it possible this is a bug in 
something on the Linux node I run Ansible/Ansible Tower from? I initially 
thought it was something with AD replication, but I can authenticate fine 
against these servers within minutes of them being added to the domain 
through normal Windows/Microsoft processes.

Thanks in advance for any advice!
