Hi, I'm new to Ansible and having problems with community.general.datadog_monitor intermittently failing with '500 - Internal Server Error' responses from Datadog when creating/updating monitors.
Unfortunately, community.general.datadog_monitor doesn't offer a `retry` option, so I'm looking for insight into how I can add some error handling and/or retry logic. The following is my code and the error I'm getting:
We start off with:
$ ansible-playbook -vvv -e db_type=noncdb -e @environments/prod.yml -e resources_configured=datadog_monitors oracle-datadog.yml
Which calls a number of other tasks, then:
- name: Manage Datadog monitors for each DB
  include_tasks: library/generate-datadog-monitors.yml
  when: resources_configured == "datadog_monitors"
Which calls:
- name: Create Datadog monitors for each DB
  include_tasks: library/oracle-datadog-monitors.yml
  loop: "{{ results }}"
  loop_control:
    label: Creating monitor for {{ item.ansible_facts.oracle_db_name }} using contact {{ item.ansible_facts.slack_channel }}
  vars:
    application_name: "{{ item.ansible_facts.application_name }}"
    contact_for_db_status: "{{ item.ansible_facts.slack_channel }}"
    dashboard_link: DB ({{ item.ansible_facts.oracle_db_name }}) [dashboard](https://app.datadoghq.com/dashboard/zzz/oracle-db?tpl_var_db_name%5B0%5D={{ item.ansible_facts.oracle_db_name }})
    documentation_link: https://zzz.atlassian.net}}"
    slack_note: ZZZ in [#zzz](https://zzz.slack.com/zzz)
    oracle_db_name: "{{ item.ansible_facts.oracle_db_name }}"
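For error handling rather than (or in addition to) retries, Ansible's `block`/`rescue` keywords can stop one DB's Datadog failure from aborting the whole loop. The sketch below is one possible layout, not a tested drop-in: it assumes the monitor task(s) in library/oracle-datadog-monitors.yml are wrapped in a block, repeats only a subset of the original module arguments, and uses `failed_monitor_dbs` as a hypothetical fact name. The final check would go in the calling file, after the include_tasks loop.

```yaml
# Sketch for library/oracle-datadog-monitors.yml: wrap the existing monitor
# task(s) in a block so a Datadog error for one DB is recorded, not fatal.
- name: "Manage Datadog monitors for {{ oracle_db_name }}"
  block:
    - name: "{{ oracle_db_name }} Session Limit Usage"
      community.general.datadog_monitor:
        api_key: "{{ lookup('env','DD_API_KEY') }}"
        app_key: "{{ lookup('env','DD_APP_KEY') }}"
        name: "{{ oracle_db_name }} - Session Limit Usage {{ monitor_name_suffix }}"
        query: "avg(last_5m):avg:oracle.session_limit_usage{db_name:{{ oracle_db_name }}} > 90"
        type: metric alert
        state: present
        # other arguments as in the original task
  rescue:
    # ansible_failed_result holds the result of the task that failed in the block
    - name: Show the failed result for this DB
      ansible.builtin.debug:
        var: ansible_failed_result
    # failed_monitor_dbs is a hypothetical fact name used only for this sketch
    - name: Record the DB whose monitor task failed
      ansible.builtin.set_fact:
        failed_monitor_dbs: "{{ (failed_monitor_dbs | default([])) + [oracle_db_name] }}"

# Sketch for the calling file, placed after the include_tasks loop:
# fail once at the end if any DB's monitors could not be managed.
- name: Report DBs whose Datadog monitors failed
  ansible.builtin.fail:
    msg: "Datadog monitor tasks failed for: {{ failed_monitor_dbs | join(', ') }}"
  when: (failed_monitor_dbs | default([])) | length > 0
```

With this layout, a failing DB is counted under `rescued` in the PLAY RECAP and the loop moves on to the next DB; the final `fail` task still makes the run exit non-zero if anything went wrong.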
Which finally calls:
- name: "{{ oracle_db_name }} Session Limit Usage"
  community.general.datadog_monitor:
    api_key: "{{ lookup('env','DD_API_KEY') }}"
    app_key: "{{ lookup('env','DD_APP_KEY') }}"
    locked: true
    name: "{{ oracle_db_name }} - Session Limit Usage {{ monitor_name_suffix }}"
    no_data_timeframe: "{{ no_data_timeframe }}"
    notification_message: "{{ notification_message }}"
    notify_audit: false
    notify_no_data: false
    query: avg(last_5m):avg:oracle.session_limit_usage{db_name:{{ oracle_db_name }}} > {{ threshold_crit }}
    renotify_interval: "{{ renotify_interval }}"
    require_full_window: false
    state: present
    thresholds: { critical: "{{ threshold_crit }}", warning: "{{ threshold_warn }}" }
    type: metric alert
  vars:
    notification_message: |
      [[#is_warning]]
      [[^is_renotify]]
      [[value]]% of {{ oracle_db_name }}'s session limit is being consumed, exceeding our _warn_ threshold of {{ threshold_warn }}%. {{ contact_for_db_status }}
      [[/is_renotify]]
      [[#is_renotify]]
      [[value]]% of {{ oracle_db_name }}'s session limit is being consumed, **continuing** to exceed our _warn_ threshold of {{ threshold_warn }}%. This monitor was first triggered at [[first_triggered_at]] UTC. {{ contact_for_db_status }}
      [[/is_renotify]]
      [[/is_warning]]
      [[#is_alert]]
      [[^is_renotify]]
      [[value]]% of {{ oracle_db_name }}'s session limit is being consumed, exceeding our threshold of {{ threshold_crit }}%. {{ contact_for_db_status }}
      [[/is_renotify]]
      [[#is_renotify]]
      [[value]]% of {{ oracle_db_name }}'s session limit is being consumed, **continuing** to exceed our threshold of {{ threshold_crit }}%. This monitor was first triggered at [[first_triggered_at]] UTC. {{ contact_for_db_status }}
      [[/is_renotify]]
      [[/is_alert]]
      [[#is_recovery]]
      Only [[value]]% of {{ oracle_db_name }}'s session limit is being consumed. {{ contact_for_db_status }}
      [[/is_recovery]]
      ---
      * Session Limit Usage [documentation]({{ documentation_link }}#Session-Limit-Usage)
      * {{ dashboard_link }}
      * Monitor [events from the past year](https://app.datadoghq.com/event/explorer?query=zzz)
      * {{ slack_note }}
    threshold_crit: 90
    threshold_warn: 75
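Ansible's task-level `until`, `retries` and `delay` keywords work with any module, including community.general.datadog_monitor, so retries can be added to the task above without module support. A minimal sketch, assuming the arguments not repeated here are kept exactly as in the original task; `monitor_result` is a hypothetical variable name and the retry/delay numbers are example values, not recommendations.

```yaml
# Sketch: the original task with task-level retry keywords added.
# Only some of the original arguments are repeated here; keep the others
# (locked, notification_message, renotify_interval, etc.) as in the original.
- name: "{{ oracle_db_name }} Session Limit Usage"
  community.general.datadog_monitor:
    api_key: "{{ lookup('env','DD_API_KEY') }}"
    app_key: "{{ lookup('env','DD_APP_KEY') }}"
    name: "{{ oracle_db_name }} - Session Limit Usage {{ monitor_name_suffix }}"
    query: "avg(last_5m):avg:oracle.session_limit_usage{db_name:{{ oracle_db_name }}} > {{ threshold_crit }}"
    thresholds: { critical: "{{ threshold_crit }}", warning: "{{ threshold_warn }}" }
    type: metric alert
    state: present
  register: monitor_result             # hypothetical name for the registered result
  until: monitor_result is not failed  # stop as soon as an attempt succeeds
  retries: 5                           # example value: retry up to 5 times
  delay: 10                            # example value: wait 10 seconds between attempts
  vars:
    threshold_crit: 90
    threshold_warn: 75
```

If every attempt fails, the task fails as before, so a persistent Datadog outage still surfaces instead of being silently skipped.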
The error I'm getting is:
TASK [db123 Session Limit Usage] **********************************************
task path: /builds/datadog-oracle-db-integration/ansible/library/oracle-datadog-monitors.yml:122
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490 `" && echo ansible-tmp-1691685434.7165823-655-72718180609490="` echo /root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490 `" ) && sleep 0'
Using module file /root/.ansible/collections/ansible_collections/community/general/plugins/modules/datadog_monitor.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-32j1rcafel/tmps6qa8vca TO /root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/AnsiballZ_datadog_monitor.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/ /root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/AnsiballZ_datadog_monitor.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/local/bin/python /root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/AnsiballZ_datadog_monitor.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/AnsiballZ_datadog_monitor.py", line 107, in <module>
    _ansiballz_main()
  File "/root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/AnsiballZ_datadog_monitor.py", line 99, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File "/root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/AnsiballZ_datadog_monitor.py", line 47, in invoke_module
    runpy.run_module(mod_name='ansible_collections.community.general.plugins.modules.datadog_monitor', init_globals=dict(_module_fqn='ansible_collections.community.general.plugins.modules.datadog_monitor', _modlib_path=modlib_path),
  File "/usr/local/lib/python3.10/runpy.py", line 224, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/local/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/tmp/ansible_community.general.datadog_monitor_payload_3u6ihbtw/ansible_community.general.datadog_monitor_payload.zip/ansible_collections/community/general/plugins/modules/datadog_monitor.py", line 464, in <module>
  File "/tmp/ansible_community.general.datadog_monitor_payload_3u6ihbtw/ansible_community.general.datadog_monitor_payload.zip/ansible_collections/community/general/plugins/modules/datadog_monitor.py", line 314, in main
  File "/tmp/ansible_community.general.datadog_monitor_payload_3u6ihbtw/ansible_community.general.datadog_monitor_payload.zip/ansible_collections/community/general/plugins/modules/datadog_monitor.py", line 414, in install_monitor
  File "/tmp/ansible_community.general.datadog_monitor_payload_3u6ihbtw/ansible_community.general.datadog_monitor_payload.zip/ansible_collections/community/general/plugins/modules/datadog_monitor.py", line 336, in _get_monitor
  File "/usr/local/lib/python3.10/site-packages/datadog/api/monitors.py", line 72, in get_all
    return super(Monitor, cls).get_all(**params)
  File "/usr/local/lib/python3.10/site-packages/datadog/api/resources.py", line 220, in get_all
    return APIClient.submit("GET", cls._resource_name, api_version, **params)
  File "/usr/local/lib/python3.10/site-packages/datadog/api/api_client.py", line 166, in submit
    result = cls._get_http_client().request(
  File "/usr/local/lib/python3.10/site-packages/datadog/api/http_client.py", line 113, in request
    raise _remove_context(HTTPError(e.response.status_code, result.reason))
datadog.api.exceptions.HTTPError: Datadog returned a bad HTTP response code: 500 - Internal Server Error. Please try again later. If the problem persists, please contact [email protected]
fatal: [localhost]: FAILED! => {
"changed": false,
"module_stderr": "Traceback (most recent call last):\n File
\"/root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/AnsiballZ_datadog_monitor.py\",
line 107, in <module>\n _ansiballz_main()\n File
\"/root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/AnsiballZ_datadog_monitor.py\",
line 99, in _ansiballz_main\n invoke_module(zipped_mod, temp_path,
ANSIBALLZ_PARAMS)\n File
\"/root/.ansible/tmp/ansible-tmp-1691685434.7165823-655-72718180609490/AnsiballZ_datadog_monitor.py\",
line 47, in invoke_module\n
runpy.run_module(mod_name='ansible_collections.community.general.plugins.modules.datadog_monitor',
init_globals=dict(_module_fqn='ansible_collections.community.general.plugins.modules.datadog_monitor',
_modlib_path=modlib_path),\n File \"/usr/local/lib/python3.10/runpy.py\",
line 224, in run_module\n return _run_module_code(code, init_globals,
run_name, mod_spec)\n File \"/usr/local/lib/python3.10/runpy.py\", line
96, in _run_module_code\n _run_code(code, mod_globals, init_globals,\n
File \"/usr/local/lib/python3.10/runpy.py\", line 86, in _run_code\n
exec(code, run_globals)\n File
\"/tmp/ansible_community.general.datadog_monitor_payload_3u6ihbtw/ansible_community.general.datadog_monitor_payload.zip/ansible_collections/community/general/plugins/modules/datadog_monitor.py\",
line 464, in <module>\n File
\"/tmp/ansible_community.general.datadog_monitor_payload_3u6ihbtw/ansible_community.general.datadog_monitor_payload.zip/ansible_collections/community/general/plugins/modules/datadog_monitor.py\",
line 314, in main\n File
\"/tmp/ansible_community.general.datadog_monitor_payload_3u6ihbtw/ansible_community.general.datadog_monitor_payload.zip/ansible_collections/community/general/plugins/modules/datadog_monitor.py\",
line 414, in install_monitor\n File
\"/tmp/ansible_community.general.datadog_monitor_payload_3u6ihbtw/ansible_community.general.datadog_monitor_payload.zip/ansible_collections/community/general/plugins/modules/datadog_monitor.py\",
line 336, in _get_monitor\n File
\"/usr/local/lib/python3.10/site-packages/datadog/api/monitors.py\", line
72, in get_all\n return super(Monitor, cls).get_all(**params)\n File
\"/usr/local/lib/python3.10/site-packages/datadog/api/resources.py\", line
220, in get_all\n return APIClient.submit(\"GET\", cls._resource_name,
api_version, **params)\n File
\"/usr/local/lib/python3.10/site-packages/datadog/api/api_client.py\", line
166, in submit\n result = cls._get_http_client().request(\n File
\"/usr/local/lib/python3.10/site-packages/datadog/api/http_client.py\",
line 113, in request\n raise
_remove_context(HTTPError(e.response.status_code,
result.reason))\ndatadog.api.exceptions.HTTPError: Datadog returned a bad
HTTP response code: 500 - Internal Server Error. Please try again later. If
the problem persists, please contact [email protected]\n",
"module_stdout": "",
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
"rc": 1
}
PLAY RECAP *********************************************************************
localhost : ok=34 changed=0 unreachable=0 failed=1 skipped=2 rescued=0 ignored=0
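The traceback shows the 500 was raised while the module was still looking up the existing monitor (`install_monitor` calling `_get_monitor`, which issues a GET via `Monitor.get_all`), and the failed result carries that traceback in `module_stderr`. If retries should apply only to this transient error while other failures (bad keys, an invalid query) fail immediately, a sketch like the following is one option. It assumes the remaining arguments stay as in the original task; `dd_monitor` is a hypothetical variable name and the numbers are example values. The `until` condition ends the loop either on success or on a failure that does not mention the 500; in the second case the task still fails, it is just not retried.

```yaml
# Sketch: retry only on the Datadog 500 seen in the log above.
- name: "{{ oracle_db_name }} Session Limit Usage"
  community.general.datadog_monitor:
    api_key: "{{ lookup('env','DD_API_KEY') }}"
    app_key: "{{ lookup('env','DD_APP_KEY') }}"
    name: "{{ oracle_db_name }} - Session Limit Usage {{ monitor_name_suffix }}"
    query: "avg(last_5m):avg:oracle.session_limit_usage{db_name:{{ oracle_db_name }}} > 90"
    type: metric alert
    state: present
    # other arguments as in the original task
  register: dd_monitor  # hypothetical name
  # Stop looping on success, or on any failure that is NOT the 500 error;
  # a non-500 failure still fails the task, it just is not retried.
  until: >-
    dd_monitor is not failed
    or '500 - Internal Server Error' not in (dd_monitor.module_stderr | default(''))
  retries: 4   # example value
  delay: 15    # example value, in seconds
```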
$ ansible --version
ansible [core 2.15.2]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.10/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.10 (main, Mar 23 2023, 03:59:34) [GCC 10.2.1 20210110] (/usr/local/bin/python)
  jinja version = 3.1.2
  libyaml = True
$ ansible-galaxy collection list community.general
# /root/.ansible/collections/ansible_collections
Collection Version
----------------- -------
community.general 7.2.1
--
You received this message because you are subscribed to the Google Groups
"Ansible Project" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/ansible-project/a4274505-8cb0-4348-9cec-e96b53fb5eafn%40googlegroups.com.