aminghadersohi commented on code in PR #41177:
URL: https://github.com/apache/superset/pull/41177#discussion_r3477885585
##########
superset/reports/notifications/webhook.py:
##########
@@ -128,7 +128,21 @@ def _validate_webhook_url(self, url: str) -> None:
raise NotificationParamException("Webhook URL target host is not
allowed.")
@backoff.on_exception(
- backoff.expo, NotificationUnprocessableException, factor=10, base=2,
max_tries=5
+ backoff.expo,
+ NotificationUnprocessableException,
+ factor=10,
+ base=2,
+ max_tries=5,
+ # Bound total wall-clock retry time. Without this, a hanging or
+ # persistently failing target can stall a worker for minutes per bad
URL
+ # (up to ~5 socket waits at timeout=60 plus ~150s of retry sleeps),
+ # starving sequential report dispatch. backoff evaluates max_time
against
+ # the elapsed time sampled BEFORE each attempt, so the last request can
+ # start just under the bound and still run its full timeout: worst case
+ # ~210s (~150s elapsed at the final pre-attempt check + one 60s
request),
Review Comment:
NIT: the comment says backoff samples elapsed "BEFORE each attempt" and
gives "~210s (≈150s elapsed at the final pre-attempt check + one 60s request)".
`backoff._sync` actually samples elapsed *after* each attempt fails and before
sleeping — not before the next attempt starts. The arithmetic is also
internally inconsistent: 150s elapsed > `max_time=120` would already trigger
giveup without starting any further request. The actual ceiling
(just-under-120s accumulated + one final 60s request) is closer to ~180s, which
is consistent with the PR body's own wording. The production behaviour is
correct; the comment's framing of *when* the check fires and the resulting
estimate are misleading.
##########
tests/unit_tests/reports/notifications/webhook_tests.py:
##########
@@ -269,3 +271,156 @@ class MockResponse:
with pytest.raises(NotificationParamException, match="redirect"):
webhook_notification.send()
+
+
+def _make_webhook(mock_header_data) -> WebhookNotification:
+ from superset.reports.models import ReportRecipients, ReportRecipientType
+ from superset.reports.notifications.base import NotificationContent
+
+ content = NotificationContent(
+ name="test alert", header_data=mock_header_data, description="Test
description"
+ )
+ return WebhookNotification(
+ recipient=ReportRecipients(
+ type=ReportRecipientType.WEBHOOK,
+ recipient_config_json='{"target": "https://example.com/webhook"}',
+ ),
+ content=content,
+ )
+
+
+class _MockServerErrorResponse:
+ status_code = 500
+ text = ""
+
+
+def _allow_internal_app() -> type:
+ class MockCurrentApp:
+ config = {
+ "ALERT_REPORTS_WEBHOOK_HTTPS_ONLY": True,
+ "ALERT_REPORTS_WEBHOOK_ALLOW_INTERNAL_HOSTS": True,
+ }
+
+ return MockCurrentApp
+
+
+def test_send_backoff_bounded_by_max_time(monkeypatch, mock_header_data) ->
None:
+ """
+ A persistently failing (500) target gives up on wall-time (``max_time``),
+ not just ``max_tries``. ``freeze_time(auto_tick_seconds=30)`` advances the
+ clock on every time access (backoff measures elapsed via ``datetime.now``,
+ which freezegun freezes and ticks), so cumulative elapsed crosses
``max_time=120``
+ before ``max_tries=5`` is exhausted. We assert the discriminating
*property*
+ — gave up on wall-time strictly before exhausting ``max_tries`` — rather
than
+ a pinned count, because freezegun's per-call tick count is an opaque
+ implementation detail. If ``max_time`` is removed this rises to the full 5
+ (RED); the flat-clock companion test below anchors that 5 is the ceiling.
+ The terminal exception type is unchanged on giveup.
+ """
+ webhook_notification = _make_webhook(mock_header_data)
+ post_calls: list[int] = []
+
+ def fake_post(*args, **kwargs) -> _MockServerErrorResponse:
+ post_calls.append(1)
+ return _MockServerErrorResponse()
+
+ monkeypatch.setattr(
+ "superset.reports.notifications.webhook.current_app",
_allow_internal_app()
+ )
+ monkeypatch.setattr(
+
"superset.reports.notifications.webhook.feature_flag_manager.is_feature_enabled",
+ lambda flag: True,
+ )
+ monkeypatch.setattr(
+ "superset.reports.notifications.webhook.requests.post", fake_post
+ )
+ monkeypatch.setattr("backoff._sync.time.sleep", lambda *a, **k: None)
Review Comment:
MEDIUM: all three new tests patch `backoff._sync.time.sleep` (this line,
372, 416) — a private module attribute of `backoff`, the same coupling class as
the `_FakeBackoffDatetime` pattern an earlier reviewer flagged (now addressed
for clock-faking via freezegun). Unlike the old datetime approach,
`monkeypatch.setattr` here would fail loudly with `AttributeError` if backoff
reorganises, so it is not a silent-pass risk — but it still binds the tests to
backoff's internal module layout.
Suggest patching at the stdlib level instead, consistent with the freezegun
approach now used for clock-faking:
```suggestion
monkeypatch.setattr(time.sleep, lambda *a, **kw: None)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]