Vamsi-klu commented on code in PR #62656:
URL: https://github.com/apache/airflow/pull/62656#discussion_r2935827825
##########
shared/logging/src/airflow_shared/logging/structlog.py:
##########
@@ -224,6 +224,20 @@ def respect_stdlib_disable(logger: Any, method_name: Any,
event_dict: EventDict)
return event_dict
+def _make_safe_enc_hook(default):
+ """Wrap an enc_hook so that serialization failures fall back to
``str()``."""
+
+ def safe_enc_hook(obj):
+ if default is not None:
+ try:
+ return default(obj)
+ except TypeError:
Review Comment:
Hey @jscheffl, thank you for pointing this out. I took a closer look and
you're right, the surrogate issue needs handling at two levels.
The enc_hook now catches both TypeError and ValueError, so encoding errors
during object conversion are covered. But the trickier part was that surrogate
characters in string values bypass the enc_hook entirely since msgspec handles
strings natively and only fails when it tries to encode them to UTF-8. So I
also added a fallback around the msgspec.json.encode() call itself that
replaces surrogates with the Unicode replacement character and retries.
I've added two tests for this. One covers the ValueError path in the
enc_hook directly, and the other reproduces the surrogate scenario you
described from the DockerOperator, making sure it doesn't crash and the rest of
the log entry comes through cleanly.
All 53 structlog tests pass locally and ruff is clean. Let me know if this
looks good to you.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]