chamikaramj commented on code in PR #31284:
URL: https://github.com/apache/beam/pull/31284#discussion_r1600786019
##########
sdks/python/apache_beam/transforms/external.py:
##########
@@ -750,7 +751,7 @@ def expand(self, pvalueish):
with ExternalTransform.service(expansion_service) as service:
response = service.Expand(request)
if response.error:
- raise RuntimeError(response.error)
+ raise RuntimeError(_sanitize_java_traceback(response.error))
Review Comment:
For transform service, we combine the errors from all expansion service
container and return them together. Usually one of them is useful while other
is a trivial error (for example, URN not available). Seems like this might
result in useful Python errors being deleted in that case ?
https://github.com/apache/beam/blob/54db453b032bc3e4827fd8d59aa600cba9dd111d/sdks/java/transform-service/src/main/java/org/apache/beam/sdk/transformservice/ExpansionService.java#L159
##########
sdks/python/apache_beam/transforms/external.py:
##########
@@ -1142,6 +1143,32 @@ def is_docker_available():
'Docker executables are available in the system.')
+def _sanitize_java_traceback(s):
+ """Attempts to highlight the root cause in the error string.
+
+ Java tracebacks read bottom to top, while Python tracebacks read top to
+ bottom, resulting in the actual error message getting sandwiched between two
+ walls of text. This may result in the error being duplicated (as we don't
+ want to remove relevant information) but should be clearer in most cases.
+
+ Best-effort but non-destructive.
+ """
+ traceback_lines = [
+ r'\tat \S+\(\S+\.java:\d+\)',
+ r'\t\.\.\. \d+ more',
+ # A bit more restrictive to avoid accidentally capturing non-java lines.
+ r'Caused by: [a-z]+(\.\S+)?\.[A-Z][A-Za-z0-9_$]+(Error|Exception):[^\n]*'
+ ]
+ without_java_traceback = s + '\n'
+ for p in traceback_lines:
+ without_java_traceback = re.sub(
+ fr'\n{p}$', '', without_java_traceback, flags=re.M)
Review Comment:
IIUC, this sets every line that matches the pattern within the stack trace
to an empty string. Is that correct ?
(probably expand documentation to explain the logic here)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]