[ 
https://issues.apache.org/jira/browse/BEAM-3956?focusedWorklogId=87638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87638
 ]

ASF GitHub Bot logged work on BEAM-3956:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Apr/18 17:02
            Start Date: 04/Apr/18 17:02
    Worklog Time Spent: 10m 
      Work Description: shoyer commented on issue #4959: [BEAM-3956] Preserve 
stacktraces for Python exceptions
URL: https://github.com/apache/beam/pull/4959#issuecomment-378672888
 
 
   OK, I now understand what's going on here.
   
   In the gRPC tests, the exception from user code is caught by 
`SdkHarness._execute()`, which logs the original error/traceback and puts it 
into a `beam_fn_api_pb2.InstructionResponse` proto:
   
https://github.com/apache/beam/blob/f82acc437cc06a19241a07a5738bec4449ca01ad/sdks/python/apache_beam/runners/worker/sdk_worker.py#L117-L129
   
   Although the error message makes it into the proto, the traceback does not, 
so there's no way to restore it on the caller's side when the response is 
retrieved via gRPC.
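   
   To make the failure mode concrete, here is a rough, self-contained 
paraphrase of that pattern (not the exact code at the link above; I'm only 
assuming that `InstructionResponse` carries an `instruction_id` and the string 
`error` field discussed below):
   
```python
import logging

from apache_beam.portability.api import beam_fn_api_pb2


def execute(task, request):
  """Paraphrased sketch of SdkHarness._execute(); see the link above."""
  try:
    return task()
  except Exception as exc:  # pylint: disable=broad-except
    # The worker log gets the full traceback...
    logging.exception('Error processing instruction %s', request.instruction_id)
    # ...but the response proto only gets the one-line error message, so the
    # traceback never makes it back to the caller.
    return beam_fn_api_pb2.InstructionResponse(
        instruction_id=request.instruction_id,
        error=str(exc))
```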
   
   I see a range of possible solutions here:
   1. Don't worry about it. The errors are logged with tracebacks on the 
worker, and users can find them there if necessary. I'll make the tests pass by 
checking stderr for the traceback, too.
   2. Append the traceback to the `error` field in the 
`beam_fn_api_pb2.InstructionResponse` proto. This ensures it ends up back with 
the caller (see the first sketch after this list).
   3. Add the traceback as a string to a new field on 
`beam_fn_api_pb2.InstructionResponse`. This could make sense if there's value 
in having a short error message for some use cases. We can still append the 
traceback to the error message when reraising it from the proto in 
`fn_api_runner.BundleManager.bundle_runner()`.
   4. Serialize the traceback and error in a way that can actually be restored 
remotely. Unfortunately, tracebacks can't be pickled natively, so this would 
require something like the [tblib](https://pypi.python.org/pypi/tblib) 
library. We could then restore the original exception itself, rather than 
re-raising it as a RuntimeError (see the second sketch below).
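   
   For (2)/(3), a minimal sketch of the idea, assuming the `error` field stays 
a plain string (the failing function and the runner-side re-raise here are 
illustrative stand-ins, not the actual `fn_api_runner` code):
   
```python
import traceback


def run_user_code():
  raise ValueError('user code failed')  # hypothetical failing user function


# Worker side: fold the formatted traceback into the error string that would
# go into InstructionResponse.error.
try:
  run_user_code()
except Exception as exc:  # pylint: disable=broad-except
  error_string = '%s\n\nWorker traceback:\n%s' % (exc, traceback.format_exc())

# Runner side: re-raise when the response carries an error; the worker
# traceback is now part of the message the user sees.
raise RuntimeError(error_string)
```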
   
   In my assessment, (1) would be slightly reckless -- not every user 
necessarily has access to logs on the workers (I don't know whether they do on 
Cloud Dataflow, for example). Either (2) or (3) seems quite doable. (4) might 
be worth doing eventually, but for now is more trouble than it's worth.
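   
   For completeness, (4) with tblib would look roughly like this (a sketch, 
assuming tblib's `pickling_support` hook; the pickled bytes would still need 
somewhere to live in the proto):
   
```python
import pickle
import sys

import six
from tblib import pickling_support

pickling_support.install()  # patches pickle so traceback objects round-trip


def run_user_code():
  raise ValueError('user code failed')  # hypothetical failing user function


# Worker side: capture and pickle the full exc_info, traceback included.
try:
  run_user_code()
except Exception:  # pylint: disable=broad-except
  pickled = pickle.dumps(sys.exc_info())

# Runner side: unpickle and re-raise the original exception with its original
# traceback, instead of wrapping it in a RuntimeError.
six.reraise(*pickle.loads(pickled))
```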

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 87638)
    Time Spent: 4h  (was: 3h 50m)

> Stacktraces from exceptions in user code should be preserved in the Python SDK
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-3956
>                 URL: https://issues.apache.org/jira/browse/BEAM-3956
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Stephan Hoyer
>            Priority: Major
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently, Beam's Python SDK loses stacktraces for exceptions. It does 
> helpfully add a tag like "[while running StageA]" to exception error 
> messages, but those messages don't include the stacktrace of the Python 
> functions being called.
> Including the full stacktraces would make debugging Beam pipelines much 
> easier when things go wrong.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
