lostluck opened a new issue, #21797:
URL: https://github.com/apache/beam/issues/21797

   ### What would you like to happen?
   
   Ideally, the worker binary is resilient to crashes, with bundles being 
independent. Some crashes are unavoidable though, such as OOMs. In 
such events, it's useful to have a heap dump or core dump to examine to see what 
could be fixed or improved.
   
   The Go SDK container boot launcher should set `GOTRACEBACK=crash`, which 
causes Unix-based systems to write out a core dump on unexpected exits (such as 
OOMs and similar). Further, as with Java, a restarting process should try to 
find any dump that was written and move it to a target location (like the job's 
designated temp folder).
   
   See https://pkg.go.dev/runtime and 
https://pkg.go.dev/runtime/debug#SetTraceback for some details.
   
   Complications:
   1. Collecting the core dump.
   Based on the test for such crash dumps, it looks like the dump may by default 
be written to stderr. 
https://cs.opensource.google/go/go/+/refs/tags/go1.18.3:src/runtime/crash_unix_test.go;l=163
   We can't simply write all of stderr to a file, because for long-running tasks 
that file would become extremely large from other debug printouts.
   This should be validated, and the default stderr logging changed in this case.
   
   But this could be a misreading. Traceback information (stack traces etc.) 
is printed, but the heap dump might still go elsewhere. This needs to be 
determined, and in particular, if Docker containers are used, whether the file 
itself is accessible to the next container started by the VM.
   
   2. Moving the file
   Since the pipeline binary would have the facility to write to the job's temp 
directory, such copying code would need to live in the harness (likely in 
harness/init.go), so it can write the file if available.
   
   Alternatively, there's possibly some configuration that could be done for 
the runner to manage this instead, but this isn't clear offhand.
   
   Java's approach is contained in 
[MemoryMonitor](https://github.com/apache/beam/blob/4ffeae4d2b800f2df36d2ea2eab549f2204d5691/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/MemoryMonitor.java)
   
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: sdk-go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
