lostluck opened a new issue, #21797: URL: https://github.com/apache/beam/issues/21797
### What would you like to happen?

Ideally the worker binary is resilient to crashes, with bundles being independent. Some crashes are unavoidable though, like OOMs and similar. In such events, it's useful to have a heap dump or core dump to examine to see what could be fixed or improved.

The Go SDK container boot launcher should set `GOTRACEBACK=crash`, which will cause Unix-based systems to write out a core dump on unexpected exits (such as OOMs and similar). Further, as with Java, a restarting process should try to find any dump written and move it to a target location (like the job's designated temp folder).

See https://pkg.go.dev/runtime and https://pkg.go.dev/runtime/debug#SetTraceback for some details.

Complications:

1. Collecting the core dump.

   Based on the test for such crash dumps, it looks like the dump may by default get written to stderr: https://cs.opensource.google/go/go/+/refs/tags/go1.18.3:src/runtime/crash_unix_test.go;l=163

   We can't simply write all of stderr to a file, because for long-running tasks that would become extremely large from other debug printouts. This should be validated, and the default stderr logging changed in this case.

   But this could be a misreading. Traceback information is printed (stack traces etc.), but the heap dump might still go elsewhere. This needs to be determined, and in particular, if Docker containers are used, whether the file itself is accessible to the next container started by the VM.

2. Moving the file.

   Since the pipeline binary would have the facility to write to the job's temp directory, such copying code would need to live in the harness (likely in harness/init.go), so it can write the file if available. Alternatively, there's possibly some configuration that could be done for the runner to manage this instead, but this isn't clear offhand.
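As a rough sketch of the two pieces described above (setting `GOTRACEBACK=crash` for the worker, and relocating any dump on restart), something like the following might work. This is not existing Beam code: `withCrashTraceback`, `copyFile`, and `moveDumps` are hypothetical names, and the sketch assumes the dump ends up as a regular local file matching a glob such as `core*`, that core dumps are enabled in the container, and that the destination is a local directory. A real job temp location may be remote storage, which would need the SDK's filesystem support instead of plain file copies.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// withCrashTraceback returns a copy of env in which GOTRACEBACK is set to
// "crash", dropping any earlier GOTRACEBACK entry. (Hypothetical helper.)
func withCrashTraceback(env []string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, "GOTRACEBACK=") {
			out = append(out, kv)
		}
	}
	return append(out, "GOTRACEBACK=crash")
}

// copyFile copies src to dst, creating or truncating dst.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// moveDumps copies every regular file in srcDir whose name matches pattern
// into dstDir, removes the original, and returns the destination paths.
// Copy-then-remove is used because os.Rename can fail across filesystems.
func moveDumps(srcDir, pattern, dstDir string) ([]string, error) {
	matches, err := filepath.Glob(filepath.Join(srcDir, pattern))
	if err != nil {
		return nil, err
	}
	var moved []string
	for _, src := range matches {
		info, err := os.Stat(src)
		if err != nil || !info.Mode().IsRegular() {
			continue // skip directories and files that vanished
		}
		dst := filepath.Join(dstDir, filepath.Base(src))
		if err := copyFile(src, dst); err != nil {
			return moved, err
		}
		if err := os.Remove(src); err != nil {
			return moved, err
		}
		moved = append(moved, dst)
	}
	return moved, nil
}

func main() {
	// The launcher would hand this environment to the worker process,
	// for example through exec.Cmd.Env.
	env := withCrashTraceback(os.Environ())
	fmt.Println(env[len(env)-1]) // GOTRACEBACK=crash

	// Demonstrate the move with throwaway directories and a fake dump file.
	src, err := os.MkdirTemp("", "dump-src")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(src)
	dst, err := os.MkdirTemp("", "dump-dst")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dst)
	if err := os.WriteFile(filepath.Join(src, "core.1234"), []byte("fake"), 0o600); err != nil {
		panic(err)
	}
	moved, err := moveDumps(src, "core*", dst)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(moved), "dump(s) moved")
}
```

Where the operating system actually writes a core file depends on settings outside the Go process (for example the core size limit and, on Linux, the kernel's core pattern), so the source directory and glob here are placeholders to be settled once the open questions under "Collecting the core dump" are answered.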
Java's approach is contained in [MemoryMonitor](https://github.com/apache/beam/blob/4ffeae4d2b800f2df36d2ea2eab549f2204d5691/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/status/MemoryMonitor.java)

### Issue Priority

Priority: 2

### Issue Component

Component: sdk-go
