damccorm commented on PR #34502: URL: https://github.com/apache/beam/pull/34502#issuecomment-2770674371
> After investigating the issue, it turns out that sometimes, when a failure occurs due to OOM, the worker shuts down immediately and never reaches [the part of the code in boot.go](https://github.com/apache/beam/blob/ad7729e05041fc333ae447b5500149dffcf8336d/sdks/go/container/boot.go#L200) responsible for generating the dump file. Attempts to add timeouts before and after reading the file, to preallocate additional memory in boot.go, and to use parameters like dumpHeapOnOom and saveHeapDumpsToGcsPath didn't help. I temporarily disabled this test so that the PostCommit Go Dataflow ARM and PostCommit Go workflows pass successfully, and created a [separate issue](https://github.com/apache/beam/issues/34498) for further investigation.

Thanks for looking into this. How often does this fail? We can merge this, but I'm curious about the impact, i.e. how often this does and doesn't work.