damccorm commented on PR #34502:
URL: https://github.com/apache/beam/pull/34502#issuecomment-2770674371

   > After investigating the issue, it turns out that sometimes when a failure
   > occurs due to OOM, the worker shuts down immediately and doesn't reach
   > [the part of the code in boot.go](https://github.com/apache/beam/blob/ad7729e05041fc333ae447b5500149dffcf8336d/sdks/go/container/boot.go#L200)
   > responsible for generating the dump file. Attempts to add timeouts before and
   > after reading the file, preallocate additional memory in boot.go, and use
   > parameters like dumpHeapOnOom and saveHeapDumpsToGcsPath didn’t help.
   > Temporarily disabled this test so that The PostCommit Go Dataflow ARM and The
   > PostCommit Go workflows pass successfully. Created a
   > [separate issue](https://github.com/apache/beam/issues/34498) for further investigation.
   
   Thanks for looking into this - how often does this test fail? We can
merge this, but I'm curious about the impact, i.e. how often the test does or doesn't pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
