zendesk-kjaanson commented on issue #32743: URL: https://github.com/apache/beam/issues/32743#issuecomment-2582822746
I solved this for myself, though I'm not sure how relevant it is to the issue at hand here. In my case, I was using PortableRunner with Flink via the Apache Flink Operator, and the staging volume was either not accessible to, or not the same for, the job manager and the task managers/workers. For some reason this causes empty files to appear in `/tmp/staged` for `submission_environment_dependencies.txt` (and, if you use `save_main_session`, also for `pickled_main_session`). The worker process then fails when it tries to load them, with the error `failed to retrieve staged files: failed to retrieve /tmp/staged in 3 attempts: failed to retrieve chunk for /tmp/staged/submission_environment_dependencies.txt`.

My issue was solved once I was able to create a **working** shared staging volume across pods.

_Fun side note that might be helpful for someone: when you create a host-mounted PersistentVolume with the ReadWriteMany access mode on Google's GKE and use it as a volume, it never actually tells you that you can't do that; it simply mounts random (different) volumes across all pods. The docs mention that this is not supposed to be supported :D. I went with the FUSE CSI driver, which solved the issue on GKE._
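One way to check for the empty-file symptom described in this comment is to list zero-byte files in the staging directory from inside a pod. This is only a minimal sketch and not part of Beam: `find_empty_staged_files` is a hypothetical helper name, and `/tmp/staged` is simply the path taken from the error message, so adjust it if your staging directory differs.

```python
from pathlib import Path


def find_empty_staged_files(staging_dir):
    """Return the sorted relative paths of zero-byte files under staging_dir.

    Hypothetical diagnostic helper (not a Beam API). It walks the directory
    recursively, so files in subdirectories are reported as well.
    """
    root = Path(staging_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

For example, calling `find_empty_staged_files("/tmp/staged")` in the job manager pod and again in a task manager pod, then comparing the results, may help show whether the pods actually see the same volume contents.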
