mosche commented on issue #23932: URL: https://github.com/apache/beam/issues/23932#issuecomment-1308560292
> The following keeps executing and does not stop ...

@nitinlkoin1984 That sounds like Kubernetes restarting the job over and over again. Could you check that, please?

> 2022/11/01 22:13:16 Failed to obtain provisioning information: failed to dial server at localhost:43225

The error you are seeing in the SDK container logs gives a hint at the core of the problem. The SDK harness expects to talk to the Spark worker on `localhost`. This is **by design**: if communication involved expensive network traffic, performance would be unacceptable.

This is also an issue when testing the SDK harness with Spark on Docker on a Mac (or Windows). Docker doesn't support host networking on these platforms, so communication between containers using `localhost` isn't possible. There are two environment options that allow it for testing purposes, see https://github.com/apache/beam/issues/23440#issuecomment-1271567347. However, that obviously limits you to a single worker.

I recommend building a custom Spark image that also runs the SDK harness inside the same image. Otherwise you would need a custom operator to run worker pods that contain both a Spark worker and the SDK harness, so that the two can communicate via `localhost`.
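In case it helps, here is a minimal sketch of how the pipeline side could be wired up once the SDK harness runs as a worker pool next to each Spark worker (same image or same pod) and is reachable on `localhost`. The job endpoint address `spark-jobserver:8099` and the worker-pool port `50000` are assumptions for illustration; adjust them to your deployment.

```python
# Minimal sketch, assuming:
#  - a Spark job server reachable at spark-jobserver:8099 (assumed address),
#  - a Beam SDK harness worker pool co-located with each Spark worker,
#    listening on localhost:50000 (assumed port).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=spark-jobserver:8099",    # assumed job server address
    "--environment_type=EXTERNAL",            # use an externally managed SDK harness
    "--environment_config=localhost:50000",   # worker pool next to the Spark worker
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["hello", "beam"])
     | beam.Map(print))
```

With `EXTERNAL`, the Spark worker connects to an already-running SDK harness worker pool at the configured address instead of starting a Docker container itself, which is why that worker pool has to be reachable on `localhost` from the Spark worker.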

