mosche commented on issue #23932:
URL: https://github.com/apache/beam/issues/23932#issuecomment-1308560292

   > The following keeps executing and does not stop ...
   
   @nitinlkoin1984 That sounds like Kubernetes restarting the job over and over 
again. Could you please check that?
   
   > 2022/11/01 22:13:16 Failed to obtain provisioning information: failed to 
dial server at localhost:43225
   
   The error you are seeing in the SDK container logs hints at the core of the 
problem. The SDK harness expects to talk to the Spark worker on `localhost`. 
This is **by design**: if that communication involved expensive network 
traffic, performance would be unacceptable.
   
   This is also an issue when testing the SDK harness with Spark on Docker on a 
Mac (or Windows): Docker doesn't support host networking on these platforms, so 
communication between containers via `localhost` isn't possible.
   There are two environment options to work around this for testing purposes, 
see https://github.com/apache/beam/issues/23440#issuecomment-1271567347. 
However, that obviously limits you to a single worker.
   
   I recommend building a custom Spark image that also runs the SDK harness, so 
both processes live in the same container. Otherwise you would need a custom 
operator that runs worker pods containing both a Spark worker and the SDK 
harness, so that the two can communicate via `localhost`.

