yaooqinn commented on pull request #841: URL: https://github.com/apache/incubator-kyuubi/pull/841#issuecomment-884239174
Thanks @holdenk

> So I don't know anything about this project but certainly does seem like it could be interesting.

Don't worry about the project itself. The project relies on Spark, which can scale better and use computing resources more effectively on Kubernetes. That is the motivation for this PR.

> This provides an alternative shuffle service for Spark to use on Kube? Is there a design doc?

Yes, this provides an external shuffle service for Spark to use on Kube. The current implementation is codeless, as you can see in this PR. We only need to write a Dockerfile (3 LOC only) for the shuffle service (`org.apache.spark.deploy.ExternalShuffleService`) endpoint based on the official Spark image, and then deploy it on k8s as a DaemonSet. This works with official releases (verified with 3.1.2) without any modification. The only slightly tricky part is that users need to enable `hostNetwork` so that the client can successfully establish a connection to the `LOCAL` shuffle server during executor initialization.

> I notice hostNetwork has to be set to true which is something I'm generally speaking not comfortable with for security reasons, is that a temporary design decision or something more permanent?

I agree that using `hostNetwork` is not secure and is not a comprehensive solution for this feature. I will try to figure out a solution that uses only the Kubernetes pod network, maybe in the next few weeks, with a SPIP or a POC PR in the Spark community.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
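
For readers who want to picture the deployment described in the comment, here is a minimal sketch (not the manifest from the PR) that builds a DaemonSet definition running `org.apache.spark.deploy.ExternalShuffleService` with `hostNetwork` enabled. The image name, resource name, labels, and the `/opt/spark` install path are assumptions made for illustration; port 7337 is Spark's default for `spark.shuffle.service.port`.

```python
import json

# Hypothetical values: the image tag and resource name are illustrative only.
SHUFFLE_IMAGE = "example.org/spark-shuffle:3.1.2"
SHUFFLE_PORT = 7337  # Spark's default spark.shuffle.service.port


def shuffle_daemonset(image=SHUFFLE_IMAGE, port=SHUFFLE_PORT,
                      name="spark-shuffle-service"):
    """Return a DaemonSet manifest (as a dict) for an external shuffle service."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        # Assumes Spark is installed under /opt/spark inside the image.
        "command": [
            "/opt/spark/bin/spark-class",
            "org.apache.spark.deploy.ExternalShuffleService",
        ],
        # With hostNetwork, hostPort must match containerPort.
        "ports": [{"containerPort": port, "hostPort": port}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                # hostNetwork lets executors reach the shuffle server on
                # the node they run on, as discussed in the comment.
                "spec": {"hostNetwork": True, "containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(shuffle_daemonset(), indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`. Executors would presumably also need `spark.shuffle.service.enabled=true` so that they register with the node-local service.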
