blackberrier commented on issue #5648: URL: https://github.com/apache/dolphinscheduler/issues/5648#issuecomment-868423184
> When using Spark on k8s (Spark 2.4.8), here is a list of what we need to do:
>
> 1. change DolphinScheduler's storage from HDFS to MinIO;
> 2. change killing the YARN job by applicationId (in both worker and master) to stopping the spark-driver pod;
> 3. without YARN log aggregation, we have to aggregate Spark executor logs manually with ELK or some other tool;
> 4. change the Spark job monitoring from the YARN REST API to the k8s API;
> 5. ...
>
> @blackberrier anything else? Please add anything I've missed.
>
> I'm doing a POC to migrate from YARN to k8s.

@geosmart You have considered this issue comprehensively. I think we could add one more item:

1. Build Spark Docker images, and when submitting applications let the user choose or fill in the image name and version parameters (and perhaps other parameters as well).

P.S. Maybe we should think more about your 3rd and 4th points. On the YARN side, we get the YARN application id from the logs by filtering for the pattern "application_xxxxxxxxxx_yyyy", and then query the application state through the YARN REST API. On Kubernetes we don't have an application-id pattern like YARN's, so maybe we should add some pattern (labels) to the pods, or something similar.

P.S. As we discussed via email, maybe this issue is not that urgent? @CalvinKirs
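For reference, the YARN-side filtering described above can be sketched with a regex. This is an illustrative sketch, not DolphinScheduler's actual code; the class name and sample log line are made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YarnAppIdExtractor {
    // Matches YARN application ids of the form "application_<clusterTs>_<seq>",
    // e.g. "application_1624433000000_0042", anywhere in a log line.
    private static final Pattern APP_ID_PATTERN =
            Pattern.compile("application_\\d+_\\d+");

    static String extractAppId(String logLine) {
        Matcher m = APP_ID_PATTERN.matcher(logLine);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical spark-submit log line for illustration.
        String line = "INFO Client: Submitted application application_1624433000000_0042";
        System.out.println(extractAppId(line));
        // -> application_1624433000000_0042
    }
}
```

The extracted id is what would then be fed to the YARN REST API (or `yarn application -kill`) today.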
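On the Kubernetes side, `spark-submit`'s k8s backend already labels the driver pod (e.g. `spark-role=driver` and `spark-app-selector=<app-id>`, per Spark's "Running on Kubernetes" docs), so a label selector could play the role of the YARN application-id pattern when stopping or looking up a job. A minimal sketch; the class name and sample app id are hypothetical:

```java
class SparkDriverSelector {
    // Builds a Kubernetes label selector targeting the driver pod of one
    // Spark application; these labels are applied by spark-submit's k8s backend.
    static String driverSelector(String sparkAppId) {
        return "spark-role=driver,spark-app-selector=" + sparkAppId;
    }

    public static void main(String[] args) {
        String selector = driverSelector("spark-app-20210625100000-0001");
        // The selector can be passed to the Kubernetes API or kubectl, e.g.:
        //   kubectl delete pod -n <namespace> -l <selector>
        // Deleting the driver pod stops the job; Spark cleans up the executors.
        System.out.println(selector);
    }
}
```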
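For the monitoring change (4th point), one option is to map the driver pod's standard Kubernetes phase (`Pending`, `Running`, `Succeeded`, `Failed`, `Unknown`) onto a job state, instead of polling the YARN REST API. A sketch; the job-state names on the scheduler side are hypothetical:

```java
class PodPhaseMapper {
    // Hypothetical scheduler-side job states, for illustration only.
    enum JobState { RUNNING, SUCCESS, FAILURE, UNKNOWN }

    // Maps the standard Kubernetes pod phase of the Spark driver pod
    // to a job state the scheduler could report.
    static JobState fromPodPhase(String phase) {
        switch (phase) {
            case "Pending":
            case "Running":   return JobState.RUNNING;
            case "Succeeded": return JobState.SUCCESS;
            case "Failed":    return JobState.FAILURE;
            default:          return JobState.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromPodPhase("Succeeded"));
        // -> SUCCESS
    }
}
```

The phase itself would come from the k8s API (e.g. reading `status.phase` of the pod found via the driver label selector).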
