hzyangkai commented on issue #12968:
URL: 
https://github.com/apache/dolphinscheduler/issues/12968#issuecomment-1326996778

   Hi @Radeity, thanks for your review.
   
   The separation of submitApplication and monitorApplication relies on the 
submit script of the computing engine. Engines such as flink or spark print the 
appid to the log right after submitting a task to yarn, and they also provide 
parameters that control whether the submitting process exits immediately or 
blocks until the task finishes. For example, in spark we can use "spark-submit 
--master yarn --deploy-mode cluster --conf 
spark.yarn.submit.waitAppCompletion=false". After the task is submitted to 
yarn, the submitting process prints the appid to the log and exits without 
waiting for the task to end. ds can then start a monitorApplication thread (or 
reuse the task-execute-thread) that polls the status of the app in yarn based 
on the appid.
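   
   To make the split concrete, here is a rough sketch of the two halves, not 
the actual ds implementation. The names extract_app_id, monitor_application 
and fetch_state are hypothetical. fetch_state stands in for whatever ds would 
use to look up the app state (for example the yarn ResourceManager REST 
endpoint /ws/v1/cluster/apps/{appid}), and the regex assumes the usual 
application_<clusterTimestamp>_<sequence> appid format.

```python
import re
import time
from typing import Callable, Iterable, Optional

# Assumption: yarn appids look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")

# yarn application states after which polling can stop.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def extract_app_id(log_lines: Iterable[str]) -> Optional[str]:
    """Return the first appid found in the submit log, or None."""
    for line in log_lines:
        match = APP_ID_PATTERN.search(line)
        if match:
            return match.group(0)
    return None


def monitor_application(
    app_id: str,
    fetch_state: Callable[[str], str],  # hypothetical lookup, e.g. a REST call
    interval_seconds: float = 5.0,
    max_polls: Optional[int] = None,
) -> str:
    """Poll fetch_state until the app reaches a terminal state, then return it."""
    polls = 0
    while True:
        state = fetch_state(app_id)
        if state in TERMINAL_STATES:
            return state
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"{app_id} still {state} after {polls} polls")
        time.sleep(interval_seconds)
```

   Because monitor_application only needs the appid, it can run in a thread 
that is unrelated to the submitting process, which has already exited by then.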
   
   For tasks submitted in spark client mode, such as "spark-submit --master 
yarn --deploy-mode client" or the "spark-sql --master yarn" script, we cannot 
separate the submission process from the monitoring process, because the end 
of the client process means the end of the entire application. A wrapper is 
required to submit the sql task in cluster mode.
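   
   As an illustration of the distinction, a check like the following could 
decide whether a command is a candidate for separate monitoring. This is only 
a sketch: runs_driver_on_cluster is a hypothetical helper, and it only handles 
the space-separated "--master yarn --deploy-mode cluster" form used in the 
examples above.

```python
import shlex


def runs_driver_on_cluster(command: str) -> bool:
    """True if the command is a spark-submit to yarn in cluster deploy mode."""
    args = shlex.split(command)
    if not args or not args[0].endswith("spark-submit"):
        # Scripts such as spark-sql keep the driver in the client process.
        return False

    master = None
    deploy_mode = "client"  # spark-submit defaults to client mode
    for i, arg in enumerate(args[:-1]):
        if arg == "--master":
            master = args[i + 1]
        elif arg == "--deploy-mode":
            deploy_mode = args[i + 1]

    return master == "yarn" and deploy_mode == "cluster"
```

   Only commands for which this returns True can let the submitting process 
exit while the application keeps running in yarn.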
   
   For tasks submitted using beeline, taking spark as an example, the task is 
usually submitted to a thrift server, which then runs the job in a shared yarn 
application, so the appid is usually shared by many jobs. Here we need to 
focus on how to submit the job in detached mode and get the jobid. The thrift 
server should probably report the jobid to a client like beeline, and the 
client can then query the job status by jobid. As far as I know, beeline 
should be able to submit sql in detached mode but may not print the jobid, so 
this needs more thought in the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
