Sergio Soto created AIRFLOW-5385:
------------------------------------
Summary: SparkSubmit status spend lot of time
Key: AIRFLOW-5385
URL: https://issues.apache.org/jira/browse/AIRFLOW-5385
Project: Apache Airflow
Issue Type: Improvement
Components: contrib
Affects Versions: 1.10.2
Reporter: Sergio Soto
Hello,
we have an issue with SparkSubmitOperator. Airflow DAGs show that some
streaming applications break. I analyzed this behaviour. The
SparkSubmitHook is responsible for checking the driver status.
We discovered some timeouts and tried to reproduce the status-check command.
This is an execution with `time`:
{code:java}
time /opt/java/jdk1.8.0_181/jre/bin/java \
  -cp /opt/shared/spark/client/conf/:/opt/shared/spark/client/jars/* -Xmx1g \
  org.apache.spark.deploy.SparkSubmit --master spark://spark-master.corp.com:6066 \
  --status driver-20190901180337-2749
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/09/02 17:05:53 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20190901180337-2749 in spark://lgmadbdtpspk01v.corp.logitravelgroup.com:6066.
19/09/02 17:05:59 INFO RestSubmissionClient: Server responded with SubmissionStatusResponse:
{
"action" : "SubmissionStatusResponse",
"driverState" : "RUNNING",
"serverSparkVersion" : "2.2.1",
"submissionId" : "driver-20190901180337-2749",
"success" : true,
"workerHostPort" : "172.25.10.194:45441",
"workerId" : "worker-20190821201014-172.25.10.194-45441"
}
real 0m11.598s
user 0m2.092s
sys 0m0.222s
{code}
We analyzed the Scala code and the Spark API. The spark-submit status command
ends up issuing an HTTP GET request to a URL. Using curl, this is the time the
Spark master takes to return the status:
{code:java}
time curl "http://lgmadbdtpspk01v.corp.logitravelgroup.com:6066/v1/submissions/status/driver-20190901180337-2749"
{
"action" : "SubmissionStatusResponse",
"driverState" : "RUNNING",
"serverSparkVersion" : "2.2.1",
"submissionId" : "driver-20190901180337-2749",
"success" : true,
"workerHostPort" : "172.25.10.194:45441",
"workerId" : "worker-20190821201014-172.25.10.194-45441"
}
real 0m0.011s
user 0m0.000s
sys 0m0.006s
{code}
The task spends about 11.6 seconds with spark-submit versus 0.011 seconds with
curl. How can this behaviour be explained?
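
For comparison, the request that curl makes above could also be issued from
Python without starting a JVM. This is only a minimal sketch of the idea, not
how SparkSubmitHook currently works: the function names (build_status_url,
parse_driver_state, fetch_driver_state) are hypothetical, and it assumes the
master exposes the same /v1/submissions/status/ endpoint and JSON fields shown
in the curl output.

```python
import json
import urllib.request


def build_status_url(master, submission_id):
    """Turn a spark://host:port master URL into the REST status URL (assumed layout)."""
    host_port = master.replace("spark://", "", 1).rstrip("/")
    return "http://{}/v1/submissions/status/{}".format(host_port, submission_id)


def parse_driver_state(body):
    """Extract driverState from a SubmissionStatusResponse JSON body."""
    response = json.loads(body)
    if not response.get("success"):
        raise RuntimeError("status request failed: {}".format(body))
    return response["driverState"]


def fetch_driver_state(master, submission_id, timeout=10):
    """Query the master's REST endpoint directly; no JVM is started."""
    url = build_status_url(master, submission_id)
    with urllib.request.urlopen(url, timeout=timeout) as reply:
        return parse_driver_state(reply.read().decode("utf-8"))


# Hypothetical usage, with the values from the commands above:
# fetch_driver_state("spark://spark-master.corp.com:6066",
#                    "driver-20190901180337-2749")
```

The timeout argument bounds how long a single status check can block, which
may matter given the timeouts we observed.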