[
https://issues.apache.org/jira/browse/AIRFLOW-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922339#comment-16922339
]
Sergio Soto commented on AIRFLOW-5385:
--------------------------------------
Hello [~ash],
we would like to help with this issue, which we have worked around with a hack
written by [~Diego García]: [pastebin|[https://pastebin.com/uhukUkjE]].
The API endpoint used in the [pastebin|[https://pastebin.com/uhukUkjE]] was
taken from the Spark documentation. This is a timed status check using the
spark-submit command:
{code:java}
time /opt/java/jdk1.8.0_181/jre/bin/java -cp
/opt/shared/spark/client/conf/:/opt/shared/spark/client/jars/* -Xmx1g
org.apache.spark.deploy.SparkSubmit --master
spark://spk01v.corp.logitravelgroup.com:6066 --status
driver-20190904092447-2910
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/09/04 11:53:38 INFO RestSubmissionClient: Submitting a request for the
status of submission driver-20190904092447-2910 in
spark://lgmadbdtpspk01v.corp.logitravelgroup.com:6066.
19/09/04 11:53:42 INFO RestSubmissionClient: Server responded with
SubmissionStatusResponse:
{
"action" : "SubmissionStatusResponse",
"driverState" : "RUNNING",
"serverSparkVersion" : "2.2.1",
"submissionId" : "driver-20190904092447-2910",
"success" : true,
"workerHostPort" : "172.25.10.210:41825",
"workerId" : "worker-20190718125659-172.25.10.210-41825"
}
real 0m6.547s
user 0m1.923s
sys 0m0.152s
{code}
This is a timed curl call:
{code:java}
time curl
http://spk01v.corp:6066/v1/submissions/status/driver-20190904092447-2910
{
"action" : "SubmissionStatusResponse",
"driverState" : "RUNNING",
"serverSparkVersion" : "2.2.1",
"submissionId" : "driver-20190904092447-2910",
"success" : true,
"workerHostPort" : "172.25.10.210:41825",
"workerId" : "worker-20190718125659-172.25.10.210-41825"
}
0,02s user
0,01s system
49% cpu
0,056 total{code}
I agree with you about the time spent by the JVM. To avoid it, [~Diego García]
proposed checking the driver status with a plain curl call instead.
Do you think a PR with this change would be useful?
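To make the proposal concrete, here is a minimal Python sketch of the same check that the curl call performs. It is only an illustration under some assumptions: the function names are hypothetical and are not part of SparkSubmitHook, the URL path is copied from the curl call above, and the REST port is assumed to be the same one given in the spark:// master URL (6066 in our case).

```python
import json
import urllib.request


def build_status_url(master_url, driver_id):
    """Build the REST status URL from a spark://host:port master URL.

    Hypothetical helper: the path mirrors the curl call in this comment.
    """
    host_port = master_url.split("://", 1)[-1].rstrip("/")
    return "http://{}/v1/submissions/status/{}".format(host_port, driver_id)


def parse_driver_state(body):
    """Return the driverState field of a SubmissionStatusResponse document."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("status request failed: {!r}".format(payload))
    return payload["driverState"]


def fetch_driver_state(master_url, driver_id, timeout=30):
    """Ask the master for the driver status without starting a JVM."""
    url = build_status_url(master_url, driver_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_driver_state(response.read().decode("utf-8"))
```

Calling fetch_driver_state("spark://spk01v.corp.logitravelgroup.com:6066", "driver-20190904092447-2910") would then be the equivalent of the timed curl call. How a real change should handle retries, HTTPS or authentication is left open here.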
> SparkSubmit status spend lot of time
> ------------------------------------
>
> Key: AIRFLOW-5385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5385
> Project: Apache Airflow
> Issue Type: Improvement
> Components: contrib
> Affects Versions: 1.10.2
> Reporter: Sergio Soto
> Priority: Blocker
>
> Hello,
> we have an issue with SparkSubmitOperator. Airflow DAGs show that some
> streaming applications fail. I analyzed this behaviour: the
> SparkSubmitHook is responsible for checking the driver status.
> We found some timeouts and tried to reproduce the status-check command. This
> is an execution with `time`:
> {code:java}
> time /opt/java/jdk1.8.0_181/jre/bin/java -cp
> /opt/shared/spark/client/conf/:/opt/shared/spark/client/jars/* -Xmx1g
> org.apache.spark.deploy.SparkSubmit --master
> spark://spark-master.corp.com:6066 --status driver-20190901180337-2749
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 19/09/02 17:05:53 INFO RestSubmissionClient: Submitting a request for the
> status of submission driver-20190901180337-2749 in
> spark://lgmadbdtpspk01v.corp.logitravelgroup.com:6066.
> 19/09/02 17:05:59 INFO RestSubmissionClient: Server responded with
> SubmissionStatusResponse:
> {
> "action" : "SubmissionStatusResponse",
> "driverState" : "RUNNING",
> "serverSparkVersion" : "2.2.1",
> "submissionId" : "driver-20190901180337-2749",
> "success" : true,
> "workerHostPort" : "172.25.10.194:45441",
> "workerId" : "worker-20190821201014-172.25.10.194-45441"
> }
> real 0m11.598s
> user 0m2.092s
> sys 0m0.222s{code}
> We analyzed the Scala code and the Spark API. The spark-submit status command
> ends up making an HTTP GET request to a URL. Using curl, this is the time the
> Spark master takes to return the status:
> {code:java}
> time curl
> "http://spark-master.corp.com:6066/v1/submissions/status/driver-20190901180337-2749"
> {
> "action" : "SubmissionStatusResponse",
> "driverState" : "RUNNING",
> "serverSparkVersion" : "2.2.1",
> "submissionId" : "driver-20190901180337-2749",
> "success" : true,
> "workerHostPort" : "172.25.10.194:45441",
> "workerId" : "worker-20190821201014-172.25.10.194-45441"
> }
> real 0m0.011s
> user 0m0.000s
> sys 0m0.006s
> {code}
> The task spends 11.59 seconds with spark-submit versus 0.011 seconds with curl.
> How can this behaviour be explained?