[jira] [Updated] (SPARK-27264) spark sql released all executor but the job is not done

Mike Chan (JIRA) Sun, 24 Mar 2019 08:38:10 -0700


     [ https://issues.apache.org/jira/browse/SPARK-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mike Chan updated SPARK-27264:
------------------------------
    Environment: Azure HDInsight Spark 2.4 on Azure Storage. SQL: read and join 
some data, then write the result to a Hive metastore table; query executed on 
JupyterHub, while the pre-migration cluster used Jupyter (non-Hub)  (was: Azure 
HDInsight Spark 2.4 on Azure Storage. SQL: read and join some data, then write 
the result to a Hive metastore table)

> spark sql released all executor but the job is not done
> -------------------------------------------------------
>
>                 Key: SPARK-27264
>                 URL: https://issues.apache.org/jira/browse/SPARK-27264
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.4.0
>         Environment: Azure HDInsight Spark 2.4 on Azure Storage. SQL: read and 
> join some data, then write the result to a Hive metastore table; query 
> executed on JupyterHub, while the pre-migration cluster used Jupyter (non-Hub)
>            Reporter: Mike Chan
>            Priority: Major
>
> I have a Spark SQL query that used to run in under 10 minutes but now takes 
> 3 hours after a cluster migration, and I need to dig into what it is actually 
> doing. I'm new to Spark, so please bear with me if I'm asking something 
> unrelated.
> Increasing spark.executor.memory did not help. Env: Azure HDInsight Spark 2.4 
> on Azure Storage. SQL: read and join some data, then write the result to a 
> Hive metastore table.
> The spark.sql query ends with the code below: 
> .write.mode("overwrite").saveAsTable("default.mikemiketable")
> Application behavior: within the first 15 minutes, it loads the data and 
> completes most tasks (199/200), leaving only 1 executor process alive, which 
> keeps shuffle reading/writing data. Because only 1 executor is left, we have 
> to wait 3 hours for the application to finish. 
> [!https://i.stack.imgur.com/6hqvh.png!|https://i.stack.imgur.com/6hqvh.png]
> Only 1 executor is left alive: 
> [!https://i.stack.imgur.com/55162.png!|https://i.stack.imgur.com/55162.png]
> Not sure what the executor is doing: 
> [!https://i.stack.imgur.com/TwhuX.png!|https://i.stack.imgur.com/TwhuX.png]
> From time to time, we can see the shuffle read increase: 
> [!https://i.stack.imgur.com/WhF9A.png!|https://i.stack.imgur.com/WhF9A.png]
> I therefore increased spark.executor.memory to 20g, but nothing changed. 
> Ambari and YARN show that the cluster has plenty of resources left: 
> [!https://i.stack.imgur.com/pngQA.png!|https://i.stack.imgur.com/pngQA.png]
> Almost all executors are released: 
> [!https://i.stack.imgur.com/pA134.png!|https://i.stack.imgur.com/pA134.png]
> Any guidance is greatly appreciated.
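
The report does not include a query plan, so the cause is not confirmed here. One possible explanation, offered only as an assumption: 200 is the default value of spark.sql.shuffle.partitions, so 199 of 200 tasks finishing quickly while a single task runs for hours is a pattern commonly associated with data skew in the join. If one join key is far more frequent than the others, every row with that key is shuffled into the same partition, and the one task processing it keeps running after the rest finish. The pure-Python sketch below is not Spark code: the key distribution is hypothetical, and plain modulo stands in for Spark's Murmur3-based hash partitioning, just to show how one hot key concentrates rows in one of 200 partitions.

```python
from collections import Counter

# Assumption: spark.sql.shuffle.partitions was left at its default of 200,
# which would match the 199/200 task count in the report.
NUM_SHUFFLE_PARTITIONS = 200


def partition_for(key: int, num_partitions: int = NUM_SHUFFLE_PARTITIONS) -> int:
    """Assign a join key to a shuffle partition.

    Simplification: Spark hashes keys with Murmur3 before taking the modulo;
    plain modulo is used here to keep the sketch deterministic and dependency-free.
    """
    return key % num_partitions


def partition_sizes(keys, num_partitions: int = NUM_SHUFFLE_PARTITIONS) -> Counter:
    """Count how many rows land in each shuffle partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


def skew_report(keys, num_partitions: int = NUM_SHUFFLE_PARTITIONS):
    """Return (largest partition id, its row count, its share of all rows)."""
    sizes = partition_sizes(keys, num_partitions)
    largest_partition, largest_rows = sizes.most_common(1)[0]
    total_rows = sum(sizes.values())
    return largest_partition, largest_rows, largest_rows / total_rows


if __name__ == "__main__":
    # Hypothetical data: 100,000 rows spread evenly over 10,000 keys,
    # plus 1,000,000 rows that all share a single very common key (42).
    uniform_keys = [i % 10_000 for i in range(100_000)]
    hot_keys = [42] * 1_000_000
    part, rows, share = skew_report(uniform_keys + hot_keys)
    print(f"partition {part} holds {rows} rows ({share:.1%} of all rows)")
```

With these hypothetical inputs, partition 42 ends up with 1,000,500 of the 1,100,000 rows (91.0%), while every other partition holds 500, so one task does almost all of the work. Separately, and again as an assumption about this cluster: if dynamic allocation is enabled, Spark removes executors that have been idle longer than spark.dynamicAllocation.executorIdleTimeout (60s by default), which would be consistent with almost all executors being released while the last task is still running.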



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
