[ https://issues.apache.org/jira/browse/SPARK-16676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389726#comment-15389726 ]
Joe Chong commented on SPARK-16676:
-----------------------------------

It didn't. How do I troubleshoot? From the attached picture, the stage triggers the job, but the job stayed pending until I had to kill it from the Spark UI.

> Spark jobs stay in pending
> --------------------------
>
>                 Key: SPARK-16676
>                 URL: https://issues.apache.org/jira/browse/SPARK-16676
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, Spark Shell
>    Affects Versions: 1.5.2
>         Environment: Mac OS X Yosemite, Terminal, Spark-shell standalone
>            Reporter: Joe Chong
>         Attachments: Spark UI stays @ pending.png
>
>
> I've been having issues executing certain Scala statements within the
> Spark-Shell. These statements come from a tutorial/blog post written by
> Carol McDonald at MapR.
> The import statements and reading text files into DataFrames work fine.
> However, when I try to do df.show(), the execution hits a roadblock.
> Checking the job in the Spark UI, I see that the stage is active; however,
> one of its dependent jobs stays in Pending without any movement. The logs
> are below.
> scala> fltCountsql.show()
> 16/07/22 11:40:16 INFO spark.SparkContext: Starting job: show at <console>:46
> 16/07/22 11:40:16 INFO scheduler.DAGScheduler: Registering RDD 31 (show at <console>:46)
> 16/07/22 11:40:16 INFO scheduler.DAGScheduler: Got job 4 (show at <console>:46) with 200 output partitions
> 16/07/22 11:40:16 INFO scheduler.DAGScheduler: Final stage: ResultStage 8(show at <console>:46)
> 16/07/22 11:40:16 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 7)
> 16/07/22 11:40:16 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 7)
> 16/07/22 11:40:16 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 7 (MapPartitionsRDD[31] at show at <console>:46), which has no missing parents
> 16/07/22 11:40:16 INFO storage.MemoryStore: ensureFreeSpace(18128) called with curMem=115755879, maxMem=2778495713
> 16/07/22 11:40:16 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 17.7 KB, free 2.5 GB)
> 16/07/22 11:40:16 INFO storage.MemoryStore: ensureFreeSpace(7527) called with curMem=115774007, maxMem=2778495713
> 16/07/22 11:40:16 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 7.4 KB, free 2.5 GB)
> 16/07/22 11:40:16 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:61408 (size: 7.4 KB, free: 2.5 GB)
> 16/07/22 11:40:16 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:861
> 16/07/22 11:40:16 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 7 (MapPartitionsRDD[31] at show at <console>:46)
> 16/07/22 11:40:16 INFO scheduler.TaskSchedulerImpl: Adding task set 7.0 with 2 tasks
> 16/07/22 11:40:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 7.0 (TID 4, localhost, PROCESS_LOCAL, 2156 bytes)
> 16/07/22 11:40:16 INFO executor.Executor: Running task 0.0 in stage 7.0 (TID 4)
> 16/07/22 11:40:16 INFO storage.BlockManager: Found block rdd_2_0 locally
> 16/07/22 11:40:17 INFO executor.Executor: Finished task 0.0 in stage 7.0 (TID 4). 2738 bytes result sent to driver
> 16/07/22 11:40:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 7.0 (TID 4) in 920 ms on localhost (1/2)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)