Can you log onto 10.65.143.174, find task 31 and take a stack trace? Thanks
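
One way to grab that trace is sketched below. It rests on assumptions not stated in this thread: that the JDK tools `jps` and `jstack` are on the PATH of the worker host, that the script runs as the same user as the executor JVM, that the executor's main class is `org.apache.spark.executor.CoarseGrainedExecutorBackend`, and that task threads are named "Executor task launch worker-N". The helper names are hypothetical.

```python
import subprocess

# Assumed main class of a Spark executor JVM (not confirmed in this thread).
EXECUTOR_MAIN = "org.apache.spark.executor.CoarseGrainedExecutorBackend"
# Assumed name prefix of the threads that run tasks inside the executor.
TASK_THREAD_PREFIX = "Executor task launch worker"


def find_executor_pids(jps_output):
    """Return the PIDs of executor JVMs listed in `jps -l` output."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == EXECUTOR_MAIN:
            pids.append(int(parts[0]))
    return pids


def task_thread_stacks(jstack_output):
    """Keep only the per-thread blocks that belong to task threads.

    jstack separates thread blocks with a blank line, and each block
    starts with the quoted thread name.
    """
    blocks = jstack_output.split("\n\n")
    wanted = '"' + TASK_THREAD_PREFIX
    return [b for b in blocks if b.lstrip().startswith(wanted)]


def dump_task_threads():
    """Print the task-thread stacks of every executor JVM on this host."""
    jps = subprocess.check_output(["jps", "-l"]).decode()
    for pid in find_executor_pids(jps):
        dump = subprocess.check_output(["jstack", str(pid)]).decode()
        print("=== executor JVM pid %d ===" % pid)
        print("\n\n".join(task_thread_stacks(dump)))
```

Calling `dump_task_threads()` on 10.65.143.174 should print the stack of whichever thread is still running the hung task. Since this is pyspark, the JVM trace only covers the Java side; the Python worker is a separate process and would not show up here.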

On Tue, Dec 29, 2015 at 9:19 AM, Darren Govoni <dar...@ontrenet.com> wrote:

> Hi,
> I've had this nagging problem where a task will hang and the entire job
> hangs. Using pyspark. Spark 1.5.1
>
> The job output looks like this, and hangs after the last task:
>
> ......
> 15/12/29 17:00:38 INFO BlockManagerInfo: Added broadcast_0_piece0 in
> memory on 10.65.143.174:34385 (size: 5.8 KB, free: 2.1 GB)
> 15/12/29 17:00:39 INFO TaskSetManager: Finished task 15.0 in stage 0.0
> (TID 15) in 11668 ms on 10.65.143.174 (29/32)
> 15/12/29 17:00:39 INFO TaskSetManager: Finished task 23.0 in stage 0.0
> (TID 23) in 11684 ms on 10.65.143.174 (30/32)
> 15/12/29 17:00:39 INFO TaskSetManager: Finished task 7.0 in stage 0.0
> (TID 7) in 11717 ms on 10.65.143.174 (31/32)
> {nothing here for a while, ~6mins}
>
>
> Here is the executor status, from UI.
>
> 31 31 0 RUNNING PROCESS_LOCAL 2 / 10.65.143.174 2015/12/29 17:00:28 6.8
> min 0 ms 0 ms 60 ms 0 ms 0 ms 0.0 B
> Here is executor 2 from 10.65.143.174. Never see task 31 get to the
> executor.....any ideas?
>
> .....
> 15/12/29 17:00:38 INFO TorrentBroadcast: Started reading broadcast
> variable 0
> 15/12/29 17:00:38 INFO MemoryStore: ensureFreeSpace(5979) called with
> curMem=0, maxMem=2223023063
> 15/12/29 17:00:38 INFO MemoryStore: Block broadcast_0_piece0 stored as
> bytes in memory (estimated size 5.8 KB, free 2.1 GB)
> 15/12/29 17:00:38 INFO TorrentBroadcast: Reading broadcast variable 0
> took 208 ms
> 15/12/29 17:00:38 INFO MemoryStore: ensureFreeSpace(8544) called with
> curMem=5979, maxMem=2223023063
> 15/12/29 17:00:38 INFO MemoryStore: Block broadcast_0 stored as values in
> memory (estimated size 8.3 KB, free 2.1 GB)
> 15/12/29 17:00:39 INFO PythonRunner: Times: total = 913, boot = 747, init
> = 166, finish = 0
> 15/12/29 17:00:39 INFO Executor: Finished task 15.0 in stage 0.0 (TID
> 15). 967 bytes result sent to driver
> 15/12/29 17:00:39 INFO PythonRunner: Times: total = 955, boot = 735, init
> = 220, finish = 0
> 15/12/29 17:00:39 INFO Executor: Finished task 23.0 in stage 0.0 (TID
> 23). 967 bytes result sent to driver
> 15/12/29 17:00:39 INFO PythonRunner: Times: total = 970, boot = 812, init
> = 158, finish = 0
> 15/12/29 17:00:39 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7).
> 967 bytes result sent to driver
> root@ip-10-65-143-174 2]$
>
>
>
> Sent from my Verizon Wireless 4G LTE smartphone
>