Simeon Simeonov created SPARK-9625:
--------------------------------------

             Summary: SparkILoop creates sql context continuously, thousands of 
times
                 Key: SPARK-9625
                 URL: https://issues.apache.org/jira/browse/SPARK-9625
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell
    Affects Versions: 1.4.1
         Environment: Ubuntu on AWS
            Reporter: Simeon Simeonov


Occasionally but repeatably, depending on the Spark SQL operations being run, 
{{spark-shell}} gets into a state where it attempts to create a sql context over 
and over again while doing its work. Example output below:

{code}
15/08/05 03:04:12 INFO DAGScheduler: looking for newly runnable stages
15/08/05 03:04:12 INFO DAGScheduler: running: Set()
15/08/05 03:04:12 INFO DAGScheduler: waiting: Set(ShuffleMapStage 7, 
ResultStage 8)
15/08/05 03:04:12 INFO DAGScheduler: failed: Set()
15/08/05 03:04:12 INFO DAGScheduler: Missing parents for ShuffleMapStage 7: 
List()
15/08/05 03:04:12 INFO DAGScheduler: Missing parents for ResultStage 8: 
List(ShuffleMapStage 7)
15/08/05 03:04:12 INFO DAGScheduler: Submitting ShuffleMapStage 7 
(MapPartitionsRDD[49] at map at <console>:474), which is now runnable
15/08/05 03:04:12 INFO MemoryStore: ensureFreeSpace(47840) called with 
curMem=685306, maxMem=26671746908
15/08/05 03:04:12 INFO MemoryStore: Block broadcast_12 stored as values in 
memory (estimated size 46.7 KB, free 24.8 GB)
15/08/05 03:04:12 INFO MemoryStore: ensureFreeSpace(15053) called with 
curMem=733146, maxMem=26671746908
15/08/05 03:04:12 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes 
in memory (estimated size 14.7 KB, free 24.8 GB)
15/08/05 03:04:12 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on 
localhost:39451 (size: 14.7 KB, free: 24.8 GB)
15/08/05 03:04:12 INFO SparkContext: Created broadcast 12 from broadcast at 
DAGScheduler.scala:874
15/08/05 03:04:12 INFO DAGScheduler: Submitting 1 missing tasks from 
ShuffleMapStage 7 (MapPartitionsRDD[49] at map at <console>:474)
15/08/05 03:04:12 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks
15/08/05 03:04:12 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 684, 
localhost, PROCESS_LOCAL, 1461 bytes)
15/08/05 03:04:12 INFO Executor: Running task 0.0 in stage 7.0 (TID 684)
15/08/05 03:04:12 INFO ShuffleBlockFetcherIterator: Getting 214 non-empty 
blocks out of 214 blocks
15/08/05 03:04:12 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 
1 ms
15/08/05 03:04:12 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO HiveMetaStore: No user is added in admin role, since 
config is empty
15/08/05 03:04:13 INFO SessionState: No Tez session required at this point. 
hive.execution.engine=mr.
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
{code}

In the end, the job succeeded and produced correct output, but it ran many times 
slower than it should have.

Longer output, showing job completion, can be found 
[here|https://gist.github.com/ssimeonov/8064f5f77897bb337157]. In that example, 
the sql context was created about 2,500 times. I've had runs where it exceeded 
100,000 times.
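For reference, the count above can be reproduced from a captured log with a short script (a minimal sketch; the embedded log lines are copied from the excerpt above, and in practice you would read the full log file instead):

```python
# Count how many times the shell re-created the sql context in a captured log.
# These sample lines are from the log excerpt above; a real run would read
# the full spark-shell log instead of this embedded string.
log = """\
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
"""

count = sum(1 for line in log.splitlines() if "Created sql context" in line)
print(count)  # 2 for this excerpt; ~2,500 in the full gist linked above
```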

Could it be that an exception silently generated by the job is causing this?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
