On Wed, Nov 16, 2016 at 10:44 AM Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:

> Thanks for sharing the thread dump. I had a look at them and couldn't find
> anything unusual. Is there anything in the logs (driver + executor) that
> suggests what's going on? Also, what does the Spark job do, and which
> versions of Spark and Hadoop are you using?
I haven't seen anything in the logs. When I observed the hang before, in local
mode, the last output was a log statement from my own code (I have a log4j
logger and was calling info() on it); that was also the last line of my main()
function. After that there was no more output, from either the driver or the
executors. The pause can be as short as a few minutes or close to an hour,
and as far as I can tell, when the job resumes, the log statements look more
or less normal.

Locally I'm using Spark 2.0.1 built for Hadoop 2.7, without installing Hadoop
itself. Remotely I'm running on Google Cloud Dataproc, which also uses Spark
2.0.1, along with Hadoop 2.7.3. The hang has happened in both environments.

The job loads data from a text file (using SparkContext.textFile()), then
splits each line and converts it into an array of integers. From there I do
some sketching (the data encodes either a tree, a graph, or text, and I
create a fixed-length sketch that probabilistically produces similar results
for similar nodes in the tree/graph). I then do some lightweight clustering
on the sketches and save the cluster assignments to a text file.

For what it's worth, the GC stats in the UI look a bit high (as much as 1
minute of GC over a 15-minute run), but they do not change during the pause
period.

On Wed, Nov 16, 2016 at 2:48 AM Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:

> Also, how are you launching the application? Through spark-submit, or by
> creating a Spark context in your app?

I'm calling spark-submit, and within my app I call SparkContext.getOrCreate()
to get a context. I then call sc.textFile() to load my data into an RDD and
perform various actions on it. After seeing some discussion suggesting it
might be necessary, I tried adding a call to sc.stop() at the very end, but
it didn't seem to make a difference. The strange thing is that this behavior
comes and goes.
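For reference, the shape of the driver is roughly the following. This is a minimal PySpark-style sketch, not the actual code: parse_line, run_job, and the paths are placeholders, and the sketching/clustering steps are elided.

```python
def parse_line(line):
    """Split a whitespace-delimited line and convert each field to an int."""
    return [int(tok) for tok in line.split()]


def run_job(sc, in_path, out_path):
    """Driver outline: load text, parse to int arrays, save, then stop.

    `sc` is an already-created SparkContext (via SparkContext.getOrCreate()
    under spark-submit). The sketching and lightweight clustering steps
    described above are elided here.
    """
    rows = sc.textFile(in_path).map(parse_line)
    # ... fixed-length sketching and clustering would go here ...
    rows.map(lambda xs: " ".join(map(str, xs))).saveAsTextFile(out_path)
    sc.stop()  # explicit stop, which I tried adding at the very end
```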
I tried opening the UI, as Pietro suggested, but that didn't seem to trigger
it for me; I haven't figured out what, if anything, makes it happen every
time.

On Wednesday, November 16, 2016 4:41 AM, Pietro Pugni <pietro.pu...@gmail.com> wrote:

> I have the same issue with Spark 2.0.1, Java 1.8.x, and pyspark. I also use
> SparkSQL and JDBC. My application runs locally. It happens only if I connect
> to the UI during Spark execution, and even if I close the browser before the
> execution ends. I observed this behaviour on both macOS Sierra and Red Hat
> 6.7.

It is interesting that you are seeing this too. I can't get it to happen by
using the UI... but I'm also having difficulty making it happen at all right
now (only trying locally at the moment).
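To double-check the GC theory during one of these pauses, I may try turning on verbose GC logging. A hedged sketch of the spark-submit flags (the JVM flags are the Java 8 ones; the rest of the command line is elided):

```shell
spark-submit \
  --conf "spark.driver.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  ...
```

If GC were responsible, the pause should show up as long collections in those logs; if the logs are quiet during the hang, that would point elsewhere.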