Hello! This discussion is probably not of wide interest to this public list; shall we move it to analytics-internal?
Thanks,
Nuria

On Fri, Feb 7, 2020 at 6:53 AM Andrew Otto <[email protected]> wrote:

> Hm, interesting! I don't think many of us have used
> SparkSession.builder.getOrCreate repeatedly in the same process. What
> happens if you manually stop the Spark session first (session.stop()
> <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.stop>),
> or explicitly create a new session via newSession()
> <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.newSession>?
>
> On Thu, Feb 6, 2020 at 7:31 PM Neil Shah-Quinn <[email protected]> wrote:
>
>> Hi Luca!
>>
>> Those were separate Yarn jobs I started later. When I got this error, I
>> found that the Yarn job corresponding to the SparkContext was marked as
>> "successful", but I still couldn't get SparkSession.builder.getOrCreate
>> to open a new one.
>>
>> Any idea what might have caused that, or how I could recover without
>> restarting the notebook, which could mean losing a lot of in-progress
>> work? I had already restarted that kernel, so I don't know whether I'll
>> encounter this problem again. If I do, I'll file a task.
>>
>> On Wed, 5 Feb 2020 at 23:24, Luca Toscano <[email protected]> wrote:
>>
>>> Hey Neil,
>>>
>>> there were two Yarn jobs running that were related to your notebooks;
>>> I just killed them. Let's see if that solves the problem (you might
>>> need to restart your notebook again). If not, let's open a task and
>>> investigate :)
>>>
>>> Luca
>>>
>>> On Thu, 6 Feb 2020 at 02:08, Neil Shah-Quinn <[email protected]> wrote:
>>>
>>>> Whoa, I just got the same stopped SparkContext error on the query
>>>> even after restarting the notebook, without an intermediate Java heap
>>>> space error. That seems very strange to me.
>>>>
>>>> On Wed, 5 Feb 2020 at 16:09, Neil Shah-Quinn <[email protected]> wrote:
>>>>
>>>>> Hey there!
>>>>>
>>>>> I was running SQL queries via PySpark (using the wmfdata package
>>>>> <https://github.com/neilpquinn/wmfdata/blob/master/wmfdata/hive.py>)
>>>>> on SWAP when one of my queries failed with
>>>>> "java.lang.OutOfMemoryError: Java heap space".
>>>>>
>>>>> After that, when I tried to call the spark.sql function again (via
>>>>> wmfdata.hive.run), it failed with "java.lang.IllegalStateException:
>>>>> Cannot call methods on a stopped SparkContext."
>>>>>
>>>>> When I tried to create a new Spark session using
>>>>> SparkSession.builder.getOrCreate (whether via wmfdata.spark.get_session
>>>>> or directly), it returned a SparkSession object properly, but calling
>>>>> that object's sql function still gave the "stopped SparkContext"
>>>>> error.
>>>>>
>>>>> Any idea what's going on? I assume restarting the notebook kernel
>>>>> would take care of the problem, but it seems like there has to be a
>>>>> better way to recover.
>>>>>
>>>>> Thank you!
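
For reference, a minimal sketch of the recovery Andrew suggests, assuming a
PySpark notebook whose underlying SparkContext has been stopped; the query
shown is a placeholder, not one from the thread:

    from pyspark.sql import SparkSession

    # getOrCreate() may hand back the cached session even though its
    # SparkContext was stopped (e.g. after an OutOfMemoryError), in which
    # case any spark.sql(...) call raises IllegalStateException.
    spark = SparkSession.builder.getOrCreate()

    # Explicitly stop the stale session, clearing the cached
    # default/active session along with the dead context...
    spark.stop()

    # ...then build a fresh session, which now creates a new SparkContext.
    spark = SparkSession.builder.getOrCreate()
    spark.sql("SELECT 1").show()  # placeholder sanity-check query

Note that newSession() returns a session sharing the same underlying
SparkContext, so it can only help while that context is still alive; and
whether the stop-and-rebuild approach actually recovers a notebook may
depend on the PySpark version in use.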
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
