> and the verdict (supported by you) was that we should use this list or
> the public IRC channel.

Indeed, eh? I suggest we revisit that and send questions to
analytics-internal instead, but if others disagree, I am fine with either.
On Fri, Feb 7, 2020 at 12:17 PM Neil Shah-Quinn <[email protected]> wrote:

> Good suggestions, Andrew! I'll try those if I encounter this again.
>
> Nuria, we had a discussion about the appropriate places to ask questions
> about internal systems in October 2018, and the verdict (supported by you)
> was that we should use this list or the public IRC channel.
>
> If you want to revisit that decision, I'd suggest you consult that thread
> first (the subject was "Where to ask questions about internal analytics
> tools"), because I included a detailed list of pros and cons of different
> channels to start the discussion. In that list, I even mentioned that such
> discussions on this channel could annoy subscribers who don't have access
> to these systems 🙂
>
> If you still want us to use a different list, we can certainly do that. If
> so, please send my team a message and update the docs I added
> <https://wikitech.wikimedia.org/wiki/Analytics#Contact> so it stays clear.
>
> On Fri, 7 Feb 2020 at 07:48, Nuria Ruiz <[email protected]> wrote:
>
>> Hello,
>>
>> This discussion is probably not of wide interest to this public list; I
>> suggest we move it to analytics-internal.
>>
>> Thanks,
>>
>> Nuria
>>
>> On Fri, Feb 7, 2020 at 6:53 AM Andrew Otto <[email protected]> wrote:
>>
>>> Hm, interesting! I don't think many of us have used
>>> SparkSession.builder.getOrCreate repeatedly in the same process. What
>>> happens if you manually stop the Spark session first (session.stop()
>>> <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.stop>),
>>> or maybe try to explicitly create a new session via newSession()
>>> <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.newSession>?
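A minimal sketch of the recovery Andrew suggests, assuming a plain PySpark
session rather than the wmfdata wrappers; both calls are documented
SparkSession methods:

    from pyspark.sql import SparkSession

    # Ask the builder for whatever session it currently knows about;
    # after the heap-space error this may be the stale, stopped one.
    spark = SparkSession.builder.getOrCreate()

    # Explicitly stop it so the old SparkContext is fully torn down...
    spark.stop()

    # ...then ask the builder again. With no live context registered,
    # it should construct a fresh SparkContext and SparkSession.
    spark = SparkSession.builder.getOrCreate()

    # Note that newSession() shares the existing SparkContext and only
    # isolates SQL state (conf, temp views), so it cannot revive a
    # session whose underlying context has already stopped.
    isolated = spark.newSession()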
>>> On Thu, Feb 6, 2020 at 7:31 PM Neil Shah-Quinn <[email protected]> wrote:
>>>
>>>> Hi Luca!
>>>>
>>>> Those were separate Yarn jobs I started later. When I got this error, I
>>>> found that the Yarn job corresponding to the SparkContext was marked as
>>>> "successful", but I still couldn't get SparkSession.builder.getOrCreate
>>>> to open a new one.
>>>>
>>>> Any idea what might have caused that, or how I could recover without
>>>> restarting the notebook, which could mean losing a lot of in-progress
>>>> work? I had already restarted that kernel, so I don't know if I'll
>>>> encounter this problem again. If I do, I'll file a task.
>>>>
>>>> On Wed, 5 Feb 2020 at 23:24, Luca Toscano <[email protected]> wrote:
>>>>
>>>>> Hey Neil,
>>>>>
>>>>> There were two Yarn jobs running that were related to your notebooks;
>>>>> I just killed them. Let's see if that solves the problem (you might
>>>>> need to restart your notebook again). If not, let's open a task and
>>>>> investigate :)
>>>>>
>>>>> Luca
>>>>>
>>>>> On Thu, Feb 6, 2020 at 02:08 Neil Shah-Quinn <[email protected]> wrote:
>>>>>
>>>>>> Whoa! I just got the same "stopped SparkContext" error on the query
>>>>>> even after restarting the notebook, without an intermediate Java heap
>>>>>> space error. That seems very strange to me.
>>>>>>
>>>>>> On Wed, 5 Feb 2020 at 16:09, Neil Shah-Quinn <[email protected]> wrote:
>>>>>>
>>>>>>> Hey there!
>>>>>>>
>>>>>>> I was running SQL queries via PySpark (using the wmfdata package
>>>>>>> <https://github.com/neilpquinn/wmfdata/blob/master/wmfdata/hive.py>)
>>>>>>> on SWAP when one of my queries failed with
>>>>>>> "java.lang.OutOfMemoryError: Java heap space".
>>>>>>>
>>>>>>> After that, when I tried to call the spark.sql function again (via
>>>>>>> wmfdata.hive.run), it failed with "java.lang.IllegalStateException:
>>>>>>> Cannot call methods on a stopped SparkContext."
>>>>>>>
>>>>>>> When I tried to create a new Spark session using
>>>>>>> SparkSession.builder.getOrCreate (whether via
>>>>>>> wmfdata.spark.get_session or directly), it returned a SparkSession
>>>>>>> object properly, but calling that object's sql function still gave
>>>>>>> the "stopped SparkContext" error.
>>>>>>>
>>>>>>> Any idea what's going on? I assume restarting the notebook kernel
>>>>>>> would take care of the problem, but it seems like there has to be a
>>>>>>> better way to recover.
>>>>>>>
>>>>>>> Thank you!
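For reference, a hedged sketch of one way to automate that recovery in a
notebook; run_sql here is a hypothetical helper, not part of wmfdata:

    from pyspark.sql import SparkSession

    def run_sql(query):
        """Run a SQL query, rebuilding the session if its context has died."""
        spark = SparkSession.builder.getOrCreate()
        try:
            return spark.sql(query)
        except Exception as err:
            # A dead context surfaces as "Cannot call methods on a stopped
            # SparkContext" wrapped in a Py4J error; anything else is a
            # genuine query failure and should propagate.
            if "stopped SparkContext" not in str(err):
                raise
            # Stop the stale session so the builder registers a fresh
            # SparkContext, then retry the query exactly once.
            spark.stop()
            return SparkSession.builder.getOrCreate().sql(query)

Retrying only once keeps the helper from looping if the new context dies for
the same reason (for example, a query that reliably exhausts the Java heap).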
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
