Hi Neil, I added the Analytics tag to https://phabricator.wikimedia.org/T245097, and also thanks for filing https://phabricator.wikimedia.org/T245713. We periodically review tasks in our incoming queue, so we should be able to help soon, but it may depend on priorities.
Luca

On Thu, 20 Feb 2020 at 06:21, Neil Shah-Quinn <[email protected]> wrote:

Another update: I'm continuing to encounter these Spark errors and having trouble recovering from them, even when I use proper settings. I've filed T245713 <https://phabricator.wikimedia.org/T245713> to discuss this further. The specific errors and behavior I'm seeing (for example, whether explicitly calling session.stop allows a new functioning session to be created) are not consistent, so I'm still trying to make sense of it.

I would greatly appreciate any input or help, even if it's just identifying places where my description doesn't make sense.

On Wed, 19 Feb 2020 at 13:35, Neil Shah-Quinn <[email protected]> wrote:

Bump!

Analytics team, I'm eager to have input from y'all about the best Spark settings to use.

On Fri, 14 Feb 2020 at 18:30, Neil Shah-Quinn <[email protected]> wrote:

I ran into this problem again, and I found that neither session.stop nor newSession got rid of the error. So it's still not clear how to recover from a crashed(?) Spark session.

On the other hand, I did figure out why my sessions were crashing in the first place, so hopefully recovering from that will be a rare need. The reason is that wmfdata doesn't modify the default Spark settings <https://github.com/neilpquinn/wmfdata/blob/master/wmfdata/spark.py#L60> when it starts a new session, which was (for example) causing it to start executors with only ~400 MiB of memory each.

I'm definitely going to change that, but it's not completely clear what the recommended settings for our cluster are. I cataloged the different recommendations at https://phabricator.wikimedia.org/T245097, and it would be very helpful if one of y'all could give some clear recommendations about what the settings should be for local SWAP jobs, YARN jobs, and "large" YARN jobs. For example, is it important to increase spark.sql.shuffle.partitions for YARN jobs? Is it reasonable to use 8 GiB of driver memory for a local job when the SWAP servers only have 64 GiB total?

Thank you!

On Fri, 7 Feb 2020 at 06:53, Andrew Otto <[email protected]> wrote:

Hm, interesting! I don't think many of us have used SparkSession.builder.getOrCreate repeatedly in the same process. What happens if you manually stop the Spark session first (session.stop() <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.stop>), or maybe try to explicitly create a new session via newSession() <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.newSession>?

On Thu, Feb 6, 2020 at 7:31 PM Neil Shah-Quinn <[email protected]> wrote:

Hi Luca!

Those were separate Yarn jobs I started later. When I got this error, I found that the Yarn job corresponding to the SparkContext was marked as "successful", but I still couldn't get SparkSession.builder.getOrCreate to open a new one.

Any idea what might have caused that, or how I could recover without restarting the notebook, which could mean losing a lot of in-progress work? I had already restarted that kernel, so I don't know if I'll encounter this problem again. If I do, I'll file a task.
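A minimal sketch of the recovery sequence Andrew suggests, in plain PySpark (wmfdata aside). It assumes the failure mode Neil describes: the JVM-side SparkContext has died (for example, after an OutOfMemoryError) while the Python-side session object survives, so getOrCreate keeps handing back the dead session until stop is called explicitly. Since Neil reports inconsistent results, treat this as a sketch of the intended behavior, not a guaranteed fix:

```python
from pyspark.sql import SparkSession

# getOrCreate() returns PySpark's cached session, even when the underlying
# SparkContext has already been stopped on the JVM side.
spark = SparkSession.builder.getOrCreate()

try:
    spark.sql("SELECT 1").show()
except Exception:
    # "Cannot call methods on a stopped SparkContext": stop the session
    # explicitly so PySpark discards its cached instance...
    spark.stop()
    # ...then getOrCreate() will build a genuinely new SparkContext
    # instead of returning the dead one.
    spark = SparkSession.builder.getOrCreate()
    spark.sql("SELECT 1").show()
```

One design note: newSession() shares its parent's SparkContext rather than creating a new one, so it cannot revive a stopped context; that would explain why it didn't help here.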
On Wed, 5 Feb 2020 at 23:24, Luca Toscano <[email protected]> wrote:

Hey Neil,

there were two Yarn jobs running related to your notebooks; I just killed them. Let's see if that solves the problem (you might need to restart your notebook again). If not, let's open a task and investigate :)

Luca

On Thu, 6 Feb 2020 at 02:08, Neil Shah-Quinn <[email protected]> wrote:

Whoa—I just got the same stopped SparkContext error on the query even after restarting the notebook, without an intermediate Java heap space error. That seems very strange to me.

On Wed, 5 Feb 2020 at 16:09, Neil Shah-Quinn <[email protected]> wrote:

Hey there!

I was running SQL queries via PySpark (using the wmfdata package <https://github.com/neilpquinn/wmfdata/blob/master/wmfdata/hive.py>) on SWAP when one of my queries failed with "java.lang.OutOfMemoryError: Java heap space".

After that, when I tried to call the spark.sql function again (via wmfdata.hive.run), it failed with "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext."

When I tried to create a new Spark context using SparkSession.builder.getOrCreate (whether using wmfdata.spark.get_session or directly), it returned a SparkSession object properly, but calling the object's sql function still gave the "stopped SparkContext" error.

Any idea what's going on? I assume restarting the notebook kernel would take care of the problem, but it seems like there has to be a better way to recover.

Thank you!
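The cause Neil eventually identifies (in the Feb 14 message above) is that sessions were created with the bare cluster defaults, including ~400 MiB executors. A hedged sketch of creating a session with explicit resources follows; every value is an illustrative placeholder, since the right numbers for SWAP, YARN, and "large" YARN jobs are exactly what T245097 asks the Analytics team to pin down:

```python
from pyspark.sql import SparkSession

# All values below are placeholders, not recommendations; T245097 is the
# request for the actual per-environment numbers.
spark = (
    SparkSession.builder
    .master("yarn")
    .appName("example-notebook")  # hypothetical application name
    # Without explicit settings, executors here started with ~400 MiB each.
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "2")
    # Neil asks whether 8 GiB of driver memory is reasonable on a 64 GiB
    # SWAP host; 2g is just a conservative placeholder.
    .config("spark.driver.memory", "2g")
    # Spark's default is 200; whether YARN jobs should raise it is one of
    # the open questions.
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)
```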
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
