Hello,

This discussion is probably not of wide interest to this public list;
shall we move it to analytics-internal?

Thanks,

Nuria

On Fri, Feb 7, 2020 at 6:53 AM Andrew Otto <[email protected]> wrote:

> Hm, interesting!  I don't think many of us have used
> SparkSession.builder.getOrCreate repeatedly in the same process.  What
> happens if you manually stop the Spark session first (session.stop()
> <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.stop>),
> or explicitly create a new session via newSession()
> <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.newSession>?
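>
> Something like this (untested sketch; assumes spark is the existing
> session object in your notebook):
>
>     from pyspark.sql import SparkSession
>
>     # Option 1: stop the old session, then ask the builder for a fresh one.
>     spark.stop()
>     spark = SparkSession.builder.getOrCreate()
>
>     # Option 2 (instead of stopping): fork a session that shares the same
>     # underlying SparkContext. Note this won't help if the context itself
>     # is already stopped.
>     # spark = spark.newSession()
>
>     spark.sql("SELECT 1").show()  # sanity check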
>
> On Thu, Feb 6, 2020 at 7:31 PM Neil Shah-Quinn <[email protected]>
> wrote:
>
>> Hi Luca!
>>
>> Those were separate Yarn jobs I started later. When I got this error, I
>> found that the Yarn job corresponding to the SparkContext was marked as
>> "successful", but I still couldn't get SparkSession.builder.getOrCreate to
>> open a new one.
>>
>> Any idea what might have caused that, or how I could recover without
>> restarting the notebook, which could mean losing a lot of in-progress work?
>> I had already restarted that kernel, so I don't know if I'll encounter this
>> problem again. If I do, I'll file a task.
>>
>> On Wed, 5 Feb 2020 at 23:24, Luca Toscano <[email protected]> wrote:
>>
>>> Hey Neil,
>>>
>>> There were two Yarn jobs running that were related to your notebooks; I
>>> just killed them. Let's see if that solves the problem (you might need to
>>> restart your notebook again). If not, let's open a task and investigate :)
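>>>
>>> For the record, this was roughly the procedure (standard YARN CLI,
>>> application IDs elided):
>>>
>>>     yarn application -list | grep <your shell username>
>>>     yarn application -kill application_...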
>>>
>>> Luca
>>>
>>> Il giorno gio 6 feb 2020 alle ore 02:08 Neil Shah-Quinn <
>>> [email protected]> ha scritto:
>>>
>>>> Whoa—I just got the same stopped SparkContext error on the query even
>>>> after restarting the notebook, without an intermediate Java heap space
>>>> error. That seems very strange to me.
>>>>
>>>> On Wed, 5 Feb 2020 at 16:09, Neil Shah-Quinn <[email protected]>
>>>> wrote:
>>>>
>>>>> Hey there!
>>>>>
>>>>> I was running SQL queries via PySpark (using the wmfdata package
>>>>> <https://github.com/neilpquinn/wmfdata/blob/master/wmfdata/hive.py>)
>>>>> on SWAP when one of my queries failed with "java.lang.OutOfMemoryError:
>>>>> Java heap space".
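>>>>>
>>>>> For context, my calls look roughly like this (a sketch, not the exact
>>>>> query; hive.run ultimately calls spark.sql under the hood):
>>>>>
>>>>>     import wmfdata
>>>>>
>>>>>     results = wmfdata.hive.run("SELECT ...")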
>>>>>
>>>>> After that, when I tried to call the spark.sql function again (via
>>>>> wmfdata.hive.run), it failed with "java.lang.IllegalStateException: Cannot
>>>>> call methods on a stopped SparkContext."
>>>>>
>>>>> When I tried to create a new Spark context using
>>>>> SparkSession.builder.getOrCreate (whether using wmfdata.spark.get_session
>>>>> or directly), it returned a SparkSession object properly, but calling the
>>>>> object's sql method still gave the "stopped SparkContext" error.
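>>>>>
>>>>> In other words, a minimal sketch of the symptom (my guess is that the
>>>>> builder is handing back the cached, already-stopped session rather than
>>>>> constructing a new one):
>>>>>
>>>>>     from pyspark.sql import SparkSession
>>>>>
>>>>>     spark = SparkSession.builder.getOrCreate()  # returns without error
>>>>>     spark.sql("SELECT 1")  # java.lang.IllegalStateException: Cannot call
>>>>>                            # methods on a stopped SparkContext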
>>>>>
>>>>> Any idea what's going on? I assume restarting the notebook kernel
>>>>> would take care of the problem, but it seems like there has to be a better
>>>>> way to recover.
>>>>>
>>>>> Thank you!
>>>>>
>>>>>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
