I think the original idea is that the life of the driver is the life of the SparkContext: the context is stopped when the driver finishes. Or: if for some reason the "context" dies or there's an unrecoverable error, that's it for the driver.
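For concreteness, here's a minimal sketch of that single-context-per-driver pattern (Spark 1.x APIs; the object name and app name are just illustrative, not an official template):

```
// Minimal sketch: one SparkContext for the whole life of the driver,
// stop() only on the way out. Spark 1.x APIs assumed; names are illustrative.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SingleContextApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("single-context-example")
    val sc = new SparkContext(conf)
    val sqlContext = SQLContext.getOrCreate(sc)
    try {
      // ... submit all jobs against the same sc / sqlContext here ...
    } finally {
      sc.stop() // clean shutdown when the driver finishes
    }
  }
}
```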
(There's nothing wrong with stop(), right? You have to call that when the driver ends to shut down Spark cleanly. It's restarting another context that's at issue.)

This makes the most sense in the context of a resource manager, which can conceivably restart a driver if you like, but can't reach into your program. That's probably still the best way to think of it. Still, it would be nice if SparkContext were friendlier to a restart, just as a matter of design. AFAIK it is; not sure about SQLContext though. If it's not a priority, it's just because this isn't a usual usage pattern, which doesn't mean it's crazy, just not the primary pattern.

On Tue, Dec 22, 2015 at 5:57 PM, Jerry Lam <chiling...@gmail.com> wrote:
> Hi Sean,
>
> What if the spark context stops for involuntary reasons (misbehavior of some
> connections)? Then we need to programmatically handle the failure by
> recreating the spark context. Is there something I don't understand/know
> about the assumptions on how to use a spark context? I tend to think of it
> as a resource manager/scheduler for spark jobs. Are you guys planning to
> deprecate the stop method in Spark?
>
> Best Regards,
>
> Jerry
>
> Sent from my iPhone
>
>> On 22 Dec, 2015, at 3:57 am, Sean Owen <so...@cloudera.com> wrote:
>>
>> Although in many cases it does work to stop and then start a second
>> context, it wasn't how Spark was originally designed, and I still see
>> gotchas. I'd avoid it. I don't think you should have to release some
>> resources; just keep the same context alive.
>>
>>> On Tue, Dec 22, 2015 at 5:13 AM, Jerry Lam <chiling...@gmail.com> wrote:
>>> Hi Zhan,
>>>
>>> I'm illustrating the issue via a simple example. However, it is not
>>> difficult to imagine use cases that need this behaviour. For example, you
>>> want to release all of Spark's resources when it has not been used for
>>> longer than an hour in a job server, like web services. Unless you can
>>> prevent people from stopping the spark context, it is reasonable to assume
>>> that people can stop it and start it again at a later time.
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>>
>>>> On Mon, Dec 21, 2015 at 7:20 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:
>>>>
>>>> This looks to me like a very unusual use case. You stop the SparkContext,
>>>> and start another one. I don't think it is well supported. As the
>>>> SparkContext is stopped, all the resources are supposed to be released.
>>>>
>>>> Is there any mandatory reason you have to stop and restart another
>>>> SparkContext?
>>>>
>>>> Thanks.
>>>>
>>>> Zhan Zhang
>>>>
>>>> Note that when sc is stopped, all resources are released (for example in
>>>> yarn).
>>>>
>>>>> On Dec 20, 2015, at 2:59 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>>>
>>>>> Hi Spark developers,
>>>>>
>>>>> I found that SQLContext.getOrCreate(sc: SparkContext) does not behave
>>>>> correctly when a different spark context is provided.
>>>>>
>>>>> ```
>>>>> val sc = new SparkContext
>>>>> val sqlContext = SQLContext.getOrCreate(sc)
>>>>> sc.stop
>>>>> ...
>>>>>
>>>>> val sc2 = new SparkContext
>>>>> val sqlContext2 = SQLContext.getOrCreate(sc2)
>>>>> sc2.stop
>>>>> ```
>>>>>
>>>>> The sqlContext2 will reference sc instead of sc2 and, therefore, the
>>>>> program will not work because sc has been stopped.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Jerry
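A possible workaround sketch for the getOrCreate behaviour reported in the quoted thread, under the assumption you're on a Spark 1.x release where the cached instance isn't invalidated when its SparkContext stops: construct the SQLContext directly against the new SparkContext so a stale cached instance is never picked up (the app name below is illustrative).

```
// Workaround sketch only (Spark 1.x assumed): build the SQLContext directly
// from the new SparkContext instead of relying on SQLContext.getOrCreate,
// which may hand back an instance bound to the previously stopped context.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc2 = new SparkContext(new SparkConf().setAppName("restarted-context"))
val sqlContext2 = new SQLContext(sc2) // never reuses the old, stopped sc
// ... run queries against sqlContext2 ...
sc2.stop()
```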