If someone can share about the idea of sharing single SparkContext
through multiple SparkILoop safely, it'll be really helpful.
Here is a proposal:
1. In Spark code, change SparkIMain.scala to allow setting the virtual
directory. While creating a new instance of SparkIMain per notebook from
the Zeppelin Spark interpreter, point all the instances of SparkIMain at
the same virtual directory.
2. Start an HTTP server on that virtual directory and register it with
the SparkContext using the classServerUri method.
3. Scala-generated code has a notion of packages. The default package
name is "line$<linenumber>". The package name can be controlled with the
system property scala.repl.name.line. Setting this property to the
notebook id ensures that code generated by one instance of SparkIMain is
isolated from the other instances of SparkIMain.
4. Build a queue inside interpreter to allow only one paragraph
execution at a time per notebook.
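Point 4 can be sketched with plain JDK executors: one single-threaded executor per notebook gives per-notebook FIFO ordering while letting different notebooks run in parallel. This is only an illustration under assumed names (`NotebookQueues`, `submitParagraph` are hypothetical), not Zeppelin's actual scheduler API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of point 4: a serial queue per notebook.
public class NotebookQueues {
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    // Paragraphs submitted for the same noteId run one at a time, in order;
    // paragraphs of different notebooks run on independent threads.
    public void submitParagraph(String noteId, Runnable paragraph) {
        queues.computeIfAbsent(noteId, id -> Executors.newSingleThreadExecutor())
              .submit(paragraph);
    }

    // Block until everything queued so far for this notebook has finished.
    public void join(String noteId) {
        ExecutorService queue = queues.get(noteId);
        if (queue == null) return;
        try {
            queue.submit(() -> { }).get();  // empty barrier task
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public void shutdown() {
        queues.values().forEach(ExecutorService::shutdown);
    }
}
```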
I have tested 1, 2, and 3, and this seems to provide isolation across
class names. I'll work towards submitting a formal patch soon. Is there
already a Jira for this that I can take up? I also need to understand:
1. How does Zeppelin pick up Spark fixes? Or do I need to first get the
Spark changes merged into Apache Spark on GitHub?
Any suggestions or comments on the proposal are highly welcome.
Regards,
-Pranav.
On 10/08/15 11:36 pm, moon soo Lee wrote:
Hi piyush,
Separate instances of SparkILoop/SparkIMain for each notebook while
sharing the SparkContext sounds great.
Actually, I tried to do it and found a problem: multiple SparkILoop
instances could generate the same class name, and the Spark executors
confuse the class names since they're reading classes from a single
SparkContext.
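The collision can be sketched abstractly. In the illustrative Java below (a stand-in for the shared class store executors read from, not Spark's actual class server), two interpreter instances that both emit a wrapper class named `line1` overwrite each other, while prefixing the name with a notebook id keeps them apart:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a shared class-name map, standing in for the
// single class store that executors read from via one SparkContext.
public class ClassNameIsolation {
    // className -> owning notebook, standing in for compiled bytecode
    static final Map<String, String> sharedClasses = new HashMap<>();

    // Without isolation both interpreters generate "line1", and the second
    // registration silently replaces the first one's class.
    static String register(String notebookId, int lineNumber, boolean isolate) {
        String name = (isolate ? notebookId + "$" : "") + "line" + lineNumber;
        sharedClasses.put(name, notebookId);
        return name;
    }
}
```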
If someone can share about the idea of sharing single SparkContext
through multiple SparkILoop safely, it'll be really helpful.
Thanks,
moon
On Mon, Aug 10, 2015 at 1:21 AM Piyush Mukati (Data Platform)
<[email protected]> wrote:
Hi Moon,
Any suggestion on it? We have to wait a lot when multiple people are
working with Spark.
Can we create separate instances of SparkILoop, SparkIMain, and print
streams for each notebook, while sharing the SparkContext, ZeppelinContext,
SQLContext, and DependencyResolver, and then use the parallel scheduler?
thanks
-piyush
Hi Moon,
How about tracking a dedicated SparkContext per notebook in Spark's
remote interpreter? This would allow multiple users to run their Spark
paragraphs in parallel, while within a notebook only one paragraph is
executed at a time.
Regards,
-Pranav.
On 15/07/15 7:15 pm, moon soo Lee wrote:
> Hi,
>
> Thanks for asking question.
>
> The reason is simply that it is running code statements. The
> statements can have order and dependency. Imagine I have two paragraphs:
>
> %spark
> val a = 1
>
> %spark
> print(a)
>
> If they're not running one by one, they could run in
> random order, and the output would be nondeterministic: either '1' or
> 'val a cannot be found'.
>
> This is the reason why. But if there is a nice idea to handle this
> problem, I agree a parallel scheduler would help a lot.
>
> Thanks,
> moon
> On Tue, Jul 14, 2015 at 7:59 PM linxi zeng
> <[email protected]> wrote:
>
> any one who have the same question with me? or this is not a question?
>
> 2015-07-14 11:47 GMT+08:00 linxi zeng <[email protected]>:
>
> hi, Moon:
> I noticed that the getScheduler function in
> SparkInterpreter.java returns a FIFOScheduler, which makes the
> Spark interpreter run Spark jobs one by one. It's not a good
> experience when a couple of users work on Zeppelin at
> the same time, because they have to wait for each other.
> Meanwhile, SparkSqlInterpreter can choose which
> scheduler to use via "zeppelin.spark.concurrentSQL".
> My question is: what considerations is this decision
> based on?
>
>