Agree with Joel; we may want to refactor the Zeppelin architecture so that it can handle multi-tenancy easily. The technical solution proposed by Pranav is great, but it only applies to Spark. Right now, each interpreter has to manage multi-tenancy in its own way. Ultimately, Zeppelin could expose a multi-tenancy contract (e.g. a UserContext, similar to InterpreterContext) that each interpreter can choose to use or ignore.
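As a purely hypothetical sketch of such a contract (none of these names exist in Zeppelin today; they only illustrate the idea, by analogy with the existing InterpreterContext):

```java
import java.util.Set;

public class UserContextSketch {

    // Hypothetical: immutable per-request tenancy info that would be passed
    // alongside InterpreterContext. All names here are illustrative.
    static final class UserContext {
        final String userName;   // authenticated user running the paragraph
        final String noteId;     // note the paragraph belongs to
        final Set<String> roles; // optional authorization info

        UserContext(String userName, String noteId, Set<String> roles) {
            this.userName = userName;
            this.noteId = noteId;
            this.roles = roles;
        }
    }

    // Interpreters that care about tenancy implement this; others ignore it,
    // which is the "choose to use or not" part of the proposal.
    interface MultiTenantAware {
        void openForUser(UserContext ctx);
    }

    public static void main(String[] args) {
        UserContext ctx = new UserContext("alice", "note-1", Set.of("admin"));
        System.out.println(ctx.userName + ":" + ctx.noteId); // prints alice:note-1
    }
}
```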
On Sun, Aug 16, 2015 at 3:09 AM, Joel Zambrano <[email protected]> wrote:

> While the idea of running multiple notes simultaneously is great, it is really dancing around the lack of true multi-user support in Zeppelin. The proposed solution would work if the application's resources are those of the whole cluster, but if the app is limited (say it has 8 cores of 16, with a corresponding share of memory), then one note can potentially hog all the resources, the scheduler will have to throttle all other executions, and you are left exactly where you are now.
> While I think the solution is a good one, maybe this question should make us think about adding true multi-user support, where we isolate resources (the cluster and the notebooks themselves), have separate login/identity, and (I don't know if it's possible) share the same context.
>
> Thanks,
> Joel
>
>
> On Aug 15, 2015, at 1:58 PM, Rohit Agarwal <[email protected]> wrote:
>
> > If the problem is that multiple users have to wait for each other while using Zeppelin, the solution already exists: they can create a new interpreter on the interpreter page and attach it to their notebook; then they don't have to wait for others to submit their jobs.
> >
> > But I agree, having paragraphs from one note wait for paragraphs from other notes is a confusing default. We can get around that in two ways:
> >
> > 1. Create a new interpreter for each note and attach that interpreter to that note. This approach requires the least amount of code change, but it is resource-heavy and doesn't let you share the SparkContext between different notes.
> > 2. If we want to share the SparkContext between different notes, we can submit jobs from different notes into different fair-scheduler pools (https://spark.apache.org/docs/1.4.0/job-scheduling.html#scheduling-within-an-application).
> > This can be done by submitting jobs from different notes in different threads. It ensures that jobs from one note run sequentially, while jobs from different notes can run in parallel.
> >
> > Neither of these options requires any change to the Spark code.
> >
> > --
> > Thanks & Regards
> > Rohit Agarwal
> > https://www.linkedin.com/in/rohitagarwal003
> >
> > On Sat, Aug 15, 2015 at 12:01 PM, Pranav Kumar Agarwal <[email protected]> wrote:
> >
> >> > If someone can share an idea for sharing a single SparkContext through multiple SparkILoops safely, it'll be really helpful.
> >>
> >> Here is a proposal:
> >> 1. In the Spark code, change SparkIMain.scala to allow setting the virtual directory. While creating new instances of SparkIMain per notebook from the Zeppelin Spark interpreter, point all the instances of SparkIMain at the same virtual directory.
> >> 2. Start an HTTP server on that virtual directory and register this HTTP server in the SparkContext using the classServerUri method.
> >> 3. Scala-generated code has a notion of packages. The default package name is "line$<linenumber>". The package name can be controlled using the system property scala.repl.name.line. Setting this property to the notebook id ensures that code generated by individual instances of SparkIMain is isolated from the other instances of SparkIMain.
> >> 4. Build a queue inside the interpreter to allow only one paragraph execution at a time per notebook.
> >>
> >> I have tested 1, 2, and 3, and this seems to provide isolation across class names. I'll work towards submitting a formal patch soon. Is there already a JIRA for this that I can pick up? Also, I need to understand: how does Zeppelin pick up Spark fixes? Or do I need to first get the Spark changes merged into Apache Spark on GitHub?
> >>
> >> Any suggestions or comments on the proposal are highly welcome.
> >>
> >> Regards,
> >> -Pranav.
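Rohit's option 2 and step 4 of Pranav's proposal share the same threading shape: one serial execution queue per note, with the queues themselves running in parallel. A minimal stdlib-only sketch of that shape (no Spark dependency; in the real interpreter, each note's worker thread would additionally call sc.setLocalProperty("spark.scheduler.pool", noteId) before submitting Spark jobs, which is what routes them to per-note fair-scheduler pools):

```java
import java.util.concurrent.*;

public class PerNoteScheduler {
    // One single-threaded executor per note: paragraphs of the same note run
    // sequentially, while paragraphs of different notes proceed in parallel.
    private final ConcurrentHashMap<String, ExecutorService> queues =
            new ConcurrentHashMap<>();

    public Future<?> submit(String noteId, Runnable paragraph) {
        ExecutorService q = queues.computeIfAbsent(noteId,
                id -> Executors.newSingleThreadExecutor());
        // In the Spark interpreter, the task itself would first pin its jobs to
        // a fair-scheduler pool named after the note (Spark-side, not shown here).
        return q.submit(paragraph);
    }

    public void shutdown() {
        queues.values().forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) throws Exception {
        PerNoteScheduler s = new PerNoteScheduler();
        StringBuffer log = new StringBuffer(); // thread-safe appends
        Future<?> a1 = s.submit("noteA", () -> log.append("A1 "));
        Future<?> a2 = s.submit("noteA", () -> log.append("A2 ")); // waits for A1
        Future<?> b1 = s.submit("noteB", () -> log.append("B1 ")); // independent of note A
        a1.get(); a2.get(); b1.get();
        System.out.println(log); // A1 always precedes A2; B1 may interleave anywhere
        s.shutdown();
    }
}
```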
> >>
> >>> On 10/08/15 11:36 pm, moon soo Lee wrote:
> >>>
> >>> Hi Piyush,
> >>>
> >>> A separate instance of SparkILoop/SparkIMain for each notebook while sharing the SparkContext sounds great.
> >>>
> >>> Actually, I tried to do it and found the problem that multiple SparkILoops can generate the same class name, and the Spark executors confuse the class names since they're reading classes from a single SparkContext.
> >>>
> >>> If someone can share an idea for sharing a single SparkContext through multiple SparkILoops safely, it'll be really helpful.
> >>>
> >>> Thanks,
> >>> moon
> >>>
> >>>
> >>> On Mon, Aug 10, 2015 at 1:21 AM Piyush Mukati (Data Platform) <[email protected]> wrote:
> >>>
> >>> Hi Moon,
> >>> Any suggestion on this? We have to wait a lot when multiple people are working with Spark.
> >>> Can we create a separate instance of SparkILoop, SparkIMain, and print streams for each notebook, while sharing the SparkContext, ZeppelinContext, SQLContext, and DependencyResolver, and then use the parallel scheduler?
> >>> thanks
> >>>
> >>> -piyush
> >>>
> >>> Hi Moon,
> >>>
> >>> How about tracking a dedicated SparkContext for a notebook in Spark's remote interpreter? This would allow multiple users to run their Spark paragraphs in parallel. Also, within a notebook, only one paragraph would be executed at a time.
> >>>
> >>> Regards,
> >>> -Pranav.
> >>>
> >>>
> >>>> On 15/07/15 7:15 pm, moon soo Lee wrote:
> >>>> Hi,
> >>>>
> >>>> Thanks for asking the question.
> >>>>
> >>>> The reason is simply that it is running code statements. The statements can have order and dependencies. Imagine I have two paragraphs:
> >>>>
> >>>> %spark
> >>>> val a = 1
> >>>>
> >>>> %spark
> >>>> print(a)
> >>>>
> >>>> If they're not run one by one, they can run in random order and the output will not be deterministic: either '1' or a 'not found: value a' error.
> >>>>
> >>>> This is the reason why. But if there is a nice idea for handling this problem, I agree that using the parallel scheduler would help a lot.
> >>>>
> >>>> Thanks,
> >>>> moon
> >>>>
> >>>> On Tue, Jul 14, 2015 at 7:59 PM linxi zeng <[email protected]> wrote:
> >>>>
> >>>> Anyone who has the same question as me? Or is this not a question?
> >>>>
> >>>> 2015-07-14 11:47 GMT+08:00 linxi zeng <[email protected]>:
> >>>>
> >>>> Hi, Moon:
> >>>> I notice that the getScheduler function in SparkInterpreter.java returns a FIFOScheduler, which makes the Spark interpreter run Spark jobs one by one. It's not a good experience when a couple of users work on Zeppelin at the same time, because they have to wait for each other.
> >>>> At the same time, SparkSqlInterpreter can choose which scheduler to use via "zeppelin.spark.concurrentSQL".
> >>>> My question is: what kind of consideration is this decision based on?
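The distinction linxi asks about boils down to the thread pool behind the interpreter's job queue: a FIFO scheduler is effectively a single worker (jobs run strictly one by one), while the parallel scheduler used when zeppelin.spark.concurrentSQL is enabled allows jobs to overlap. A stdlib sketch of that difference (the property name is from the thread above; the pool sizes and helper are illustrative, not Zeppelin's actual implementation):

```java
import java.util.concurrent.*;

public class SchedulerChoice {
    // FIFO scheduler: one worker, jobs run strictly one-by-one.
    // Parallel scheduler: several workers, jobs may overlap.
    static ExecutorService forInterpreter(boolean concurrentSQL) {
        return concurrentSQL
                ? Executors.newFixedThreadPool(10)     // parallel, as with concurrentSQL=true
                : Executors.newSingleThreadExecutor(); // FIFO, the Spark interpreter default
    }

    public static void main(String[] args) throws Exception {
        boolean concurrent = Boolean.parseBoolean(
                System.getProperty("zeppelin.spark.concurrentSQL", "false"));
        ExecutorService pool = forInterpreter(concurrent);
        Future<Integer> job = pool.submit(() -> 1 + 1);
        System.out.println(job.get()); // prints 2
        pool.shutdown();
    }
}
```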
