This is exactly what we want, Jeff! A hook to plug in our own interpreters.
(I am on the same team as Jhon, btw.)

Right now there are too many concrete class references, so injecting our own 
implementations is not possible.

Examples of customizations:
1) The Spark UI works differently on EMR than in standalone mode, so that 
logic would live in an EMR-specific interpreter.
2) We want to add more metrics and logs in the interpreter, e.g. the number 
of requests coming to the interpreter.
3) Ideally we would like to connect to different spark clusters per 
spark-submit, rather than being tied to the one chosen at Zeppelin startup, 
as happens now.

Basically, we want a lot more flexibility.

We are building a platform to cater to multiple clients. That means multiple 
Zeppelin instances, multiple spark clusters, multiple Spark UIs, and, on top 
of that, maintaining security and privacy in a shared multi-tenant 
environment will need all the flexibility we can get!

Thanks
Ankit

> On Feb 1, 2018, at 7:51 PM, Jeff Zhang <zjf...@gmail.com> wrote:
> 
> 
> Hi Jhon,
> 
> Do you mind sharing what kind of custom functionality you want to add to the 
> spark interpreter? One idea in my mind is that we could add extension points 
> to the existing SparkInterpreter, so that users can enhance SparkInterpreter 
> via these extension points. That means we just open some interfaces; users 
> implement those interfaces and simply add their jars to the spark interpreter 
> folder.
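> 
> For example, something along these lines (just a sketch; the interface name 
> and hook methods below are hypothetical, not an existing Zeppelin API):
> 
>     import org.apache.zeppelin.interpreter.InterpreterResult;
>     import org.apache.zeppelin.spark.SparkInterpreter;
> 
>     // Hypothetical extension point that SparkInterpreter would discover
>     // (e.g. via java.util.ServiceLoader) from jars in its folder.
>     public interface SparkInterpreterExtension {
>       // Called after the interpreter opens, so the extension can
>       // register metrics, adjust Spark UI handling, etc.
>       void postOpen(SparkInterpreter interpreter);
> 
>       // Called around each interpret() call, e.g. to count requests.
>       void beforeInterpret(String code);
>       void afterInterpret(String code, InterpreterResult result);
>     }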
> 
> 
> 
> Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com> wrote on Fri, Feb 2, 2018 at 5:30 AM:
>> Hello!
>> 
>> I'm a software developer, and as part of a project I need to extend the 
>> functionality of SparkInterpreter without modifying it. Instead, I need to 
>> create a new interpreter that extends it or wraps its functionality.
>> 
>> I also need the spark sub-interpreters to use my new custom interpreter, but 
>> here is where the problem comes in: the spark sub-interpreters have a direct 
>> dependency on SparkInterpreter, as they use its class name to obtain its 
>> instance:
>> 
>> 
>>     private SparkInterpreter getSparkInterpreter() {
>>         ...
>>         Interpreter p = getInterpreterInTheSameSessionByClassName(
>>             SparkInterpreter.class.getName());
>>     }
>> 
>> 
>> Approach without modifying Apache Zeppelin
>> 
>> My current approach is to create a SparkCustomInterpreter that overrides the 
>> getClassName() method as follows:
>> 
>> public class SparkCustomInterpreter extends SparkInterpreter {
>>     ...
>> 
>>     @Override
>>     public String getClassName() {
>>         return SparkInterpreter.class.getName();
>>     }
>> }
>> 
>> and put the new class name in the interpreter-setting.json file of spark:
>> 
>> [
>>   {
>>     "group": "spark",
>>     "name": "spark",
>>     "className": "org.apache.zeppelin.spark.SparkCustomInterpreter",
>>     ...
>>     "properties": {...}
>>   }, ...
>> ]
>> 
>> The problem with this approach is that when I run a paragraph it fails. In 
>> general it fails because Zeppelin uses both the actual class name of the 
>> instance and the getClassName() method to look up the instance, and the 
>> mismatch between the two causes many problems.
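>> 
>> To illustrate the mismatch (an illustrative snippet, not actual Zeppelin 
>> code; it assumes SparkCustomInterpreter declares the usual Properties 
>> constructor):
>> 
>>     Interpreter intp = new SparkCustomInterpreter(properties);
>>     // The overridden method reports the parent class name:
>>     intp.getClassName();       // "org.apache.zeppelin.spark.SparkInterpreter"
>>     // ...but reflection on the instance reports the actual class:
>>     intp.getClass().getName(); // "org.apache.zeppelin.spark.SparkCustomInterpreter"
>>     // Lookups keyed on one name miss entries registered under the other.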
>> 
>> Approaches modifying Apache Zeppelin
>> 
>> There are two possible solutions, both related to the way in which the 
>> sub-interpreters get the SparkInterpreter instance. One is to get the class 
>> name from a property:
>> 
>> 
>>     private SparkInterpreter getSparkInterpreter() {
>>         ...
>>         Interpreter p = getInterpreterInTheSameSessionByClassName(
>>             property.getProperty("zeppelin.spark.mainClass",
>>                 SparkInterpreter.class.getName()));
>>     }
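>> 
>> With that change, the custom class could be wired in through the properties 
>> section of interpreter-setting.json (the property name is the one proposed 
>> above; the value is just illustrative):
>> 
>>     "properties": {
>>       "zeppelin.spark.mainClass": "org.apache.zeppelin.spark.SparkCustomInterpreter",
>>       ...
>>     }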
>> The other possibility is to modify the method 
>> Interpreter.getInterpreterInTheSameSessionByClassName(String) so that it 
>> returns the instance whose class name, or whose superclass name, matches the 
>> one specified in the parameter:
>> 
>> 
>> @ZeppelinApi
>> public Interpreter getInterpreterInTheSameSessionByClassName(String className) {
>>   synchronized (interpreterGroup) {
>>     for (List<Interpreter> interpreters : interpreterGroup.values()) {
>>       ...
>>       for (Interpreter intp : interpreters) {
>>         if (intp.getClassName().equals(className)
>>             || intp.getClass().getSuperclass().getName().equals(className)) {
>>           interpreterFound = intp;
>>         }
>> 
>>         ...
>>       }
>> 
>>       ...
>>     }
>>   }
>>   return null;
>> }
>> 
>> Either of the two solutions would involve modifying Apache Zeppelin code. Do 
>> you think the change could be contributed to the community? Or do you see 
>> some other approach to change the way in which the spark sub-interpreters 
>> get the SparkInterpreter instance?
>> 
>> I'll be attentive to any information about it.
>> 
>> Greetings
>> 
>> 
>> Jhon
