Chulmin Kwon created ZEPPELIN-4937:
--------------------------------------

             Summary: mmlspark for lightgbm loading from zeppelin seems not 
working
                 Key: ZEPPELIN-4937
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4937
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Chulmin Kwon
             Fix For: 0.9.0


Hello,

I'm trying to import the mmlspark package for lightgbm from Zeppelin.

mmlspark works only with pyspark, and it contains a lightgbm module for machine 
learning.

mmlspark can be installed as follows, and standard pyspark works with it 
([https://github.com/Azure/mmlspark]):

pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 
--repositories=https://mmlspark.azureedge.net/maven

or by adding the following settings to $SPARK_HOME/conf/spark-defaults.conf:

spark.jars.packages=com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
spark.jars.repositories=https://mmlspark.azureedge.net/maven

 

This works fine with the standard pyspark console, but not with Zeppelin's 
%pyspark interpreter: the mmlspark package cannot be imported in Zeppelin.

%pyspark
import mmlspark

-----------------------------

Fail to execute line 1: import mmlspark
Traceback (most recent call last):
 File "/tmp/1593759774469-0/zeppelin_python.py", line 158, in <module>
 exec(code, _zcUserQueryNameSpace)
 File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'mmlspark'

---------------------------------------------------
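
One way to narrow this down is to check, from the same interpreter, whether Python can locate the module at all and whether any entry on `sys.path` mentions it. The sketch below uses only the standard library. The helper name `diagnose_import` is hypothetical, and the idea that the resolved package jar should show up on `sys.path` is an assumption about how pyspark exposes `--packages` jars to Python, not something confirmed for Zeppelin.

```python
import importlib.util
import sys


def diagnose_import(module_name, search_path=None):
    """Report whether module_name is importable and which path entries mention it.

    search_path only affects the 'matching_entries' listing (default: sys.path);
    the importability check always uses the interpreter's real import system.
    """
    entries = list(sys.path if search_path is None else search_path)
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    return {
        "importable": spec is not None,
        "origin": getattr(spec, "origin", None),
        # Path entries (for example jars) whose name contains the module name.
        "matching_entries": [p for p in entries if module_name in str(p)],
    }


# Hypothetical usage inside a %pyspark paragraph or a plain pyspark console.
print(diagnose_import("mmlspark"))
```

Running the same snippet in the standard pyspark console and in a Zeppelin `%pyspark` paragraph and comparing the two reports should show whether the jar is missing from the Python path in Zeppelin.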

Even this code does not work:

------------------------------------------

%pyspark

import pyspark

spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
 .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
 .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
 .getOrCreate()
import mmlspark

Fail to execute line 7: import mmlspark
Traceback (most recent call last):
 File "/tmp/1593759774469-0/zeppelin_python.py", line 158, in <module>
 exec(code, _zcUserQueryNameSpace)
 File "<stdin>", line 7, in <module>
ModuleNotFoundError: No module named 'mmlspark'

 

After searching, I found that this package configuration should be set in the 
Zeppelin interpreter parameters.

So I set the following in $ZEPPELIN_HOME/conf/zeppelin-env.sh:

export SPARK_SUBMIT_OPTIONS="--packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 --repositories=https://mmlspark.azureedge.net/maven"

 

I restarted all Spark and Zeppelin processes after changing the setting, but I 
get the same error.
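
To rule out a quoting or line-wrapping problem in the option string itself, it can be parsed the way a shell would split it. This is a minimal sketch; `parse_submit_options` is a hypothetical helper, and whether `SPARK_SUBMIT_OPTIONS` is visible in the environment of the `%pyspark` Python process is an assumption.

```python
import os
import shlex


def parse_submit_options(options):
    """Extract --packages and --repositories values from an option string.

    Handles both the '--flag value' and the '--flag=value' forms.
    """
    found = {"--packages": None, "--repositories": None}
    tokens = shlex.split(options)
    i = 0
    while i < len(tokens):
        flag, sep, value = tokens[i].partition("=")
        if flag in found:
            if sep:                     # --flag=value form
                found[flag] = value
            elif i + 1 < len(tokens):   # --flag value form
                found[flag] = tokens[i + 1]
                i += 1
        i += 1
    return found


# Assumption: the variable is inherited by the process running this code.
print(parse_submit_options(os.environ.get("SPARK_SUBMIT_OPTIONS", "")))
```

If both values come back as `None` inside Zeppelin, the setting from zeppelin-env.sh probably did not reach this process.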

I'm not sure where the fault lies: in the mmlspark package itself or in how it 
is loaded from Zeppelin.

Can you help me with this?

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
