Chulmin Kwon created ZEPPELIN-4937:
--------------------------------------
Summary: mmlspark for lightgbm loading from zeppelin seems not working
Key: ZEPPELIN-4937
URL: https://issues.apache.org/jira/browse/ZEPPELIN-4937
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Chulmin Kwon
Fix For: 0.9.0
Hello,
I'm trying to import the mmlspark package for LightGBM in Zeppelin.
mmlspark is usable from PySpark and contains a LightGBM module for machine
learning.
mmlspark can be installed as follows, and standard PySpark works with it
(see [https://github.com/Azure/mmlspark]):
pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 \
  --repositories=https://mmlspark.azureedge.net/maven
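For reference, --packages and --repositories each take a single comma-separated list, so the launch command can be assembled programmatically. The sketch below uses a hypothetical helper, build_launch_args (not part of PySpark), that only formats the same two flags shown above. It emits the `--flag value` form, which spark-submit accepts as well as `--flag=value`. The option string without the launcher name is the same text that would go into SPARK_SUBMIT_OPTIONS.

```python
import shlex

# Maven coordinate and repository taken from the report above.
MMLSPARK_PACKAGE = "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1"
MMLSPARK_REPO = "https://mmlspark.azureedge.net/maven"


def build_launch_args(launcher, packages, repositories):
    """Hypothetical helper: build the argv for pyspark or spark-submit.

    Both --packages and --repositories take one comma-separated value,
    so multiple entries are joined with commas.
    """
    args = [launcher]
    if packages:
        args += ["--packages", ",".join(packages)]
    if repositories:
        args += ["--repositories", ",".join(repositories)]
    return args


if __name__ == "__main__":
    argv = build_launch_args("pyspark", [MMLSPARK_PACKAGE], [MMLSPARK_REPO])
    # Full console command line.
    print(shlex.join(argv))
    # Options only, e.g. for SPARK_SUBMIT_OPTIONS in zeppelin-env.sh.
    print(shlex.join(argv[1:]))
```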
or by adding the following to $SPARK_HOME/conf/spark-defaults.conf:
spark.jars.packages=com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
spark.jars.repositories=https://mmlspark.azureedge.net/maven
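To check which of these keys a given spark-defaults.conf actually sets (for example, to compare the file the console reads with the one Zeppelin's Spark reads), a simplified parser is enough. This is a hypothetical helper, not Spark's own loader: as far as I know Spark reads the file as Java properties, which accepts both the whitespace-separated form from the Spark docs and the `key=value` form used above, but this sketch ignores Java-properties escapes and line continuations.

```python
import os
import re

# Key: no whitespace or '='; then '=' or whitespace as separator; then value.
_LINE = re.compile(r"([^\s=]+)\s*(?:=|\s)\s*(.*)")


def parse_spark_defaults(text):
    """Hypothetical, simplified parser for spark-defaults.conf-style text.

    Skips blank lines and '#' comments; accepts 'key value' and
    'key=value'. Escapes and continuation lines are NOT handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match:
            props[match.group(1)] = match.group(2).strip()
    return props


if __name__ == "__main__":
    conf_path = os.path.join(os.environ.get("SPARK_HOME", "."), "conf", "spark-defaults.conf")
    if os.path.exists(conf_path):
        with open(conf_path) as handle:
            props = parse_spark_defaults(handle.read())
        for key in ("spark.jars.packages", "spark.jars.repositories"):
            print(key, "=", props.get(key, "<not set>"))
```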
It works fine in the standard pyspark console, but not in Zeppelin with the
%pyspark interpreter: the mmlspark package cannot be imported in Zeppelin.
%pyspark
import mmlspark
-----------------------------
Fail to execute line 1: import mmlspark
Traceback (most recent call last):
File "/tmp/1593759774469-0/zeppelin_python.py", line 158, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'mmlspark'
---------------------------------------------------
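Some background that may help narrow this down (my understanding, not verified against the mmlspark build): the mmlspark jar bundles its Python package, and when pyspark is launched with --packages the resolved jars are put on the driver's sys.path. Python can import from a jar because a jar is a zip archive and zipimport accepts any zip file on sys.path regardless of extension. So "No module named 'mmlspark'" would be expected whenever the jar is not on sys.path. The sketch below demonstrates only that mechanism with a hypothetical stand-in package, fakepkg, written into a temporary jar.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def make_fake_jar(directory):
    """Write a jar (zip archive) containing a tiny package 'fakepkg'.

    'fakepkg' is a hypothetical stand-in for the Python code that the
    mmlspark jar is assumed to bundle.
    """
    jar_path = os.path.join(directory, "fakepkg_2.11-0.1.jar")
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.writestr("fakepkg/__init__.py", "VERSION = '0.1'\n")
    return jar_path


def import_from_jar(jar_path, module_name):
    """Put the jar on sys.path, then import module_name from it."""
    if jar_path not in sys.path:
        sys.path.insert(1, jar_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        module = import_from_jar(make_fake_jar(tmp), "fakepkg")
        print(module.VERSION)
```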
Even this code does not work:
------------------------------------------
%pyspark
import pyspark
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
.config("spark.jars.packages",
"com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
.config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
.getOrCreate()
import mmlspark
Fail to execute line 7: import mmlspark
Traceback (most recent call last):
File "/tmp/1593759774469-0/zeppelin_python.py", line 158, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 7, in <module>
ModuleNotFoundError: No module named 'mmlspark'
I googled and found that this package configuration should be set through the
Zeppelin interpreter settings.
So I added the following line to $ZEPPELIN_HOME/conf/zeppelin-env.sh:
export SPARK_SUBMIT_OPTIONS="--packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 --repositories=https://mmlspark.azureedge.net/maven"
I restarted all Spark and Zeppelin processes after changing the setting, but I
still get the same error.
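One plausible explanation, which is an assumption and not confirmed: spark-submit adds --packages jars to the Python path only when the application itself is a Python script, while Zeppelin launches its Spark interpreter as a JVM application. In that case the jars would show up in spark.jars but not on the Python process's sys.path. If so, a workaround is to copy the matching jar paths from spark.jars into sys.path inside the %pyspark paragraph (PySpark's SparkContext.addPyFile should also accept a .jar, as far as I know). The helpers jar_paths and add_to_sys_path below are hypothetical; the sketch assumes POSIX paths and that spark.jars holds plain paths or file: URIs.

```python
import os
import sys
from urllib.parse import urlparse


def jar_paths(spark_jars, keyword):
    """Return local paths of jars in a comma-separated spark.jars value
    whose file name contains keyword. Accepts plain paths and file: URIs."""
    paths = []
    for entry in spark_jars.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parsed = urlparse(entry)
        if parsed.scheme in ("", "file"):
            path = parsed.path
            if path.endswith(".jar") and keyword in os.path.basename(path):
                paths.append(path)
    return paths


def add_to_sys_path(paths):
    """Insert each path near the front of sys.path if absent; return those added."""
    added = []
    for path in paths:
        if path not in sys.path:
            sys.path.insert(1, path)
            added.append(path)
    return added


# Intended use in a Zeppelin %pyspark paragraph (untested assumption):
# conf_value = spark.sparkContext.getConf().get("spark.jars", "")
# add_to_sys_path(jar_paths(conf_value, "mmlspark"))
# import mmlspark
```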
I'm not sure where the fault lies: in the mmlspark package itself, or in the
way Zeppelin loads it.
Can you help me with this?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)