To use mmlspark in pyspark, set the PYSPARK_DRIVER_PYTHON property (either
in the Spark interpreter settings or in zeppelin-env.sh) to the Python
executable of the environment where mmlspark is installed. That makes
mmlspark available in the driver. If you also need it in the executors, set
PYSPARK_PYTHON as well and make sure that Python environment is deployed to
every worker node. Hope this helps.
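
To check which Python the driver actually uses, something like the following
can be run in a %pyspark paragraph (or in a plain Python shell for
comparison). It is only a sketch: describe_python_env is a hypothetical
helper name, not part of Zeppelin or mmlspark, and the two environment
variables may show up as unset inside the process even when they are
configured in the interpreter settings, so the executable path is the more
reliable signal.

```python
import importlib.util
import os
import sys


def describe_python_env(module_name):
    """Report which Python is running and whether module_name can be found.

    describe_python_env is a hypothetical helper for illustration only.
    Pass a top-level module name such as "mmlspark".
    """
    # find_spec returns None when a top-level module is not on sys.path.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "PYSPARK_DRIVER_PYTHON": os.environ.get("PYSPARK_DRIVER_PYTHON"),
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
    }


# Print the report for the module that fails to import in the issue below.
report = describe_python_env("mmlspark")
for key, value in report.items():
    print(f"{key}: {value}")
```

If "executable" is not the Python where mmlspark is installed, or
"importable" is False, the driver is running a different environment than
the one you expect.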

On Sat, Jul 4, 2020 at 4:21 AM Chulmin Kwon (Jira) <[email protected]> wrote:

> Chulmin Kwon created ZEPPELIN-4937:
> --------------------------------------
>
>              Summary: mmlspark for lightgbm loading from zeppelin seems
> not working
>                  Key: ZEPPELIN-4937
>                  URL: https://issues.apache.org/jira/browse/ZEPPELIN-4937
>              Project: Zeppelin
>           Issue Type: Bug
>     Affects Versions: 0.9.0
>             Reporter: Chulmin Kwon
>              Fix For: 0.9.0
>
>
> Hello,
>
> I'm trying to import mmlspark packages for lightgbm from zeppelin.
>
> mmlspark works only for pyspark and it contains lightgbm module for
> machine learning.
>
> mmlspark can be installed with the following command, and standard pyspark
> works with it (https://github.com/Azure/mmlspark):
>
> pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 --repositories=https://mmlspark.azureedge.net/maven
>
> or in the $SPARK_HOME/conf/spark-defaults.conf file, with the following settings:
>
> spark.jars.packages=com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
> spark.jars.repositories=https://mmlspark.azureedge.net/maven
>
>
>
> This works fine with the standard pyspark console, but Zeppelin's %pyspark
> interpreter does not pick it up: the mmlspark package cannot be imported
> in Zeppelin.
>
> %pyspark
> import mmlspark
>
> -----------------------------
>
> Fail to execute line 1: import mmlspark
> Traceback (most recent call last):
>  File "/tmp/1593759774469-0/zeppelin_python.py", line 158, in <module>
>  exec(code, _zcUserQueryNameSpace)
>  File "<stdin>", line 1, in <module>
> ModuleNotFoundError: No module named 'mmlspark'
>
> ---------------------------------------------------
>
> Even this code does not work:
>
> ------------------------------------------
>
> %pyspark
>
> import pyspark
>
> spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
>  .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
>  .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
>  .getOrCreate()
> import mmlspark
>
> Fail to execute line 7: import mmlspark
> Traceback (most recent call last):
>  File "/tmp/1593759774469-0/zeppelin_python.py", line 158, in <module>
>  exec(code, _zcUserQueryNameSpace)
>  File "<stdin>", line 7, in <module>
> ModuleNotFoundError: No module named 'mmlspark'
>
>
>
> I googled and found out this package configuration should be set in
> Zeppelin interpreter parameters.
>
> I changed $ZEPPELIN_HOME/conf/zeppelin-env.sh to
>
> export SPARK_SUBMIT_OPTIONS="--packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 --repositories=https://mmlspark.azureedge.net/maven"
>
>
>
> I restarted all Spark and Zeppelin processes after changing the setting,
> but it still gives the same error.
>
> I'm not sure whose side is at fault, mmlspark packages itself or loading
> it from zeppelin.
>
> Can you help me with that?
