On Mon, Dec 11, 2017 at 11:07 AM, Mana M <[email protected]> wrote:

> Hello,
>
> I am new to Spark + Jupyter and setting them up for our data analysis team.
> I had one question for which I cannot really find answer anywhere - hope
> someone can help here.
>
> I have set up a multi-host Spark cluster and also successfully installed
> Jupyter with JupyterHub. This setup will be shared among several data
> analysis teams.
>
> The Spark cluster is set up with some common Python libraries, but each
> user may require additional libraries for their experiments from time to
> time. Is it possible for a Jupyter user to install Python dependencies for
> his/her notebook, so that the dependencies are available on all Spark
> cluster nodes before the user runs the notebook through Jupyter?
>
> I read about line magics (%AddDeps) in Apache Toree, but I did not find
> any information on adding Python dependencies.
>
> Thanks,
> Mana
>

Toree does not provide any capabilities to manage the required dependencies
on remote execution nodes. Some approaches used in the industry/community
are:

- Anaconda or Anaconda Enterprise, which enables you to build environments
  that are available/replicated on all nodes
- Mapped (shared) user folders that can be used to host the necessary
  packages
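For pure-Python dependencies there is also a lighter-weight option a notebook
user can apply without touching the cluster: pip-install the packages into a
local directory, zip that directory, and hand the zip to Spark with
SparkContext.addPyFile, which distributes it to every executor. A minimal
sketch (directory and package names are illustrative, and it assumes the user
first ran something like `pip install --target ./deps some_package`):

```python
import os
import zipfile


def bundle_deps(deps_dir, zip_path):
    """Zip a directory of pip-installed packages for use with sc.addPyFile."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(deps_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to deps_dir so that imports
                # resolve once the zip is on the executors' sys.path.
                zf.write(full, os.path.relpath(full, deps_dir))
    return zip_path


# In the notebook, after the SparkContext exists:
#   bundle_deps("./deps", "deps.zip")
#   sc.addPyFile("deps.zip")   # ships the zip to all executors
#   import some_package        # now importable inside UDFs/tasks
```

Note this only works for pure-Python packages; libraries with compiled
extensions still need to be installed on the nodes themselves (which is where
the Anaconda-style approaches above come in).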

I have also seen some discussions in the Spark community about handling this
better, but I don't believe it has been completely solved.


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
