On Mon, Dec 11, 2017 at 11:07 AM, Mana M <[email protected]> wrote:
> Hello,
>
> I am new to Spark + Jupyter and setting them up for our data analysis
> team. I had one question for which I cannot really find an answer
> anywhere - hope someone can help here.
>
> I have set up a multi-host Spark cluster and have also successfully
> installed Jupyter with JupyterHub. This setup will be shared among
> several data analysis teams.
>
> The Spark cluster is set up with some common Python libraries, but
> each user may require additional libraries for their experimentation
> from time to time. Is it possible for a Jupyter user to install
> Python dependencies for her/his notebook, so the dependencies are
> available on all Spark cluster nodes before the user runs the
> notebook through Jupyter?
>
> I read about line magics (addDeps) in Apache Toree, but I did not
> find any information on adding Python dependencies.
>
> Thanks,
> Mana

Toree does not provide any capabilities to manage the required Python
dependencies on remote execution nodes. Some approaches used in the
community are:

- Anaconda or Anaconda Enterprise, which lets you build environments
  that are available/replicated on all nodes
- Mapped user folders that can be used to host the necessary packages

I have also seen some discussions on the Spark side about handling this
better, but I don't believe it has been completely solved.

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
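[Editor's note: as a concrete illustration of the first approach above
(replicating an environment to all nodes), here is a rough sketch for a
YARN-backed PySpark cluster. It assumes the user has already built a
relocatable conda environment archive, e.g. with conda-pack; the archive
name, app name, and environment contents are illustrative, not something
Toree or this thread provides.]

    # Sketch only: assumes a YARN deployment and a pre-built conda env
    # archive on the driver/gateway host, e.g.:
    #   conda pack -n my_env -o my_env.tar.gz
    import os
    from pyspark.sql import SparkSession

    # Executors unpack the archive into a directory named "environment"
    # (the part after '#') and launch their Python workers from the
    # interpreter inside it, so packages installed into the conda env
    # are visible on every node.
    os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"

    spark = (
        SparkSession.builder
        .appName("notebook-with-extra-deps")  # name is illustrative
        # Newer Spark releases (3.1+) also accept "spark.archives".
        .config("spark.yarn.dist.archives", "my_env.tar.gz#environment")
        .getOrCreate()
    )

For a handful of pure-Python files, zips, or eggs, SparkContext.addPyFile()
is a lighter-weight alternative, but it does not help with compiled
dependencies such as numpy or pandas.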
