Hi Galaxy Dev,

I've been looking at the setup scripts for Galaxy to try to understand a
problem I recently had provisioning a Galaxy server.
I will readily admit that I have not read all of the relevant code
top-to-bottom, but I have at least skimmed all of it and read much of it.
Sorry if these questions are answered somewhere in Trello, the Wiki, or
somewhere else, but I was not able to find answers in any public locations.

As a small bit of probably irrelevant context:
I'm working with the Globus Genomics group on the DevOps side of things.
We're using Chef.
I've only just started working with the group in the past couple of weeks
(so my expertise with Galaxy itself is limited to nonexistent).


First, to describe the problem:


We want to provision a server running Galaxy without explicitly wrapping it
in a virtualenv.
Unless I missed something, that means that it's using system python.
When we use pip to install a package from a git repository before running
the setup scripts, fetch_eggs fails, saying it failed to fetch WebError 0.8a.
If we install the same package from git with `pip install --egg ...`, we get
a hunky-dory system where everything seems to work.

As far as I can tell, there is no reason that this should be the case.
Sure, putting the git source directly into site-packages might cause issues
upon installation, but EggNotFetchable exceptions should only be thrown if
the egg actually can't be pulled down from eggs.g2.bx.psu.edu, right?
I don't feel comfortable trying to make further progress on my provisioning
scripts without knowing why this is happening.
I'd hate to be bitten by this later on in the process.

Yes, the package in question may have poor behavior (likely it does), but
that doesn't change the fact that the error is totally misleading.
Furthermore, this poor behavior does not appear to stop me from running a
pip install of the WebError package or any other package from PyPI.

In case someone else wants to reproduce this, here is the command being
used:
  pip install --egg git+https://github.com/globusonline/python-nexus-client@599f04edef6b72569b7a5b272b0b847dcda3ea99#egg=nexus-client
Problems occur if you omit `--egg`.


Second, a question about the rationale for Galaxy's egg handling:


Why is all of this wrapped up in these scripts in the first place?
I understand that pip might not be present on every platform, and I don't
mean to question a decision to support systems without it.
However, as detailed below, Galaxy does not support any platforms which are
incapable of running pip.
Furthermore, pip is being pushed by the Python maintainers over
easy_install, so it's not like there isn't a clear choice in terms of which
one to support.

Perhaps most importantly, there don't appear to be any clear-cut options to
do the following, which I would consider a more ordinary workflow:

- Run a Galaxy script (like check_eggs) to generate a list of packages from
eggs.g2.bx.psu.edu for the current platform (redirecting the output to
requirements.txt or similar)
- `pip install -r requirements.txt`

The above could be part of a Galaxy provisioning script, rather than
exposed to the administrator.
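
To make the first step concrete, here is a minimal sketch of turning a list of eggs into a requirements file. The function name `render_requirements`, its `find_links` parameter, and the example egg list are all hypothetical; only the WebError version is taken from the error message above. Whether the egg server's layout would actually work with pip's `--find-links` option is an assumption I have not verified.

```python
def render_requirements(eggs, find_links=None):
    """Return requirements.txt text for a mapping of package name -> version.

    Hypothetical helper: it only formats text, it does not talk to any server.
    """
    lines = []
    if find_links:
        # pip accepts a --find-links option inside a requirements file.
        # Pointing it at the Galaxy egg server is an unverified assumption.
        lines.append("--find-links %s" % find_links)
    # Sort case-insensitively so the output is stable between runs.
    for name in sorted(eggs, key=str.lower):
        lines.append("%s==%s" % (name, eggs[name]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative egg list, not Galaxy's real one.
    example = {"WebError": "0.8a"}
    print(render_requirements(example), end="")
```

The output could be redirected to requirements.txt and then handed to `pip install -r requirements.txt`, as in the second step of the list.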

That also makes it significantly easier to control and manage the
virtualenv in which we run Galaxy, since we don't have to worry about
egg-related logic that we don't control and we know that the virtualenv's
bin dir will be earlier in the PATH than the system pip's dir.
Yes, I said above that our setup is presently using system python --
switching to a virtualenv is one of the many items on my to-do list.
In fact, I would expect that the default, desired setup for Galaxy would be
to put it in a virtualenv, rather than using system python, and to use pip,
rather than fetch_eggs.py and company.

When I look at the logic being used here, especially at
https://bitbucket.org/galaxy/galaxy-central/src/f0ae870b22e9/lib/galaxy/eggs/?at=default
, it looks like a solution built exclusively for platforms on which pip is
not installed.
According to the wiki, Galaxy only supports Python back to version 2.6, and
get-pip.py supports 2.6, so there is no way of building a Galaxy server on
a platform that can't also have pip installed.
Adding pip to the requirements for Galaxy would not be particularly
onerous, and may simplify things significantly (no need to bundle
get-pip.py or similar).

A last note about the misleading error from the fetch_eggs script, which
told me that WebError is "NotFetchable":
I probably wouldn't have much of a complaint about this if the error had
been more on target and I hadn't felt the need to do things like patch
lib/galaxy/eggs to print a stacktrace.
For example, if the script had detected that there was a source-installed
package getting underfoot, it could have alerted me or even suggested
installing my various packages in egg format.
That kind of error detection is hard to maintain and hard to keep accurate
since the Galaxy team's priority is to build Galaxy, not a package manager.
This, I suppose, circles back to an earlier question: why isn't Galaxy
using a python package manager to... manage its packages?
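
As an illustration of what a more on-target error could look like, here is a minimal sketch that keeps the underlying cause visible instead of hiding it behind a generic "not fetchable" message. `EggFetchError` and `fetch_with_context` are hypothetical names invented for this example. They are not part of lib/galaxy/eggs, and I am not claiming this is how the real code is structured.

```python
import traceback


class EggFetchError(Exception):
    """Hypothetical stand-in for an error such as EggNotFetchable."""


def fetch_with_context(name, fetch):
    """Call fetch(name); on failure, re-raise with the original cause in the message.

    `fetch` is any callable supplied by the caller (an assumption for this
    sketch, not Galaxy's actual fetch function).
    """
    try:
        return fetch(name)
    except Exception as exc:
        # Render the original exception as "ExceptionType: message".
        detail = "".join(traceback.format_exception_only(type(exc), exc)).strip()
        raise EggFetchError("could not fetch %s: %s" % (name, detail))
```

With something along these lines, a conflict caused by a source-installed package would show up in the message itself, without patching in a stacktrace by hand.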


At the very least, something along the lines of
`./scripts/common_setup.sh --use-pip` should be added.
It doesn't seem like it would be that hard to implement -- but I feel like
my lack of knowledge of Galaxy disqualifies me from building the changeset
reliably.
Of course, if no one objects, I will readily do so anyway (when I have the
time) as a proof of concept.
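
To give a rough idea of the shape such an option could take, here is a sketch in Python rather than in the shell script itself. Everything in it is hypothetical: the `--use-pip` and `--requirements` options, the function names, and the fallback message. It only prints the pip command it would run and does not install anything.

```python
import argparse
import importlib.util
import sys


def choose_installer(use_pip, pip_available):
    """Return 'pip' if requested, otherwise 'eggs'; fail loudly if pip is missing."""
    if not use_pip:
        return "eggs"
    if not pip_available:
        raise RuntimeError("--use-pip was given but pip is not importable")
    return "pip"


def build_pip_command(requirements_file):
    # Invoke pip through the running interpreter so that an active
    # virtualenv's pip is used rather than whichever pip is first on PATH.
    return [sys.executable, "-m", "pip", "install", "-r", requirements_file]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical setup front end")
    parser.add_argument("--use-pip", action="store_true",
                        help="install dependencies with pip instead of the egg scripts")
    parser.add_argument("--requirements", default="requirements.txt")
    args = parser.parse_args(argv)

    pip_available = importlib.util.find_spec("pip") is not None
    installer = choose_installer(args.use_pip, pip_available)
    if installer == "pip":
        print(" ".join(build_pip_command(args.requirements)))
    else:
        print("using the existing egg scripts")
    return installer
```

Calling `main(["--use-pip"])` prints the pip command when pip is importable. Raising an error when pip was requested but is missing is a deliberate choice here, since silently falling back would recreate the kind of confusing failure described above.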


Barring even that improvement, the Wiki page at
https://wiki.galaxyproject.org/Admin/Config/Eggs should definitely be
updated to include some note on why Galaxy has this complex logic for
fetching eggs instead of using pip.


A quick tl;dr and summary:
- I don't have extensive experience with Galaxy, so I may not know what I'm
talking about.
- fetch_eggs.py can be made to raise EggNotFetchable by doing a pip install
from a VCS without using `--egg`. This is a bug in fetch_eggs.py.
- All supported platforms for Galaxy support pip
- Galaxy should have an option to use pip to download its packages over
https from eggs.g2.bx.psu.edu
- Galaxy should probably default to using pip when it's available, since
its failure modes are significantly better than a home-brewed package
manager -- this also leads to good behavior in virtualenvs.


Best regards, and many thanks for your time and attention,
-Stephen