Hi Nate,

Thanks so much for helping me out with this.

It seems that I've miscommunicated what I'm doing a little bit.
I'm installing a package from that git repo using pip separately from
running the Galaxy setup script.
That is, I want to install a python package and I want to set up Galaxy as
two separate (and ideally independent) steps in my script.
I have no desire to do anything particularly esoteric or clever with
Galaxy's eggs or to install them myself.
Somehow, doing a pip install of my desired package is breaking the later
run of fetch_eggs.py.

If I understand you correctly, fetch_eggs.py can hit conflicts with
site-packages, but is misreporting them as EggNotFetchable.
I am aware that we're using a fork of the Galaxy source (
https://bitbucket.org/galaxycloud/galaxy/src ), so it may be that we have
an outdated version of fetch_eggs.py which carries a bug that has since been
fixed.
Sorry for not mentioning it earlier -- it slipped my mind that we're
probably trailing behind the modern Galaxy default head.
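
To make concrete what I mean by a conflict with site-packages, here is a
minimal sketch of the kind of check I'd expect. This is not Galaxy's actual
code: it assumes Python 3.8+ for importlib.metadata, and the name
find_conflict and its lookup parameter are made up for illustration.

```python
from importlib import metadata


def find_conflict(name, pinned_version, lookup=metadata.version):
    """Return a message if an installed distribution differs from the pin.

    `lookup` is a hypothetical injection point so the check can be exercised
    without touching the real environment; by default it asks
    importlib.metadata for the installed version.
    """
    try:
        installed = lookup(name)
    except metadata.PackageNotFoundError:
        return None  # nothing installed, so nothing can conflict
    if installed != pinned_version:
        return "%s %s is installed, but %s is required" % (
            name, installed, pinned_version)
    return None
```

Calling find_conflict("WebError", "0.8a") against the system python would
then report a mismatch explicitly rather than surfacing it as a fetch
failure.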

If I put in the extra legwork now to wrap Galaxy in a virtualenv, this
issue will presumably disappear, but that assumes that I don't want to add
any packages to the python used for the uwsgi process.
I'll have to ask folks on the Globus Genomics team whether or not that is
needed -- I certainly hope not, since it means that there is some packaging
conflict we need to resolve.
The only reason I haven't done so already is time constraints -- I've been
trying to get this install scripting done as quickly as possible without
compromising anything truly needful.
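
In the meantime, a provisioning step could at least confirm which python it
is running under. A rough sketch, assuming nothing about Galaxy itself (the
helper name in_virtualenv is my own):

```python
import sys


def in_virtualenv():
    """Best-effort check for whether this interpreter runs in a virtualenv."""
    # Older releases of the `virtualenv` tool set sys.real_prefix; the stdlib
    # `venv` module (Python 3.3+) instead makes sys.base_prefix differ from
    # sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtualenv" if in_virtualenv() else "system python", sys.prefix)
```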

All the same, I'll take some time tomorrow to provision a fresh server and
generate the stacktrace for you.
I'll also take some time to test with the latest Galaxy source, to see if I
get different behavior.
In the best case, this bug no longer exists for the modern source, but in
the worst case it bears the attention.

I don't have any sense of the reliability of fetch_eggs.py and company --
my experience has been of a particular bug that presented on my first
contact with these scripts, so naturally I have a bias against them.
That said, I think the Python community may have finally settled on a
package manager, even if we can't seem to agree on a package format or
tooling surrounding it.
That's just an opinion though -- I have no blog post from Guido to back it
up.

I was not aware of the UCS2 vs. UCS4 issue -- thanks very much for the
citations, very helpful in understanding the problem space.
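
For anyone else following along, my understanding is that the unicode width
of an interpreter can be read from sys.maxunicode, and that compiled
extensions built for one width will not load under the other on Python 2.
A small sketch (the function name unicode_width is hypothetical):

```python
import sys


def unicode_width():
    """Report whether this interpreter is a narrow (UCS2) or wide (UCS4) build."""
    # Narrow builds cap sys.maxunicode at 0xFFFF; wide builds report 0x10FFFF.
    # Python 3.3 and later always report 0x10FFFF, so the distinction only
    # matters on Python 2 and early Python 3.
    return "ucs2" if sys.maxunicode == 0xFFFF else "ucs4"


if __name__ == "__main__":
    print(unicode_width())
```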


On Wed, Feb 4, 2015 at 5:41 PM, Nate Coraor <n...@bx.psu.edu> wrote:

> Hi Stephen,
> I'll try to reply as in-depth as I can.
> On Wed, Feb 4, 2015 at 1:41 PM, Stephen Rosen <siro...@uchicago.edu>
> wrote:
>> Hi Galaxy Dev,
>> I've been looking at the setup scripts for Galaxy to try to understand a
>> problem I recently had provisioning a Galaxy server.
>> I will readily admit that I have not read all of the relevant code
>> top-to-bottom, but I have at least skimmed all of it and read much of it.
>> Sorry if these questions are answered somewhere in Trello, the Wiki, or
>> somewhere else, but I was not able to find answers in any public locations.
>> As a small bit of probably irrelevant context:
>> I'm working with the Globus Genomics group on the DevOps side of things.
>> We're using Chef.
>> I've only just started working with the group in the past couple of weeks
>> (so my expertise with Galaxy itself is limited to nonexistent).
>> First, to describe the problem:
>> We want to provision a server running Galaxy without explicitly wrapping
>> it in a virtualenv.
>> Unless I missed something, that means that it's using system python.
>> When we use pip to install a package from a git repository before running
>> the setup scripts, fetch_eggs fails saying it failed to fetch WebError 0.8a
>> If we install the same package from git with `pip install --egg ...` we
>> get a hunky-dory system where everything seems to work.
> A virtualenv is itself just a wrapper around whichever python binary was
> used to create it. I'd still suggest using a virtualenv created with the
> system python unless you have a really strong reason not to. In fact, I'm
> working on Galaxy process management and a command line tool for that
> management that will automatically create and use a virtualenv going
> forward.
> I'm a bit confused at what's happening here - you mention installing a
> package from a git repository with pip, but then refer to Galaxy's
> fetch_eggs(.py) script, which doesn't use pip or git.
>> As far as I can tell, there is no reason that this should be the case.
>> Sure, putting the git source directly into site-packages might cause
>> issues upon installation, but EggNotFetchable exceptions should only be
>> thrown if the egg actually can't be pulled down from
>> eggs.g2.bx.psu.edu, right?
> EggNotFetchable can be thrown if you happen to be using a platform for
> which we do not provide eggs, although those are fairly uncommon. Right now
> we should cover x86/x86_64 Linux and any flavor of Intel OS X after 10.5.
>> I don't feel comfortable trying to make further progress on my
>> provisioning scripts without knowing why this is happening.
>> I'd hate to be bitten by this later on in the process.
>> Yes, the package in question may have poor behavior (likely it does), but
>> that doesn't change the fact that the error is totally misleading.
>> Furthermore, it doesn't appear that this poor behavior impedes me from
>> doing a pip install of the WebError package or any other packages from PyPI.
>> In case someone else wants to test to replicate, this is the command
>> being used:
>>   pip install --egg git+https://github.com/globusonline/python-nexus-client@599f04edef6b72569b7a5b272b0b847dcda3ea99#egg=nexus-client
>> problems occur if you omit `--egg`.
> None of this process is using Galaxy's egg handling, so I am not sure
> where the EggNotFetchable is coming from. What command are you running to
> get an EggNotFetchable error?
>> Second, a question about the rationale for Galaxy's egg handling:
>> Why is all of this wrapped up in these scripts in the first place?
>> I understand that pip might not be present on every platform, and I don't
>> mean to question a decision to support systems without it.
>> However, as detailed below, Galaxy does not support any platforms which
>> are incapable of running pip.
> This isn't the case - Galaxy does not use pip to install the framework
> dependencies at all. Some tool dependencies installed from the Tool Shed do
> use pip, but that's entirely separate from the dependencies of the Galaxy
> application.
> The `scripts/scramble.py` script can be used to automatically build eggs
> on platforms which we do not prebuild eggs for. If this is necessary,
> `scripts/fetch_eggs.py` should tell you.
>> Furthermore, pip is being pushed by the Python maintainers over
>> easy_install, so it's not like there isn't a clear choice in terms of which
>> one to support.
>> Perhaps most importantly, there don't appear to be any clear-cut options
>> to do the following, which I would consider a more ordinary workflow:
>> - Run a galaxy script (like check_eggs) to generate a list of packages
>> from eggs.g2.bx.psu.edu for platform (redirect output to
>> requirements.txt or similar)
>> - `pip install -r requirements.txt`
> This is exactly what `scripts/check_eggs.py` and `scripts/fetch_eggs.py`
> do.
> There are 3 reasons for the way we handle eggs in Galaxy:
> 1. Galaxy has a huge (and ever-growing) list of dependent python modules
> with C extensions. If we did not prebuild and distribute eggs for these,
> the initial setup to get Galaxy running would be long and problematic. Some
> people who download Galaxy to develop tools may not even have compilers
> installed, let alone the multitude of -dev or -devel packages that aren't
> part of a default Debian or RHEL installation that would be required to
> build all of these packages from source. One of the things that I feel
> makes Galaxy so accessible is that you can start using it immediately after
> you clone the source. So that ability to clone and start and have it work
> as reliably (and quickly) as possible is a high priority.
> 2. Galaxy started using eggs in 2005 or 2006. At this time, everything
> used distutils. pkg_resources came around, which soon brought setuptools
> and easy_install. After this came distutils2, pip and finally, these days,
> wheels. Our need for binary dependency packaging predated almost all of
> these (in fact, most packages in these days didn't even install .egg-info,
> which was the only reliable way to know what version of a module you were
> using) and as each new iteration of packaging/management came along it was
> never clear that any of them had "won" (and in fact, most of them lost). On
> top of this, the Python packaging folks have known for years that Python's
> platform detection for binary compatibility is broken[1]. While I was
> assured it'd be fixed soon, even with a complete reimplementation of Python
> packaging (wheels), they still haven't even made an effort to fix this
> problem[2]. In fact, binary wheels for Linux are explicitly not allowed on
> PyPI because of this.
> 3. We tightly control the versions of all of our dependencies, which is
> not always possible with pip if you aren't also controlling the source of
> your packages.
> [1]
> https://mail.python.org/pipermail/distutils-sig/2010-January/015345.html
> [2] http://lucumr.pocoo.org/2014/1/27/python-on-wheels/
>> The above could be part of a galaxy provisioning script, rather than
>> exposed to the administrator.
>> That also makes it significantly easier to control and manage the
>> virtualenv in which we run Galaxy, since we don't have to worry about
>> egg-related logic that we don't control and we know that the virtualenv's
>> bin dir will be earlier in the PATH than the system pip's dir.
>> Yes, I said above that our setup is presently using system python --
>> switching to a virtualenv is one of the many items on my to-do list.
>> In fact, I would expect that the default, desired setup for Galaxy would
>> be to put it in a virtualenv, rather than using system python, and to use
>> pip, rather than fetch_eggs.py and company.
> A virtualenv is indeed the strongly preferred setup as I mentioned above.
> However, Galaxy does not install its eggs to the virtualenv. The virtualenv
> is there to avoid conflicts with things in the default python's
> site-packages/dist-packages. Galaxy's eggs are installed to (by default)
> the `eggs/` directory in the Galaxy source.
> However, the problem, as I now see it, is that you are trying to install
> all of Galaxy's dependencies, even at their correct versions, using pip,
> rather than letting Galaxy handle its eggs as it does. This is not going to
> work, Galaxy is going to insist on using its eggs.
>> When I look at the logic being used here, especially at
>> https://bitbucket.org/galaxy/galaxy-central/src/f0ae870b22e9/lib/galaxy/eggs/?at=default
>> , it looks like a solution built exclusively for platforms on which pip is
>> not installed.
>> According to the wiki, Galaxy support only goes back as far as 2.6, and
>> get-pip.py supports 2.6, so there is no way of building a Galaxy server on
>> a platform that can't also have pip installed.
>> Adding pip to the requirements for Galaxy would not be particularly
>> onerous, and may simplify things significantly (no need to bundle
>> get-pip.py or similar).
> As mentioned above, we don't use or depend on pip, so it's not required.
> That said, a lot of our egg fetching logic could likely be replaced with
> pip (this code predates pip by a few years). And the eggs could probably be
> replaced with prebuilt wheels. However, even if we did use pip/wheels, we'd
> need to install them from our own repository, and it'd still require
> modifications for binary platform incompatibilities. The egg handling code
> we have now works pretty reliably, so I am not sure there is a whole lot to
> be gained by changing it until Python finally figures out how to handle
> binary compatibility properly.
>> As a last note about the misleading error from the fetch_eggs script,
>> telling me that WebError is "NotFetchable".
>> I probably wouldn't have much of a complaint about this if the error had
>> been more on target and I hadn't felt the need to do things like patch
>> lib/galaxy/eggs to print a stacktrace.
>> For example, if the script detected that there was a source installed
>> package which was getting underfoot, it should have alerted me or even
>> suggested installing my various packages in egg format.
> This is a bug - it is supposed to explain that there is a version
> conflict. If you have a stack trace, please send it along.
>> That kind of error detection is hard to maintain and hard to keep
>> accurate since the Galaxy team's priority is to build Galaxy, not a package
>> manager.
>> This, I suppose, circles back to an earlier question: why isn't Galaxy
>> using a python package manager to... manage its packages?
>> At the very least something along the lines of `./scripts/common_setup.sh
>> --use-pip` should be added.
>> It doesn't seem like it would be that hard to implement -- but I feel
>> like my lack of knowledge of Galaxy disqualifies me from building the
>> changeset reliably.
>> Of course, if no one objects, I will readily do so anyway (when I have
>> the time) as a proof of concept.
> There are modifications to a few of the dependencies, such as psycopg2.
> When psycopg2 is scrambled, the scrambling process fetches and compiles a
> bit of PostgreSQL's libpq and then statically links to it to provide a
> standalone egg that does not depend on the user having installed libpq on
> their system. So a pip-based system that allowed building from source would
> need to account for this.
> --nate
>> Barring even that improvement, the Wiki page at
>> https://wiki.galaxyproject.org/Admin/Config/Eggs should definitely be
>> updated to include some note on why Galaxy has this complex logic for
>> fetching eggs instead of using pip.
>> A quick tl;dr and summary:
>> - I don't have extensive experience with Galaxy, so I may not know what
>> I'm talking about.
>> - fetch_eggs.py can be made to raise EggNotFetchable by doing a pip
>> install from a VCS without using `--egg`. This is a bug in fetch_eggs.py
>> - All supported platforms for Galaxy support pip
>> - Galaxy should have an option to use pip to download its packages over
>> https from eggs.g2.bx.psu.edu
>> - Galaxy should probably default to using pip when it's available, since
>> its failure modes are significantly better than a home-brewed package
>> manager -- this also leads to good behavior in virtualenvs.
>> Best regards, and many thanks for your time and attention,
>> -Stephen
>> ___________________________________________________________
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>   https://lists.galaxyproject.org/
>> To search Galaxy mailing lists use the unified search at:
>>   http://galaxyproject.org/search/mailinglists/