Hi guys,

the latest sync of your fork with Galaxy stable was on 2013-02-08, according

This is the diff of fetch_eggs.py between your version and
Nothing in it looks like it could be causing the problem.


On Wed Feb 04 2015 at 8:31:08 PM Stephen Rosen <siro...@uchicago.edu> wrote:

> Hi Nate,
> Thanks so much for helping me out with this.
> It seems that I've miscommunicated what I'm doing a little bit.
> I'm installing a package from that git repo using pip separately from
> running the galaxy setup script.
> That is, I want to install a python package and I want to setup galaxy as
> two separate (and ideally independent) steps in my script.
> I have no desire to do anything particularly esoteric or clever with
> Galaxy's eggs or to install them myself.
> Somehow, doing a pip install of my desired package is breaking the later
> run of fetch_eggs.py.
> If I understand you correctly, fetch_eggs.py can hit conflicts with
> site-packages, but is misreporting them as EggNotFetchable.
> I am aware that we're using a fork of the Galaxy source (
> https://bitbucket.org/galaxycloud/galaxy/src ), so it may be that we have
> an outdated version of fetch_eggs.py which carries a bug that has since been
> patched.
> Sorry for not mentioning it earlier -- it slipped my mind that we're
> probably trailing behind the modern Galaxy default head.
> If I put in the extra legwork now to wrap Galaxy in a virtualenv, this
> issue will presumably disappear, but that assumes that I don't want to add
> any packages to the python used for the uwsgi process.
> I'll have to ask folks on the Globus Genomics team whether or not that is
> needed -- I certainly hope not, since it means that there is some packaging
> conflict we need to resolve.
> The only reason I haven't done so already is time constraints -- I've been
> trying to get this install scripting done as quickly as possible without
> compromising anything truly needful.
> All the same, I'll take some time tomorrow to provision a fresh server and
> generate the stacktrace for you.
> I'll also take some time to test with the latest Galaxy source, to see if
> I get different behavior.
> In the best case, this bug no longer exists in the modern source, but in
> the worst case it still exists and deserves attention.
> I don't have any sense of the reliability of fetch_eggs.py and company --
> my experience has been of a particular bug that presented on my first
> contact with these scripts, so naturally I have a bias against them.
> That said, I think the Python community may have finally settled on a
> package manager, even if we can't seem to agree on a package format or
> tooling surrounding it.
> That's just an opinion though -- I have no blog post from Guido to back it
> up.
> I was not aware of the UCS2 vs. UCS4 issue -- thanks very much for the
> citations, very helpful in understanding the problem space.
> Thanks,
> -Stephen
> On Wed, Feb 4, 2015 at 5:41 PM, Nate Coraor <n...@bx.psu.edu> wrote:
>> Hi Stephen,
>> I'll try to reply as in-depth as I can.
>> On Wed, Feb 4, 2015 at 1:41 PM, Stephen Rosen <siro...@uchicago.edu>
>> wrote:
>>> Hi Galaxy Dev,
>>> I've been looking at the setup scripts for Galaxy to try to understand a
>>> problem I recently had provisioning a Galaxy server.
>>> I will readily admit that I have not read all of the relevant code
>>> top-to-bottom, but I have at least skimmed all of it and read much of it.
>>> Sorry if these questions are answered somewhere in Trello, the Wiki, or
>>> somewhere else, but I was not able to find answers in any public locations.
>>> As a small bit of probably irrelevant context:
>>> I'm working with the Globus Genomics group on the DevOps side of things.
>>> We're using Chef.
>>> I've only just started working with the group in the past couple of
>>> weeks (so my expertise with Galaxy itself is limited to nonexistent).
>>> First, to describe the problem:
>>> We want to provision a server running Galaxy without explicitly wrapping
>>> it in a virtualenv.
>>> Unless I missed something, that means that it's using system python.
>>> When we use pip to install a package from a git repository before
>>> running the setup scripts, fetch_eggs fails, saying it could not fetch
>>> WebError 0.8a.
>>> If we install the same package from git with `pip install --egg ...` we
>>> get a hunky-dory system where everything seems to work.
>> A virtualenv is itself just a wrapper around whichever python binary was
>> used to create it. I'd still suggest using a virtualenv created with the
>> system python unless you have a really strong reason not to. In fact, I'm
>> working on Galaxy process management and a command line tool for that
>> management that will automatically create and use a virtualenv going
>> forward.
>> I'm a bit confused at what's happening here - you mention installing a
>> package from a git repository with pip, but then refer to Galaxy's
>> fetch_eggs(.py) script, which doesn't use pip or git.
>>> As far as I can tell, there is no reason that this should be the case.
>>> Sure, putting the git source directly into site-packages might cause
>>> issues upon installation, but EggNotFetchable exceptions should only be
>>> thrown if the egg actually can't be pulled down from eggs.g2.bx.psu.edu,
>>> right?
>> EggNotFetchable can be thrown if you happen to be using a platform for
>> which we do not provide eggs, although those are fairly uncommon. Right now
>> we should cover x86/x86_64 Linux and any flavor of Intel OS X after 10.5.
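A minimal sketch of the platform check described above, assuming the platform string comes from Python's `sysconfig.get_platform()` (for example `linux-x86_64` or `macosx-10.9-x86_64`). The function name `has_prebuilt_eggs` and the architecture sets are hypothetical. They are taken only from the sentence above, reading "after 10.5" as 10.6 or later. This is not the logic in Galaxy's `lib/galaxy/eggs`.

```python
import sysconfig

# Hypothetical reading of "x86/x86_64 Linux and any flavor of Intel OS X after 10.5".
LINUX_ARCHES = {"i686", "x86_64"}
MAC_INTEL_ARCHES = {"i386", "x86_64", "intel"}


def has_prebuilt_eggs(platform=None):
    """Return True if the platform string looks like one with prebuilt eggs."""
    if platform is None:
        platform = sysconfig.get_platform()
    parts = platform.split("-")
    if parts[0] == "linux" and len(parts) == 2:
        return parts[1] in LINUX_ARCHES
    if parts[0] == "macosx" and len(parts) == 3:
        # Version looks like "10.9" or "11"; pad so there is always a minor number.
        nums = [int(x) for x in parts[1].split(".")] + [0]
        return (nums[0], nums[1]) > (10, 5) and parts[2] in MAC_INTEL_ARCHES
    return False


print(has_prebuilt_eggs())
```

On a platform such as `linux-armv7l` this returns False, which is the case where `scripts/scramble.py` would be needed to build the eggs locally.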
>>> I don't feel comfortable trying to make further progress on my
>>> provisioning scripts without knowing why this is happening.
>>> I'd hate to be bitten by this later on in the process.
>>> Yes, the package in question may have poor behavior (likely it does),
>>> but that doesn't change the fact that the error is totally misleading.
>>> Furthermore, it doesn't appear that this poor behavior impedes me from
>>> doing a pip install of the WebError package or any other packages from PyPI.
>>> In case someone else wants to test to replicate, this is the command
>>> being used:
>>>   pip install --egg git+https://github.com/globusonline/python-nexus-client@599f04edef6b72569b7a5b272b0b847dcda3ea99#egg=nexus-client
>>> The problems occur if you omit `--egg`.
>> None of this process is using Galaxy's egg handling, so I am not sure
>> where the EggNotFetchable is coming from. What command are you running to
>> get an EggNotFetchable error?
>>> Second, a question about the rationale for Galaxy's egg handling:
>>> Why is all of this wrapped up in these scripts in the first place?
>>> I understand that pip might not be present on every platform, and I
>>> don't mean to question a decision to support systems without it.
>>> However, as detailed below, Galaxy does not support any platforms which
>>> are incapable of running pip.
>> This isn't the case - Galaxy does not use pip to install the framework
>> dependencies at all. Some tool dependencies installed from the Tool Shed do
>> use pip, but that's entirely separate from the dependencies of the Galaxy
>> application.
>> The `scripts/scramble.py` script can be used to automatically build eggs
>> on platforms which we do not prebuild eggs for. If this is necessary,
>> `scripts/fetch_eggs.py` should tell you.
>>> Furthermore, pip is being pushed by the Python maintainers over
>>> easy_install, so it's not like there isn't a clear choice in terms of which
>>> one to support.
>>> Perhaps most importantly, there don't appear to be any clear-cut options
>>> to do the following, which I would consider a more ordinary workflow:
>>> - Run a Galaxy script (like check_eggs) to generate a list of packages
>>> from eggs.g2.bx.psu.edu for the current platform (redirect output to
>>> requirements.txt or similar)
>>> - `pip install -r requirements.txt`
>> This is exactly what `scripts/check_eggs.py` and `scripts/fetch_eggs.py`
>> do.
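Stephen's proposed workflow could be sketched as turning a list of pinned versions into pip requirement lines. This assumes the pins live in an INI file with `[eggs:platform]` and `[eggs:noplatform]` sections that map names to versions, loosely modeled on Galaxy's `eggs.ini`. The section names and the `example_pkg` entry are assumptions, and `pins_to_requirements` is a hypothetical helper, not part of `check_eggs.py`.

```python
import configparser

# Assumed layout, loosely modeled on Galaxy's eggs.ini; example_pkg is a placeholder.
SAMPLE_PINS = """
[eggs:platform]
example_pkg = 1.0

[eggs:noplatform]
WebError = 0.8a
"""


def pins_to_requirements(ini_text, sections=("eggs:platform", "eggs:noplatform")):
    """Turn `name = version` pins into pip requirement lines (`name==version`)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the original capitalization of package names
    parser.read_string(ini_text)
    lines = []
    for section in sections:
        if parser.has_section(section):
            for name, version in parser.items(section):
                lines.append(f"{name}=={version}")
    return lines


print("\n".join(pins_to_requirements(SAMPLE_PINS)))
```

The printed lines could be redirected to `requirements.txt` for `pip install -r requirements.txt`. As Nate notes further down, this would not reproduce Galaxy's modified builds, such as the psycopg2 egg statically linked against libpq, because pip would typically resolve these names against unmodified upstream packages.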
>> There are 3 reasons for the way we handle eggs in Galaxy:
>> 1. Galaxy has a huge (and ever-growing) list of dependent python modules
>> with C extensions. If we did not prebuild and distribute eggs for these,
>> the initial setup to get Galaxy running would be long and problematic. Some
>> people who download Galaxy to develop tools may not even have compilers
>> installed, let alone the multitude of -dev or -devel packages that aren't
>> part of a default Debian or RHEL installation that would be required to
>> build all of these packages from source. One of the things that I feel
>> makes Galaxy so accessible is that you can start using it immediately after
>> you clone the source. So that ability to clone and start and have it work
>> as reliably (and quickly) as possible is a high priority.
>> 2. Galaxy started using eggs in 2005 or 2006. At that time, everything
>> used distutils. pkg_resources came around, which soon brought setuptools
>> and easy_install. After this came distutils2, pip and finally, these days,
>> wheels. Our need for binary dependency packaging predated almost all of
>> these (in fact, most packages in those days didn't even install .egg-info,
>> which was the only reliable way to know what version of a module you were
>> using) and as each new iteration of packaging/management came along it was
>> never clear that any of them had "won" (and in fact, most of them lost). On
>> top of this, the Python packaging folks have known for years that Python's
>> platform detection for binary compatibility is broken[1]. While I was
>> assured it'd be fixed soon, even with a complete reimplementation of Python
>> packaging (wheels), they still haven't even made an effort to fix this
>> problem[2]. In fact, binary wheels for Linux are explicitly not allowed on
>> PyPI because of this.
>> 3. We tightly control the versions of all of our dependencies, which is
>> not always possible with pip if you aren't also controlling the source of
>> your packages.
>> [1] https://mail.python.org/pipermail/distutils-sig/2010-January/015345.html
>> [2] http://lucumr.pocoo.org/2014/1/27/python-on-wheels/
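The platform-detection problem Nate describes can be illustrated with the UCS2/UCS4 issue Stephen mentions. On Python 2, an interpreter could be built "narrow" (UCS2, `sys.maxunicode == 0xFFFF`) or "wide" (UCS4, `sys.maxunicode == 0x10FFFF`). C extensions compiled for one do not load on the other, yet `sysconfig.get_platform()` returns the same string for both. Below is a minimal sketch of extending the platform string with the Unicode width. The `-ucs2`/`-ucs4` tag format is illustrative only. Python 3.3 and later (PEP 393) no longer have narrow builds, so a modern interpreter always reports `ucs4`.

```python
import sys
import sysconfig


def unicode_width(maxunicode=sys.maxunicode):
    """Return 'ucs2' for narrow builds (max code point 0xFFFF), else 'ucs4'."""
    return "ucs2" if maxunicode == 0xFFFF else "ucs4"


def binary_platform_tag(platform=None, maxunicode=sys.maxunicode):
    """Platform string extended with the Unicode width (illustrative format)."""
    if platform is None:
        platform = sysconfig.get_platform()
    return f"{platform}-{unicode_width(maxunicode)}"


# Two interpreters on the same machine can share get_platform() but differ here.
print(binary_platform_tag())
```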
>>> The above could be part of a Galaxy provisioning script, rather than
>>> exposed to the administrator.
>>> That also makes it significantly easier to control and manage the
>>> virtualenv in which we run Galaxy, since we don't have to worry about
>>> egg-related logic that we don't control and we know that the virtualenv's
>>> bin dir will be earlier in the PATH than the system pip's dir.
>>> Yes, I said above that our setup is presently using system python --
>>> switching to a virtualenv is one of the many items on my to-do list.
>>> In fact, I would expect that the default, desired setup for Galaxy would
>>> be to put it in a virtualenv, rather than using system python, and to use
>>> pip, rather than fetch_eggs.py and company.
>> A virtualenv is indeed the strongly preferred setup as I mentioned above.
>> However, Galaxy does not install its eggs to the virtualenv. The virtualenv
>> is there to avoid conflicts with things in the default python's
>> site-packages/dist-packages. Galaxy's eggs are installed to (by default)
>> the `eggs/` directory in the Galaxy source.
>> However, the problem, as I now see it, is that you are trying to install
>> all of Galaxy's dependencies, even at their correct versions, using pip,
>> rather than letting Galaxy handle its eggs as it does. This is not going to
>> work; Galaxy is going to insist on using its eggs.
>>> When I look at the logic being used here, especially at
>>> https://bitbucket.org/galaxy/galaxy-central/src/f0ae870b22e9/lib/galaxy/eggs/?at=default
>>> , it looks like a solution built exclusively for platforms on which pip is
>>> not installed.
>>> According to the wiki, Galaxy supports Python only as far back as 2.6, and
>>> get-pip.py supports 2.6, so there is no way of building a Galaxy server on
>>> a platform that can't also have pip installed.
>>> Adding pip to the requirements for Galaxy would not be particularly
>>> onerous, and may simplify things significantly (no need to bundle
>>> get-pip.py or similar).
>> As mentioned above, we don't use or depend on pip, so it's not required.
>> That said, a lot of our egg fetching logic could likely be replaced with
>> pip (this code predates pip by a few years). And the eggs could probably be
>> replaced with prebuilt wheels. However, even if we did use pip/wheels, we'd
>> need to install them from our own repository, and it'd still require
>> modifications for binary platform incompatibilities. The egg handling code
>> we have now works pretty reliably, so I am not sure there is a whole lot to
>> be gained by changing it until Python finally figures out how to handle
>> binary compatibility properly.
>>> A last note about the misleading error from the fetch_eggs script, which
>>> told me that WebError is "NotFetchable":
>>> I probably wouldn't have much of a complaint about this if the error had
>>> been more on target and I hadn't felt the need to do things like patch
>>> lib/galaxy/eggs to print a stacktrace.
>>> For example, if the script detected that there was a source installed
>>> package which was getting underfoot, it should have alerted me or even
>>> suggested installing my various packages in egg format.
>> This is a bug - it is supposed to explain that there is a version
>> conflict. If you have a stack trace, please send it along.
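What a clearer diagnosis might look like: a minimal sketch that distinguishes "not installed" from "a different version is already installed" (for example, one that pip put in site-packages), instead of reporting both as not fetchable. It uses `importlib.metadata` from Python 3.8+, which is not what Galaxy's 2015 egg code used (that code relied on setuptools' `pkg_resources`). `installed_version` and `describe_pin` are hypothetical helpers.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of distribution `name`, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def describe_pin(name, pinned, installed):
    """Explain how an installed version (or None) relates to the pinned one."""
    if installed is None:
        return f"{name}: not installed; {name}=={pinned} needs to be fetched"
    if installed == pinned:
        return f"{name}: {installed} OK"
    return (f"{name}: version conflict, {installed} is already installed "
            f"but {pinned} is required")


# Hypothetical usage against the running interpreter's environment.
print(describe_pin("WebError", "0.8a", installed_version("WebError")))
```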
>>> That kind of error detection is hard to maintain and hard to keep
>>> accurate since the Galaxy team's priority is to build Galaxy, not a package
>>> manager.
>>> This, I suppose, circles back to an earlier question: why isn't Galaxy
>>> using a python package manager to... manage its packages?
>>> At the very least something along the lines of
>>> `./scripts/common_setup.sh --use-pip` should be added.
>>> It doesn't seem like it would be that hard to implement -- but I feel
>>> like my lack of knowledge of Galaxy disqualifies me from building the
>>> changeset reliably.
>>> Of course, if no one objects, I will readily do so anyway (when I have
>>> the time) as a proof of concept.
>> There are modifications to a few of the dependencies, such as psycopg2.
>> When psycopg2 is scrambled, the scrambling process fetches and compiles a
>> bit of PostgreSQL's libpq and then statically links to it to provide a
>> standalone egg that does not depend on the user having installed libpq on
>> their system. So a pip-based system that allowed building from source would
>> need to account for this.
>> --nate
>>> Barring even that improvement, the Wiki page at
>>> https://wiki.galaxyproject.org/Admin/Config/Eggs should definitely be
>>> updated to include some note on why Galaxy has this complex logic for
>>> fetching eggs instead of using pip.
>>> A quick tl;dr and summary:
>>> - I don't have extensive experience with Galaxy, so I may not know what
>>> I'm talking about.
>>> - fetch_eggs.py can be made to raise EggNotFetchable by doing a pip
>>> install from a VCS without using `--egg`. This is a bug in fetch_eggs.py.
>>> - All supported platforms for Galaxy support pip
>>> - Galaxy should have an option to use pip to download its packages over
>>> https from eggs.g2.bx.psu.edu
>>> - Galaxy should probably default to using pip when it's available, since
>>> its failure modes are significantly better than a home-brewed package
>>> manager -- this also leads to good behavior in virtualenvs.
>>> Best regards, and many thanks for your time and attention,
>>> -Stephen
>>> ___________________________________________________________
>>> Please keep all replies on the list by using "reply all"
>>> in your mail client.  To manage your subscriptions to this
>>> and other Galaxy lists, please use the interface at:
>>>   https://lists.galaxyproject.org/
>>> To search Galaxy mailing lists use the unified search at:
>>>   http://galaxyproject.org/search/mailinglists/