Hi John,

On Sep 26, 2013, at 9:15 PM, John Chilton <chil...@msi.umn.edu> wrote:

> I was not even thinking we needed to modify the tool shed to implement
> this. I was hoping (?) you could just modify:

Nothing in the Tool Shed itself would be affected or require modification for 
this new feature, as it is entirely on the Galaxy side.

> 
> lib/galaxy/tools/deps/__init__.py
> 
> to implement this. If some tool contains the tag
> 
>  <requirement type="package" version="1.7.1">numpy</requirement>
> 
> then if there is a manually installed tool_dependency in
> `tool_dependency_dir/numpy/1.7.1/env.sh`, it would take precedence
> over the tool shed installed version (would that be something like
> `tool_dependency_dir/numpy/1.7.1/owner/name/changeset/env.sh`)? Let me
> know if this is way off base.
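
If I'm reading that right, the lookup order would be something like the 
following (a rough sketch only; the base path, repository coordinates, and 
helper name below are hypothetical, not actual Galaxy code):

    import os

    def resolve_env_sh(base_dir, name, version,
                       owner=None, repo=None, changeset=None):
        """Return the env.sh to source for a <requirement> tag, or None."""
        # 1. A manually installed dependency takes precedence.
        manual = os.path.join(base_dir, name, version, "env.sh")
        if os.path.exists(manual):
            return manual
        # 2. Otherwise fall back to the tool shed installed dependency,
        #    if the tool's repository coordinates are known.
        if owner and repo and changeset:
            shed = os.path.join(base_dir, name, version,
                                owner, repo, changeset, "env.sh")
            if os.path.exists(shed):
                return shed
        return None

    # For <requirement type="package" version="1.7.1">numpy</requirement>:
    print(resolve_env_sh("/galaxy/tool_dependencies", "numpy", "1.7.1",
                         owner="devteam", repo="package_numpy_1_7",
                         changeset="0123456789ab"))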

This is a possibility perhaps, but there seems to be a potential weakness: it 
doesn't require the ToolDependency object to exist, since the tool will 
function without an installed dependency from the Tool Shed.  Or, if the 
installed dependency is required, then it is meaningless because it won't be 
used.  In the former case, the tool dependency cannot be shared via the Tool 
Shed's dependency mechanism because none of the relationships will be 
defined, since nothing is installed.  Wouldn't it be better to allow the Galaxy 
admin to point the ToolDependency object to a specified binary on disk?  In 
this way, all relationships defined by Tool Shed installations will work as 
expected, with every contained tool that has that dependency pointing to the 
same shared location on disk.
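
As a rough sketch of what I have in mind (a hypothetical model, not Galaxy's 
actual ToolDependency class; paths and values are made up for illustration):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ToolDependency:
        name: str
        version: str
        install_dir: str                          # Tool Shed install location
        admin_override_dir: Optional[str] = None  # admin-specified manual build

        @property
        def effective_dir(self):
            # All relationships defined by Tool Shed installations stay
            # intact; only the on-disk location dependent tools see changes.
            return self.admin_override_dir or self.install_dir

    numpy_dep = ToolDependency(
        name="numpy",
        version="1.7.1",
        install_dir="/galaxy/tool_dependencies/numpy/1.7.1/"
                    "devteam/package_numpy_1_7/0123456789ab",
    )
    numpy_dep.admin_override_dir = "/opt/numpy-atlas/1.7.1"  # manual compile
    print(numpy_dep.effective_dir)

Every tool in every installed repository that declares the numpy 1.7.1 
requirement would then resolve to the same shared location.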

> 
> There is a lot you could do to make this more complicated of course -
> an interface for mapping exact tool shed dependencies to manually
> installed ones, the ability to auto-compile tool shed dependencies
> against manually installed libraries, etc..., but I am not sure those
> complexities are buying you anything really.

This is certainly a debatable topic, but I'm not seeing how my approach creates 
more complexity.  The Galaxy admin is required to manually compile the binary 
dependency in either case.  I'm just providing an easy UI feature that enables 
a ToolDependency object, which can be shared by any number of tools contained 
in any number of repositories installed from the Tool Shed, to locate it.  
Using this approach, the Galaxy admin can either install the dependency from 
the Tool Shed or manually compile the dependency and have the ToolDependency 
object point to it.  In either case, all Tool Shed dependency definitions 
(both repository and tool dependencies) would work as expected with additional 
repository installs.

Greg Von Kuster


> 
> Thoughts?
> 
> -John
> 
> 
> 
> On Thu, Sep 26, 2013 at 5:47 PM, Greg Von Kuster <g...@bx.psu.edu> wrote:
>> Hi John,
>> 
>> On Sep 26, 2013, at 5:27 PM, John Chilton <chil...@msi.umn.edu> wrote:
>> 
>>> My recommendation would be to make the tool dependency install work on as
>>> many platforms as you can and not try to optimize in such a way that
>>> it is not going to work - i.e. favor reproducibility over performance.
>>> If a system administrator or institution wants to sacrifice
>>> reproducibility and optimize specific packages, they should be able to
>>> do so manually. It's not just ATLAS and CPU throttling, right? It's
>>> vendor versions of MPI, GPGPU variants of code, variants of OpenMP,
>>> etc.  Even if the tool shed provided some mechanism for determining
>>> whether a particular package optimization is going to work, perhaps it's
>>> better to just not enable it by default, because frequently these cause
>>> slightly different results than the unoptimized version.
>>> 
>>> The problem with this recommendation is that Galaxy currently provides
>>> no mechanism for doing so. Luckily this is easy to solve, and the
>>> solution addresses other problems too. If the tool dependency resolution
>>> code grabbed the manually configured dependency instead of the tool shed
>>> variant when available, instead of favoring the opposite, then it
>>> would be really easy to add in an optimized version of numpy or an MPI
>>> version of software X.
>> 
>> How would you like this to happen?  Would it work to provide an admin the 
>> ability to create a ToolDependency object and point it to a "manually 
>> configured dependency" in whatever location on disk the admin chooses via a 
>> new UI feature?  Or do you have a different idea?
>> 
>> Thanks,
>> 
>> Greg Von Kuster
>> 
>> 
>>> 
>>> What's great is that this solves other problems as well. For instance, our
>>> genomics Galaxy web server runs Debian but the worker nodes run
>>> CentOS. This means many tool shed installed dependencies do not work.
>>> JJ, being the patient guy he is, goes in and manually updates the tool
>>> shed installed env.sh files to load modules. Even if you think not
>>> running the same version of the OS on your server and worker nodes is
>>> a bit crazy, there is the much more reasonable (common) case of just
>>> wanting to submit to multiple different clusters. When I was talking
>>> with the guys at NCGAS, they were unsure how to do this; this one
>>> change would make it a lot more tenable.
>>> 
>>> -John
>>> 
>>> On Thu, Sep 26, 2013 at 1:29 PM, Björn Grüning
>>> <bjoern.gruen...@pharmazie.uni-freiburg.de> wrote:
>>>> Hi,
>>>> 
>>>>> Hi Bjoern,
>>>>> 
>>>>> Is there anything else we (the Galaxy community) can do to help
>>>>> sort out the ATLAS installation problems?
>>>> 
>>>> Thanks for asking. I do indeed have a few things I would like
>>>> comments on.
>>>> 
>>>>> Another choice might be to use OpenBLAS instead of ATLAS, e.g.
>>>>> http://stackoverflow.com/questions/11443302/compiling-numpy-with-openblas-integration
>>>> 
>>>> I have no experience with it. Does it also require turning off CPU
>>>> throttling? I would assume so; otherwise, how would it optimize itself?
>>>> 
>>>>> However, I think we could build NumPy without using ATLAS or any
>>>>> BLAS library. That seems like the most pragmatic solution
>>>>> in the short term - which I think is what Dan tried here:
>>>>> http://testtoolshed.g2.bx.psu.edu/view/blankenberg/package_numpy_1_7
>>>> 
>>>> I can remove them if that is the consensus.
>>>> 
>>>> A few points:
>>>> - fixing the ATLAS issue can speed up numpy, scipy, and R considerably
>>>> (by 400% in some cases)
>>>> - as far as I understand, the performance gain comes from ATLAS
>>>> optimizing itself for the specific hardware; for ATLAS there is no way
>>>> around disabling CPU throttling (how about OpenBLAS?)
>>>> - it seems to be complicated to deactivate CPU throttling on OS X
>>>> - binary installation does not make sense in that case, because ATLAS is
>>>> self-optimizing
>>>> - distribution-shipped ATLAS packages are not really faster
>>>> 
>>>> Current state:
>>>> - The ATLAS installer tries two different commands to deactivate CPU
>>>> throttling. AFAIK that only works on some Ubuntu versions, where no
>>>> root privileges are necessary.
>>>> - If ATLAS fails for some reason, the numpy/R/scipy installations should
>>>> not be affected (that was at least the aim)
>>>> 
>>>> Questions:
>>>> - Is it worth the hassle for some speed improvement? pip install numpy
>>>> would be so easy.
>>>> 
>>>> - If we want to support ATLAS, any better ideas on how to implement it?
>>>> Any Tool Shed feature that can help? -> interactive installation?
>>>>      - can we flag a tool dependency as optional, so it can fail?
>>>> 
>>>> - Can anyone help with testing and fixing it?
>>>> 
>>>> 
>>>> Any opinions/comments?
>>>> Bjoern
>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Peter
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
