Too bad there aren't any really good options. I will use the environment
variable approach for the query size limit. For the gene bank links I guess
modifying the .loc file is the least bad way. Maybe it can be merged into
galaxy_blast, that would at least solve the interoperability problems.
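
For the query size limit, a minimal sketch of the kind of check I have in mind is below. It only counts query sequences; the variable name BLAST_QUERY_LIMIT is taken from John's example, while the default of 1000 and all function names are placeholders of mine, not anything that exists in galaxy_blast today.

```python
import os

# Hypothetical default; plays the role of "N" in John's suggestion.
DEFAULT_LIMIT = 1000


def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def query_limit(environ=os.environ, default=DEFAULT_LIMIT):
    """Read the limit from BLAST_QUERY_LIMIT, falling back to the default
    when the variable is unset or empty."""
    value = environ.get("BLAST_QUERY_LIMIT", "")
    return int(value) if value.strip() else default


def check_query(path, environ=os.environ):
    """Return an error message if the query file exceeds the limit, else None."""
    limit = query_limit(environ)
    with open(path) as handle:
        n = count_fasta_records(handle)
    if n > limit:
        return "Query has %d sequences, limit is %d" % (n, limit)
    return None


def main(argv):
    """Entry point: argv[1] is the query FASTA file.

    Returns None on success or an error string, so it can be passed
    straight to sys.exit().
    """
    return check_query(argv[1])
```

The wrapper script would end with sys.exit(main(sys.argv)): passing a string to sys.exit() prints it to stderr and exits with status 1, so the job would fail before BLAST starts. Limiting on total sequence length instead would only need a different counting function.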

@Peter: One potential problem in merging my blast2html tool could be that I
have written it in python3, and the current tool wrapper therefore installs
python3 and a host of its dependencies, making for a quite large download.

Jan


On 16 June 2014 09:08, Peter Cock <p.j.a.c...@googlemail.com> wrote:

> On Mon, Jun 16, 2014 at 4:18 AM, John Chilton <jmchil...@gmail.com> wrote:
> > Hello Jan,
> >
> > Thanks for the clarification. Not quite what I was expecting so I am
> > glad I asked - I don't have great answers for either case so hopefully
> > other people will have some ideas.
> >
> > For the first use case - I would just specify some default input to
> > supply to the input wrapper - let's call this N - add a parameter to
> > the tool wrapper "--limit-size=N" - test that and then allow it to be
> > overridden via an environment variable - so in your command block use
> > "--limit-size=\${BLAST_QUERY_LIMIT:-N}". This will use N if no limit
> > is set, but deployers can set limits. There are a number of ways to
> > set such variables - DRM specific environment files, login rc files,
> > etc.... Just this last release I added the ability to define
> > environment variables right in job_conf.xml
> > (https://bitbucket.org/galaxy/galaxy-central/pull-request/378/allow-specification-of-environment/diff).
> > I thought the tool shed might have a way to collect such definitions
> > as well and insert them into package files - but Google failed to find
> > this for me.
>
> Hmm. Jan emailed me off list earlier about this. We could insert
> a pre-BLAST script to check the size of the query FASTA file,
> and abort if it is too large (e.g. number of queries, total sequence
> length, perhaps scaled according to the database size if we want
> to get clever?).
>
> I was hoping there was a more general mechanism in Galaxy -
> after all, BLAST is by no means the only computationally
> expensive tool ;)
>
> We have had query files of 20,000 and more genes against NR
> (both BLASTP and BLASTX), but our Galaxy has task-splitting
> enabled so this becomes 20 (or more) individual cluster jobs
> of 1000 queries each. This works fine apart from the occasional
> glitch with the network drive when the data is merged afterwards.
> (We know this failed once shortly after the underlying storage
> had been expanded, and would have been under heavy load
> rebalancing the data across the new disks.)
>
> > Not sure about how to proceed with the second use case - extending the
> > .loc file should work locally - I am not sure it is feasible within
> > the context of the existing tool shed tools, data manager, etc.... You
> > could certainly duplicate this stuff with your modifications - this
> > has downsides in terms of interoperability though.
>
> Currently the BLAST wrappers use the *.loc files directly, but
> this is likely to switch to the newer "Data Manager" approach.
> That may or may not complicate local modifications like adding
> extra columns...
>
> > Sorry I don't have great answers for either question,
> > -John
>
> Thanks John,
>
> Peter
>
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
