Hi all,
Do you have a working tool definition file for QIIME's
beta_diversity_through_plots.py script? We're investigating whether
replacing the Fast UniFrac website with this would be feasible, and
I'd like to see if you have something together before I try to write
one myself.


On Sun, Jan 29, 2012 at 10:01 AM, Rob Knight <rob.kni...@colorado.edu> wrote:
> This is great news -- thanks for letting us know, and for your hard work on
> this!
> Rob
> On Jan 29, 2012, at 9:46 AM, Jim Johnson wrote:
> Pat,
> That sounds great. Does one of you want to take ownership of the toolshed
> repository?
> At minimum, we should add developers to the list that can push changes.
> Thanks,
> JJ
> On 1/28/12 9:37 AM, Gillevet Patrick wrote:
> Jim et al
> Amanda has most of the scripts working now and will be putting them up on
> the toolshed.
> She will be in touch as soon as the scripts are validated a couple of times
> with different datasets.
> cheers...
> Pat
> On Dec 29, 2011, at 3:02 PM, Jim Johnson wrote:
> It is easiest to generate tools for galaxy when the applications or scripts
> can take arbitrarily named input files and generate output to given path
> names.
> Input and output directories are very convenient on the command
> line, but more of a challenge when crafting a galaxy tool.
> That said, many applications require a wrapper script to work within
> galaxy.
> Thank you for the consistent script_info[] help/usage syntax in the qiime
> scripts,  which enabled me to generate a skeleton galaxy tool_config file
> for each qiime script.
> I had some time last spring to work on integrating qiime into galaxy.
> Unfortunately, I haven't had any time since to work on this.
> I put those partial results  on the Galaxy Tool Shed:
> http://toolshed.g2.bx.psu.edu/
> There's a continuing effort at George Mason University to incorporate qiime
> into galaxy tools, so you may want to ask them what they need.
> I started by generating galaxy tool_config files, e.g. align_seqs.xml,  by
> using python to get the script_info[] from the qiime script:
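> $ cat generate_tool_config.bash
> #!/usr/bin/env bash
> # $1 is the qiime script; outputs are named after it, minus the extension
> python "$1" > "${1%.*}.help"
> # -i keeps python reading the filled-in template from stdin after $1 has run
> sed "s/__TOOL_BINARY__/${1}/" tool_template.txt | python -i "$1" -h > "${1%.*}.log"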
> (I'll attach tool_template.txt )
> This generated skeleton tool_config .xml files that I could then edit as
> needed.
> ( http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax )
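
For anyone trying the same approach, here is a minimal sketch of turning script metadata into a skeleton tool config. It is not JJ's tool_template.txt. The function name skeleton_tool_config and the simplified 'options' list (dicts with 'flag', 'name' and 'help') are assumptions made for illustration; a real qiime script_info holds optparse option objects and would need its own extraction step.

```python
import xml.etree.ElementTree as ET


def skeleton_tool_config(script_name, script_info):
    """Build a bare-bones Galaxy tool config from a simplified script_info dict."""
    tool_id = script_name.rsplit(".", 1)[0]
    tool = ET.Element("tool", id=tool_id, name=tool_id)
    ET.SubElement(tool, "description").text = script_info.get("brief_description", "")
    command = ET.SubElement(tool, "command")
    inputs = ET.SubElement(tool, "inputs")
    parts = [script_name]
    for opt in script_info.get("options", []):
        # Hypothetical option layout: {'flag': '-i', 'name': ..., 'help': ...}
        parts.append("%s $%s" % (opt["flag"], opt["name"]))
        ET.SubElement(inputs, "param", name=opt["name"], type="text", label=opt["help"])
    command.text = " ".join(parts)
    ET.SubElement(tool, "outputs")  # left empty, to be edited by hand
    ET.SubElement(tool, "help").text = script_info.get("script_description", "")
    return ET.tostring(tool, encoding="unicode")


# Illustrative metadata only, not copied from a real qiime script.
example_info = {
    "brief_description": "Align sequences",
    "script_description": "Aligns the sequences in a FASTA file.",
    "options": [{"flag": "-i", "name": "input_fasta_fp", "help": "Input FASTA file"}],
}
print(skeleton_tool_config("align_seqs.py", example_info))
```

The result is only a starting point: param types, output datasets and formats still have to be filled in by hand, as described above.
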
> I originally was calling all qiime scripts from a tool wrapper:
> qiime_wrapper.py
> But, if a script can be called with any input filepaths and write its
> results to any filepaths, and only writes to STDERR when it fails, then you
> could call that script directly.
> When should you use a tool_wrapper or call the qiime script directly?
>   Many of the qiime scripts could probably be called directly, especially if
> they can be called with arbitrary input/output file pathnames.
>   A tool wrapper is worth using when input/output needs to be manipulated,
> moved, or renamed in order to be used by the qiime script.
>   You'll also need a tool wrapper if the names or number of the output files
> cannot be determined from the parameter settings.
>   ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files )
>   If your tool relies on a file ext to determine a format, you'll have to
> rename the input.
>   ( Galaxy dataset pathnames will look something like:
> /<your_galaxy_file_path>/072/dataset_72931.dat )
>   The format/type of a dataset is stored in its metadata, so the tool_config
> can use that information, especially if a script can take multiple
> alternative input formats.
>   A tool_wrapper can also be used to manage the stdout or stderr from a
> tool.   Galaxy currently interprets any output on stderr as a failure.
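
The two wrapper jobs described above (renaming an input so it carries the extension a tool expects, and keeping harmless stderr output from being read as a failure) can be sketched roughly as follows. This is not the qiime_wrapper.py from the toolshed. The function name run_wrapped and the "{input}" placeholder are made up for the example, and it assumes Python 3.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_wrapped(cmd_template, dataset_path, ext):
    """Run a command on a copy of dataset_path that carries the extension ext.

    cmd_template is a list of arguments; the placeholder "{input}" is replaced
    with the renamed path. Returns the command's exit code.
    """
    workdir = tempfile.mkdtemp()
    try:
        # Galaxy paths end in .dat, so give the tool a copy with the right name.
        renamed = os.path.join(workdir, "input." + ext)
        shutil.copy(dataset_path, renamed)
        cmd = [arg.replace("{input}", renamed) for arg in cmd_template]
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        sys.stdout.write(out.decode())
        # Pass stderr through only on a real failure; otherwise demote it to
        # stdout so chatter on stderr is not interpreted as an error.
        if proc.returncode != 0:
            sys.stderr.write(err.decode())
        else:
            sys.stdout.write(err.decode())
        return proc.returncode
    finally:
        shutil.rmtree(workdir)
```

A tool_config command line would then call the wrapper instead of the script, for example with a template such as ["some_script.py", "-i", "{input}"] (script name hypothetical).
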
> A couple of changes in galaxy should make some things easier than when I first
> attempted this:
>   - galaxy now accepts dataset requests with sub directories. (
> https://bitbucket.org/galaxy/galaxy-central/issue/494/support-sub-dirs-in-extra_files_path-patch
> )
>     That means that output HTML files with links into sub directories can be
> left intact, with the html copied to the output dataset and the linked files
> to its "extra_files_path".
>   - if you know the pathname of an output relative to the working directory,
> galaxy can copy it automatically to the output dataset using the
> from_work_dir attribute.
>     ( see example in:
> https://bitbucket.org/galaxy/galaxy-central/src/21b645303c02/tools/ngs_rna/tophat_wrapper.xml
> )
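
To make the from_work_dir point concrete, here is a small sketch that builds the kind of output element the tophat wrapper uses. The helper name output_from_work_dir, the dataset name and the relative path are placeholders, not names taken from any qiime tool config.

```python
import xml.etree.ElementTree as ET


def output_from_work_dir(name, fmt, relative_path):
    """Return a <data> output element that Galaxy fills from the working directory."""
    data = ET.Element(
        "data",
        {"format": fmt, "name": name, "from_work_dir": relative_path},
    )
    return ET.tostring(data, encoding="unicode")


# Placeholder names: a script that always writes results/index.html
print(output_from_work_dir("html_report", "html", "results/index.html"))
```

This only works when the relative path is known before the job runs; outputs whose names depend on the data still need a wrapper.
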
> Datatypes
>   You may want to create new datatypes to make it easier for the user to
> correctly select inputs to a tool from previous outputs.
>   For example, the qiime mapping file is a tabular file with specific
> requirements.  I put a 'qiimemapping' datatype in
> lib/galaxy/datatypes/metagenomics.py and datatypes_conf.xml
>   so an input could generate a select list containing only qiimemapping
> datasets rather than all tabular ones.
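
As a rough idea of what a 'qiimemapping' check could look at, the sketch below tests the header line of a mapping file. It is a standalone function, not the class in metagenomics.py and not Galaxy's datatype API. It assumes the header convention from the qiime documentation: first column #SampleID, then BarcodeSequence and LinkerPrimerSequence, with Description last.

```python
def looks_like_qiime_mapping(lines):
    """Return True if the first non-blank line looks like a QIIME mapping header."""
    for line in lines:
        if not line.strip():
            continue  # skip leading blank lines
        fields = line.rstrip("\r\n").split("\t")
        return (
            len(fields) >= 4
            and fields[0] == "#SampleID"
            and fields[1] == "BarcodeSequence"
            and fields[2] == "LinkerPrimerSequence"
            and fields[-1] == "Description"
        )
    return False  # empty input


example = [
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\tDescription\n",
    "S1\tACGT\tGGCC\tControl\tfirst sample\n",
]
print(looks_like_qiime_mapping(example))
```

A check like this would sit behind the datatype so that only matching tabular datasets show up in the select list.
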
> Generating a configfile
>   You can generate configfiles in the galaxy tool_config .xml file.   The
> configfile is generated by the Cheetah interpreter just as the commandline
> is.
>   see:  alpha_rarefaction.xml
> The qiime_wrapper.py was patterned after the mothur_wrapper.py   with some
> of the same wrapper params to handle run time determined output (perhaps not
> needed):
>   --galaxy_datasets
>          a comma separated list of regex:output_dataset pairs; the wrapper
> searches the working_dir and copies the file that matches each regex to the
> output dataset
>          if the exact pathname is known, use the "from_work_dir" attribute
> instead
>   --galaxy_datasetid
>          would be an output dataset id that would be used to dynamically
> create additional new datasets at job termination
>          ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files
> "Number of Output datasets cannot be determined until tool run")
>   --galaxy_new_datasets
>          a comma separated list of regex:datatype used to dynamically create
> additional new datasets at job termination
>   --galaxy_new_files_path
>          the galaxy dir for dynamically generated output datasets
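
The --galaxy_datasets behaviour described above could look roughly like this. It is a sketch of the idea rather than the code in qiime_wrapper.py. The function name copy_matching_outputs is invented, and it assumes that a regex contains no commas and that the first matching file (in sorted order) wins.

```python
import os
import re
import shutil


def copy_matching_outputs(spec, working_dir):
    """Copy files matching each regex in spec to the paired dataset path.

    spec is a comma separated list of regex:output_dataset pairs. Returns the
    list of (source, destination) copies that were made.
    """
    copied = []
    names = sorted(os.listdir(working_dir))
    for pair in spec.split(","):
        # Split on the last colon so the regex itself may contain colons.
        pattern, dest = pair.rsplit(":", 1)
        for name in names:
            if re.match(pattern, name):
                src = os.path.join(working_dir, name)
                shutil.copy(src, dest)
                copied.append((src, dest))
                break  # first match wins
    return copied
```

When the output pathname is known in advance, from_work_dir does the same job without any wrapper code.
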
> *****************************************************************************************
>                                 Patrick M. Gillevet, Ph.D.
>                        Director, Microbiome Analysis Center
>     Professor, Department of Environmental Science and Policy
>                Affiliate Professor, School of Systems Biology
>              George Mason University, Prince William Campus
>                     10900 University Boulevard, MSN 4D4
>                              Manassas, Virginia  20110
> Office 703-993-1057     Room Occoquan-426     FAX 703-993-8430
>                                       http://mbac.gmu.edu
> ******************************************************************************************
