May 20, 2011 Galaxy Development News Brief
NBIC Galaxy Hackathon Results
How to get this distribution
new: % hg clone http://www.bx.psu.edu/hg/galaxy galaxy-dist
upgrade: % hg pull -u -r 8c11dd28a3cf
Key Upcoming Galaxy Event: GCC 2011
May 24-26, 2011 Galaxy Community Conference
Lunteren, The Netherlands
= NBIC Galaxy Hackathon Results =
Two new features (so far) were created and added to Galaxy from the May
Hackathon: tool_conf.xml Autogeneration
Work from the NBIC Galaxy Hackathon by Rob Hooft, Henk van den Toorn,
and Wil Koetsier which adds new optional tags to tool configuration
files, and a script which uses these tags to automatically generate
The scripts can be found at:
Documentation on the new XML tags and how to use the scripts has not yet
Hackathon: Tool Tags
Work from the NBIC Galaxy Hackathon by Freek de Bruijn, Alex Bossers,
and Nate Coraor which enables associating tags with tools. This feature
must be switched on by setting 'enable_tool_tags = True', since there are some inefficient
database operations performed during tool loading to support this
Tool authors can specify tags directly in tool configuration files with
the new optional <tags> tagset:
If used with the tool_conf.xml autogeneration scripts, this will create
<tool> tags containing a new "tags" attribute:
<tool file="example/example.xml" tags="ngs,mapping"/>
Upon startup, Galaxy associates these tags with the tools and presents
them in a cloud at the top of the tool menu when unhidden via the tool
panel's "Options" menu.
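As a rough illustration of how the "tags" attribute could be turned into a tag cloud, here is a minimal Python sketch. It is not Galaxy's own loader: the function names are hypothetical, and only the first <tool> line in the sample comes from this brief; the other entries are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical tool_conf.xml fragment. Only the first <tool> line appears in
# the news brief; the remaining entries are made up for illustration.
TOOL_CONF = """
<toolbox>
  <section name="Example" id="example">
    <tool file="example/example.xml" tags="ngs,mapping"/>
    <tool file="example/other.xml" tags="ngs"/>
    <tool file="example/untagged.xml"/>
  </section>
</toolbox>
"""


def collect_tool_tags(tool_conf_xml):
    """Map each tag to the list of tool files that carry it."""
    tags_to_tools = {}
    root = ET.fromstring(tool_conf_xml)
    for tool in root.iter("tool"):
        # The attribute is a comma-separated list; tools without it get no tags.
        raw = tool.get("tags", "")
        for tag in (part.strip() for part in raw.split(",")):
            if tag:
                tags_to_tools.setdefault(tag, []).append(tool.get("file"))
    return tags_to_tools


def tag_cloud_weights(tags_to_tools):
    """Weight each tag by the number of tools using it (assumed cloud metric)."""
    return Counter({tag: len(files) for tag, files in tags_to_tools.items()})


if __name__ == "__main__":
    weights = tag_cloud_weights(collect_tool_tags(TOOL_CONF))
    for tag, count in weights.most_common():
        print(tag, count)
```

Weighting a tag by the number of tools that use it is an assumption made for the sketch; the brief does not say how the cloud is sized.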
= Picard =
New Galaxy tools wrapping the most commonly used **Picard** functions
related to metrics and repair of mapped short read sequencing.
* BAM index statistics (count of mapped reads by reference chromosome)
* Alignment summary
* Hybrid selection (for targeted data)
* Insert size (for paired reads)
* Library complexity
Repair tools include
* Fix mate pair, mark optical/PCR duplicates
* Add or replace read groups
* SAM/BAM: replace headers and/or reorder based on a different reference
= FastQC =
New tool wrapper generates a comprehensive and useful QC report.
Inputs and Outputs
* This wrapper will accept any FASTQ, SAM, or BAM file as primary input.
It will also take an optional file containing contaminant
information, in the form of a tab-delimited file with 2 columns, name
* The tool produces a single HTML output file that contains all of the
results, including the following basic statistics:
** Per base sequence quality
** Per sequence quality scores
** Per base sequence content
** Per base GC content
** Per sequence GC content
** Per base N content
** Sequence Length Distribution
** Sequence Duplication Levels
** Overrepresented sequences
= Workflows & Multiple datasets =
Workflows can now be run on multiple datasets at the same time. The run
workflow page will now show a new stacked dataset icon.
Upon clicking that, the selection box changes to a multi-select, and an
independent workflow execution will occur for each dataset selected in
that input step. The rest of the parameters of the workflow will be
identical. Combining this functionality with the existing "Send results
to a new history" option will send the results of *each* workflow
execution to a separate history, numbered sequentially "<name> 1",
"<name> 2", etc., where <name> is whatever text you put in the new
history name box.
Please note that this new type of "multiple-input dataset" step can
currently be used only once in any individual workflow.
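The fan-out described above can be pictured with a small Python sketch. This is only an illustration of the naming rule; the function name and the dataset names are hypothetical and do not reflect Galaxy's internal workflow code.

```python
def plan_batch_runs(input_datasets, new_history_name=None):
    """Return one (dataset, target history name) pair per selected dataset.

    When no new history name is given, the target is None, standing in for
    "run in the current history".
    """
    runs = []
    for index, dataset in enumerate(input_datasets, start=1):
        if new_history_name is None:
            target = None
        else:
            # Histories are numbered sequentially: "<name> 1", "<name> 2", ...
            target = "%s %d" % (new_history_name, index)
        runs.append((dataset, target))
    return runs


if __name__ == "__main__":
    for dataset, history in plan_batch_runs(["a.fastq", "b.fastq"], "QC run"):
        print(dataset, "->", history)
```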
Updated & Improved
= Current Tools =
* BAM to SAM tool can now optionally output headers.
* GFF, GFF3, GTF related
** Gracefully handle parsing errors in GFFReader and accurately compute
raw size of GFF features.
** Enable GFF and GFF3 attributes to be written in GTF format.
** Make Operate on Genomic Intervals (GOPS) intersect and subtract tools
compatible with GFF features rather than GFF lines.
** Enable GFF filter attributes tool to accept arbitrary conditions.
* Datasource tools: Remove hard-coded special-case handling of UCSC
Table Browser and GBrowse datasource tools; functionality remains, but
is now a part of the individual tool's XML configuration files.
Datatype auto-detection is now available by providing a data_type=auto parameter.
* Make Cufflinks, Cuffcompare, and Cuffdiff wrappers compatible with
v1.0.1 (new option implementation pending).
* BWA wrapper enhancement
** The Galaxy BWA wrappers (for Illumina and for SOLiD) were updated for
version 0.5.9 of BWA. Three new options have been added to them: Maximum
number of alignments to output in the XA tag for reads paired properly
(samse/sampe -n); Maximum number of alignments to output in the XA tag
for disconcordant read pairs (excluding singletons) (sampe -N); and
Specify the read group (samse/sampe -r).
** If read groups are to be specified, the following aspects MUST be set:
*** Read group identifier (ID)
*** Library name
*** Platform/technology used to produce the reads
*** Sample
** And the following can be set:
*** Sequencing center that produced the read
*** Date that run was produced
*** Flow order
*** Array of nucleotide bases that correspond to the key sequence of
*** Programs used for processing the read group
*** Predicted median insert size
*** Platform unit
** Formerly, when samse/sampe -n was specified, it would cause BWA to
output a format other than SAM. This is no longer the case. The BWA
manual can be found at
* SAM header
** For several wrappers where SAM header suppression was optional (BWA,
BFAST, Bowtie, SRMA), the default was changed to NOT suppress the header.
Suppression is still available as an option.
** Bam-to-sam now keeps the header in the BAM file.
* Setting of output dbkey
** Outputs for the following are now correctly set to the relevant dbkey
(whether the reference is built-in or taken from the history):
Freebayes, SRMA, Mosaik, BFAST, Bowtie, BWA, sam-to-bam, and bam-to-sam.
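For the BWA read group option described above, the following Python sketch shows one way the individual fields could be assembled into a single string for samse/sampe -r. The two-letter tags are the standard SAM @RG tags matching the fields listed; the function name is hypothetical, and this is not the code used by the Galaxy wrapper.

```python
# Required and optional fields as listed in the BWA wrapper notes, expressed
# with their standard SAM @RG two-letter tags.
REQUIRED = ("ID", "LB", "PL", "SM")  # identifier, library, platform, sample
OPTIONAL = ("CN", "DT", "FO", "KS", "PG", "PI", "PU")


def build_read_group(**fields):
    """Build an '@RG...' string suitable for passing to 'bwa samse/sampe -r'."""
    missing = [tag for tag in REQUIRED if not fields.get(tag)]
    if missing:
        raise ValueError("missing required read group fields: " + ", ".join(missing))
    unknown = sorted(set(fields) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise ValueError("unknown read group fields: " + ", ".join(unknown))
    parts = ["@RG"]
    for tag in REQUIRED + OPTIONAL:
        value = fields.get(tag)
        if value:
            parts.append("%s:%s" % (tag, value))
    # Joined with a literal backslash followed by 't', as in the style of the
    # BWA manual's example ('@RG\tID:foo\tSM:bar'); BWA expands the escape.
    return "\\t".join(parts)


if __name__ == "__main__":
    print(build_read_group(ID="rg1", LB="lib1", PL="ILLUMINA", SM="sample1"))
```

Rejecting unknown tags is a choice made for this sketch so that typos fail early; it is not a documented behavior of BWA or of the wrapper.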
= New Tools =
** note: These tool integrations should be considered alpha. Changes are
not necessarily backwards-compatible with workflows or re-run functionality.
** Realigner Target Creator
** Indel Realigner
** Count Covariates
** Table Recalibration
** Analyze Covariates
** Unified Genotyper
* Add tool Filter GTF by attribute values list. Tool filters a GTF based
on a list of attribute values. The tool is especially useful as a
downstream analysis tool for filtering GTF files based on Cuffdiff outputs.
= Trackster =
* Greatly improve LineTrack performance to fetch optimal amount of data
* Add support for Operate on Genomic Intervals (GOPS) intersect and
* Enable users not logged in to use tools in shared visualizations.
* Add support for static tool select parameters.
* Enable datasets that cannot be indexed to be used as tool inputs.
* Ensure that reads are drawn in squish and pack modes even when view
area is large by setting a minimum width of 1px.
* Add histogram mode to feature tracks so that user can generate
coverage histogram at any level of data.
* Use user preferences when drawing summary tree.
* Enable a tool to be run on complete dataset or a visible region.
= User Interface (UI) =
* Show rerun and info buttons in dataset previews for additional states
(e.g. running, queued).
* Make the show details button functional for all tools run within a
history, even if the tool is currently retired.
* Expand the inheritance chain shown for datasets to note whether the
source was another history or a library.
= CloudMan =
* Cloud instance sharing: now share your entire cloud instance
deployment (including data, analyses, and/or customizations) with the
world or specific users with a click of a button.
= Source =
* Reserved/predefined tool template values
** Tool command line templates may make use of certain variables
pre-defined by the Galaxy framework. Some of these already existed but
were undocumented. All have been changed to use a common (pythonic)
naming scheme, but the old names are retained for backwards
*** new name = old name (if any) = value description
*** __new_file_path__ = universe_wsgi.ini new_file_path value
*** __tool_data_path__ = GALAXY_DATA_INDEX_DIR = universe_wsgi.ini
*** __root_dir__ = GALAXY_ROOT_DIR = Top-level Galaxy source directory
made absolute via os.path.abspath()
*** __datatypes_config__ = GALAXY_DATATYPES_CONF_FILE =
universe_wsgi.ini datatypes_config value
*** __user_id__ = userId = User's numeric ID (id column of galaxy_user
table in the database)
*** __user_email__ = userEmail = User's email address
*** __app__ = The galaxy.app.UniverseApplication instance; gives access to
__app__.config and much more. Should be used as a last resort; may
go away in future releases.
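To illustrate how the old and new variable names can coexist, here is a minimal Python sketch. It uses the standard library's string.Template as a stand-in for Galaxy's real template engine, and the helper names, sample values, and command line are all hypothetical.

```python
from string import Template

# New name -> old name, taken from the list above. __new_file_path__ has no
# old name and __app__ is an object, so neither is aliased in this sketch.
LEGACY_ALIASES = {
    "__tool_data_path__": "GALAXY_DATA_INDEX_DIR",
    "__root_dir__": "GALAXY_ROOT_DIR",
    "__datatypes_config__": "GALAXY_DATATYPES_CONF_FILE",
    "__user_id__": "userId",
    "__user_email__": "userEmail",
}


def build_context(values):
    """Return a template context where each old name mirrors its new name."""
    context = dict(values)
    for new_name, old_name in LEGACY_ALIASES.items():
        if new_name in values:
            context[old_name] = values[new_name]
    return context


def render_command(template_text, values):
    """Fill a command line template using both naming schemes."""
    return Template(template_text).substitute(build_context(values))


if __name__ == "__main__":
    values = {"__root_dir__": "/srv/galaxy", "__user_email__": "user@example.org"}
    # One template may mix the new and the old spelling.
    print(render_command("tool.py --root $__root_dir__ --email $userEmail", values))
```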
= Tool Framework =
* When label text for a static option in a SelectToolParameter is not
provided, default to using the 'value'.
* Fix for dynamic options when referencing a DataToolParameter that has
already been wrapped.
* Only allow a user to rerun if they have access permissions on the dataset.
= Test Framework =
* Add a "contains" compare type to functional tests. Enables simple
checking for substrings in a test output file, on a line-by-line basis.
* Fix for expand grouping to allow toolbox tests to use the default
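The idea behind a "contains" comparison can be sketched in a few lines of Python. The exact semantics of Galaxy's implementation are not spelled out here, so this sketch assumes that every non-empty line of the expected file must occur somewhere in the tool output; the function name is hypothetical.

```python
def missing_expected_lines(expected_lines, output_text):
    """Return the expected lines that do not occur as substrings of the output."""
    missing = []
    for line in expected_lines:
        needle = line.rstrip("\r\n")
        # Blank lines are skipped; each remaining line is checked on its own.
        if needle and needle not in output_text:
            missing.append(needle)
    return missing


if __name__ == "__main__":
    output = "chr1\t100\t200\tfeature_a\nchr2\t300\t400\tfeature_b\n"
    expected = ["feature_a\n", "chr2\t300\n", "feature_z\n"]
    print(missing_expected_lines(expected, output))
```

A test using this kind of comparison would pass when the returned list is empty.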
= Bug Fixes =
* Have MACS peak caller wrapper use return code to set error state.
Fixes issues where MACS output was shown as green (successful) despite encountering fatal errors.
* Fix hyperlinks in Cuffcompare and Cuffdiff documentation.
* Add support for comment handling to gff_to_interval_index tool.
* Workflow Parameter bugfix for the improperly handled case when a
parameter isn't used in any workflow step, but should still be available
* Bugfix for workflow run not reloading history.
* SGE/DRMAA runners did not respect the value set in
* Galaxy did not set a public username when 'use_remote_user = True' and
did not provide an interface to set it. Upon account creation, Galaxy
will now automatically create a public username matching the username
portion of the user's email address, with any non-alphanumeric
characters replaced with a '-'. If the username is not unique, a '1' is
appended, and then incremented until the username is unique. Users may
modify their public username via the User menu in the masthead.
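The public username rule described in the last item can be sketched as follows. This is a minimal Python illustration of the stated steps, not Galaxy's actual code; the function name is hypothetical, and the set of existing usernames stands in for a database lookup.

```python
import re


def public_username_for(email, existing_usernames):
    """Derive a unique public username from the local part of an email address."""
    local_part = email.split("@", 1)[0]
    # Replace every non-alphanumeric character with '-'.
    base = re.sub(r"[^A-Za-z0-9]", "-", local_part)
    candidate = base
    suffix = 1
    # If the name is taken, append '1' and keep incrementing until it is unique.
    while candidate in existing_usernames:
        candidate = "%s%d" % (base, suffix)
        suffix += 1
    return candidate


if __name__ == "__main__":
    taken = {"john-doe", "john-doe1"}
    print(public_username_for("john.doe@example.org", taken))
```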
= About Galaxy =
The Galaxy team is a part of [[http://www.bx.psu.edu/|BX]] at
[[http://www.psu.edu/|Penn State]], and the
[[http://www.mathcs.emory.edu/|Mathematics and Computer Science]]
departments at [[http://www.emory.edu/home/index.html|Emory University]].
Galaxy is supported in part by [[http://www.nsf.gov/|NSF]],
[[http://www.genome.gov/|NHGRI]], the
[[http://www.huck.psu.edu/|Huck Institutes of the Life Sciences]], and
[[http://www.ics.psu.edu/|The Institute for CyberScience at Penn State]], and
Join us at Twitter
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
To manage your subscriptions to this and other Galaxy lists,
please use the interface at: