Sure. I'll try to be concise; the approach was sketched out on the board about
a month ago. I'll upload our generalized reporting tool, which can serve as an
example of this, once it has tests, but for now here are the bare bones:
Background: we wanted to launch a BLAST search on a number of FASTA sequences,
have the results displayed in an HTML form organized by query and hits, and
then let a user select hits for particular queries and have them show up in
their own datasets, each ready to be fed into a phylogenetic tree visualization
pipeline of tools. The reason an HTML form was called for is that one can see
various columns of information for each hit, which lets you decide whether or
not you want that hit in the next stage.
So first we have a dataset containing the choice information, say this
combination of BLAST nucleotide sequence search and hit info (a search query
row is indicated by "1" in the Query column):
Accession ID                 pident  length  sequence      Query  Row
Assembly_67_BCC1             -       -       AGGAC...TGCA  1      1
gi|158343637|gb|EU057648.1|  99.55   442     AGGAC...TGCA  0      2
gi|158343987|gb|EU057686.1|  99.10   442     AGGAC...TGCA  0      3
gi|158343677|gb|EU057652.1|  98.87   387     TGGAC...TGCA  0      4
...
Assembly_67_BCC8             -       -       ATGG...CCC    1      5
...
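To make the layout concrete, here is a minimal sketch (Python 3, not taken from
our tool) of how such a file could be read back into queries and their hits. It
assumes tab-separated columns in the order shown above and no header line; the
function name group_hits_by_query and the shortened sample sequences are made
up for illustration.

```python
def group_hits_by_query(lines):
    """Group rows under the query row that precedes them.

    Assumed column order (tab-separated):
    accession, pident, length, sequence, query flag, row number.
    """
    groups = []  # list of (query_row, [hit_rows])
    for line in lines:
        cols = line.rstrip('\n').split('\t')
        if len(cols) < 6:
            continue  # skip blank or malformed lines
        if cols[4] == '1':
            groups.append((cols, []))    # a new search query starts here
        elif groups:
            groups[-1][1].append(cols)   # a hit for the most recent query
    return groups


# Sample rows; the sequences are placeholders, not real data.
sample = [
    "Assembly_67_BCC1\t-\t-\tAGGAC\t1\t1\n",
    "gi|158343637|gb|EU057648.1|\t99.55\t442\tAGGAC\t0\t2\n",
    "Assembly_67_BCC8\t-\t-\tATGG\t1\t5\n",
]
groups = group_hits_by_query(sample)
for query, hits in groups:
    print(query[0], [hit[0] for hit in hits])
```

Grouping like this is what lets the report show each query followed by its own
hits.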
Tool A: "Selection Form": takes in the above info and produces an HTML report
in which an HTML form provides the necessary input to Tool B.
Tool B: "Selection Tool": takes in the same dataset as above, but generates an
output file that includes only the selected rows of data (and only the desired
columns). (The nice thing about Tool B is that it can be set up to work
directly on the above dataset without needing to be fed by Tool A; it's just
that when called up directly, it only offers a selection list as provided by
its own XML form spec.)
Tool A:
Starting in the tool XML, we indicate a) the input type of data to select in
the history, b) the html output file where the form is built, and c) some
useful ids related to the input data file (don't confuse id with hid or
dataset_id!). "tool_input_dataset_file.id" is the one we need to pass to Tool B.
<tool id="bccdcBLASTreporting" name="BLAST Reporting" version="1.0.4">
    ...
    <command interpreter="python">
        my_python.py $tool_input_dataset_file $html_file
        $tool_input_dataset_file.hid:$tool_input_dataset_file.dataset_id:$tool_input_dataset_file.id
        -f "
        ...
    </command>
    ...
    <inputs>
        <param name="tool_input_dataset_file" type="data" format="[e.g. tabular, or whatever type in history]" label="My insightful results"/>
        ...
    </inputs>
    <outputs>
        ...
        <data format="html" name="html_file" label="HTML report for data $tool_input_dataset_file.hid" />
    </outputs>
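The third command-line argument above arrives in the script as one
colon-separated string. A minimal sketch of unpacking it into named fields, so
that id is not mixed up with hid or dataset_id; the DatasetIds type, the
parse_dataset_ids helper and the sample values are hypothetical, not part of
our tool:

```python
from collections import namedtuple

# Hypothetical container for the three ids passed on the command line.
DatasetIds = namedtuple('DatasetIds', ['hid', 'dataset_id', 'id'])


def parse_dataset_ids(arg):
    """Split 'hid:dataset_id:id' into named fields."""
    parts = arg.split(':')
    if len(parts) != 3:
        raise ValueError('expected hid:dataset_id:id, got %r' % arg)
    return DatasetIds(*parts)


ids = parse_dataset_ids('12:345:678')  # made-up values
print(ids.id)  # the field that Tool B needs
```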
Tool A builds the html form. The only trick here is that you have to load the
Tool B form in Galaxy and view its frame's source code to find the right values
for tool_id and tool_state (an initial tool_state value seems to work fine). I
use a dictionary lookup to store these, and combine it with string replacement
in a multi-line string for simple html templating. Below is code slightly
adapted for this writeup.
# (fragment from inside a method; needs "import time" at module level)
in_file, out_html_file, selection_file_data = args
# selection_file_data is "hid:dataset_id:id"; Tool B needs the last field
sel_file_fields = selection_file_data.split(':')
self.lookup = {
    'timestamp': time.strftime('%Y/%m/%d'),
    'tool_id': 'bccdcSelectSubset',
    'tool_state': '800.....................71002e',
    'select_row': 0,
    'dataset_selection_id': sel_file_fields[2]
}
form_html = """
<div style="float:right" id="buttonPrint" class="nonprintable">
    <button onclick="window.print()">Print</button>
</div>
<form id="tool_form" name="tool_form"
    action="../../../tool_runner" target="galaxy_main" method="post"
    enctype="application/x-www-form-urlencoded">
    <input type="hidden" name="refresh" value="refresh"/>
    <input type="hidden" name="tool_id" value="%(tool_id)s"/>
    <input type="hidden" name="tool_state" value="%(tool_state)s"/>
    <input type="hidden" name="input" value="%(dataset_selection_id)s"/>
    <input type="hidden" name="incl_excl" value="1"/>
    <input type="submit" class="btn btn-primary nonprintable" name="runtool_btn" value="Submit"/>
""" % self.lookup
with open(out_html_file, 'w') as fp_out:
    fp_out.write(HTML_REPORT_HEADER_FILE)
    fp_out.write(form_html)
    ...
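One thing the template above does not do is escape the substituted values. That
is fine for ids, but anything containing quotes or angle brackets would break
the markup. A minimal sketch of an escaped hidden field, assuming Python 3
(under Python 2, cgi.escape(value, True) plays a similar role); the
render_hidden_input helper is hypothetical and not part of our tool:

```python
from html import escape


def render_hidden_input(name, value):
    """Return a hidden <input> with both attribute values HTML-escaped."""
    return '<input type="hidden" name="%s" value="%s"/>' % (
        escape(str(name), quote=True),
        escape(str(value), quote=True),
    )


print(render_hidden_input('tool_id', 'bccdcSelectSubset'))
print(render_hidden_input('note', 'a "quoted" <value>'))
```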
And now write out all the table stuff for each row in input file with a
checkbox selector:
    # (still inside the "with open(...) as fp_out:" block above)
    with open(in_file) as f_in:
        for line in f_in:
            rowdata = line.rstrip('\n').split('\t')
            self.lookup['select_row'] += 1
            tdTags = ''
            for (col, field) in enumerate(self.display_columns):
                self.lookup['value'] = rowdata[col]
                if col == 0:
                    # first cell carries the checkbox; its value is the row number
                    tdTags += '<td><input type="checkbox" name="select" value="%(select_row)s" />%(value)s</td>' % self.lookup
                else:
                    tdTags += '<td>%(value)s</td>' % self.lookup
            fp_out.write("""\n\t\t\t<tr>%s</tr>""" % tdTags)
        ...
    fp_out.write(HTML_REPORT_FOOTER_FILE)
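For anyone wondering what the browser actually sends when the form is
submitted: each ticked checkbox contributes its own "select" pair to the
urlencoded body. A minimal sketch of that body in Python 3, with a made-up
dataset id and the tool_state left elided; how Galaxy's tool_runner turns the
repeated "select" pairs into the $select value seen by Tool B is not shown
here, and I haven't traced it.

```python
from urllib.parse import urlencode, parse_qs

# Fields mirroring the hidden inputs and checkboxes of the form above.
fields = {
    'refresh': 'refresh',
    'tool_id': 'bccdcSelectSubset',
    'tool_state': '...',      # elided, as in the writeup
    'input': '678',           # made-up dataset id
    'incl_excl': '1',
    'select': ['2', '4'],     # rows whose checkboxes were ticked
    'runtool_btn': 'Submit',
}

# doseq=True emits one name=value pair per list element.
body = urlencode(fields, doseq=True)
print(body)
print(parse_qs(body)['select'])
```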
Tool B:
To keep it simple this one just produces a single output dataset, but I can
show a multiple-output-dataset version, with one dataset for each set of query
hits selected above, if you want. force_history_refresh="True" is supposed to
refresh the history list after this finishes all of its file writing, but for
some reason that doesn't seem to work on my Galaxy.
<tool id="bccdcSelectSubset" name="Select subsets" force_history_refresh="True">
    <command interpreter="python">
        select_subset.py $input $output1 $incl_excl $select
    </command>
    <inputs>
        <param name="input" type="data" format="tabular" label="Numbered tabular input file"/>
        <param name="incl_excl" type="select" format="text" label="Include or exclude selection?">
            <option value="1">Include selection</option>
            <option value="0">Exclude selection</option>
        </param>
        <param name="select" type="select" multiple="true" display="checkboxes" label="Select lines below">
            <options from_dataset="input">
                <column name="name" index="0"/>
                <column name="value" index="-1"/>
            </options>
        </param>
    </inputs>
    <outputs>
        <data name="output1" format="tabular" metadata_source="input" label="$tool.name on data $input.hid"/>
    </outputs>
    <help>
.. class:: infomark

**What it does**

This tool produces a tabular file with a subset of the lines in its input
tabular file.
    </help>
</tool>
And the Python:
'''
python select_subset.py $input $output $incl_excl $select
'''
import sys


def stop_err(msg):
    sys.stderr.write("%s\n" % msg)
    sys.exit(1)


try:
    input_path, output_path, incl_excl, select = sys.argv[1:]
except ValueError:
    stop_err('You must provide the arguments input, output, incl_excl and select.')

# Row numbers chosen by the user, passed as a comma-separated list
try:
    selected = set(int(num) for num in select.split(','))
except ValueError:
    stop_err('Did you remember to number the input dataset?')

include = bool(int(incl_excl))
if include:
    print('Including selected lines...')
else:
    print('Excluding selected lines...')

with open(input_path) as f_in, open(output_path, 'w') as f_out:
    for line in f_in:
        cols = line.rstrip('\n').split('\t')
        try:
            num = int(cols[-1])  # last column holds the row number
        except ValueError:
            stop_err('Did you remember to number the input dataset?')
        # keep the row if its selection status matches the include flag;
        # the row-number column itself is dropped from the output
        if (num in selected) == include:
            f_out.write('\t'.join(cols[:-1]) + '\n')

print('Done.')
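The script above expects the last column of every line to hold its row number
(that is what the "Did you remember to number the input dataset?" message is
about). A minimal sketch of adding such a column, assuming Python 3 and
1-based numbering as in the example table; the number_rows helper is
hypothetical and is not the tool we actually use for this step:

```python
def number_rows(lines):
    """Yield each line with its 1-based row number appended as a last column."""
    for num, line in enumerate(lines, 1):
        yield '%s\t%d\n' % (line.rstrip('\n'), num)


rows = ["a\tb\n", "c\td\n"]
numbered = list(number_rows(rows))
print(''.join(numbered), end='')
```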
________________________________________
From: Igor Topcin [[email protected]]
Sent: Friday, May 09, 2014 1:05 PM
To: Dooley, Damion
Cc: [email protected]
Subject: Re: [galaxy-dev] Inform tool interface with data specific to selected
dataset
Hi Damion,
Would you mind sharing your approach with us all?
Thanks!
Igor
On May 9, 2014 1:51 PM, "Dooley, Damion"
<[email protected]<mailto:[email protected]>> wrote:
Hello, Eric,
If the dynamic filters approach doesn't work out I can send you an approach
that worked for me. It involves creating a tool-generated html report that
contains a form which provides selection choices; the form is set to submit to
a second tool of your choice (it contains the necessary fields to prime the
tool). Not sure if it works on every breed of Galaxy out there though.
d.