Hi all,

I have just found a problem using the "Filter data on any column using
simple expressions" tool, i.e. the files tools/stats/filters.xml and
tools/stats/filters.py.

I have some six-column tabular data like this, where I have used \t for
a tab and \n for the newlines:

#ID\tHMM_Sprob_score\tSP_len\tRXLR_start\tEER_start\tRXLR?\n
gi|301087619|ref|XP_002894699.1|\t0.990\t21\t54\t64\tY\n
gi|301087623|ref|XP_002894700.1|\t0.997\t23\t\t\tN\n
gi|301087628|ref|XP_002894701.1|\t0.000\t24\t\t\tN\n

Breakdown of my data:

Column 1 - ID, mandatory string
Column 2 - HMM_Sprob_score, mandatory float
Column 3 - SP_len, mandatory integer
Column 4 - RXLR_start, optional integer
Column 5 - EER_start, optional integer
Column 6 - RXLR?, mandatory string (Y or N)

Notice that in my data, columns 4 and 5 can be either empty or an integer.

I'm trying to filter this file using c6=='Y', i.e. column six is a
yes. This works (one row of output), but Galaxy tells me:

Info: Filtering with c6=='Y',
kept 100.00% of 4 lines.
Skipped 3 invalid lines starting at line #1: "#ID HMM_Sprob_score
SP_len RXLR_start EER_start RXLR?"

Then if I try to filter using c6=='N', i.e. column six is a no, it
fails to work (zero rows of output instead of three) and tells me:

kept 0.00% of 4 lines.
Skipped 3 invalid lines starting at line #1: "#ID HMM_Sprob_score
SP_len RXLR_start EER_start RXLR?"

Digging into the code, tools/stats/filters.py is given the list of
column types from Galaxy and (regardless of which columns are actually
used in the expression) attempts to cast every column to an integer,
float, etc.
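To illustrate the failure, here is a rough Python sketch of that behaviour. The function name cast_all_columns, the type strings and the column_types list are my own assumptions for illustration, not the actual filters.py code:

```python
# Hypothetical sketch, NOT the actual filters.py code: cast every column
# using the per-column types supplied by Galaxy, whether or not the filter
# expression uses that column.
CASTS = {"int": int, "float": float, "str": str}

def cast_all_columns(fields, column_types):
    """Cast each field with its column's type; raises ValueError on a bad cell."""
    return [CASTS[t](value) for value, t in zip(fields, column_types)]

# Assumed types, as Galaxy might detect them from the first data row.
column_types = ["str", "float", "int", "int", "int", "str"]

line = "gi|301087623|ref|XP_002894700.1|\t0.997\t23\t\t\tN"
try:
    cast_all_columns(line.split("\t"), column_types)
except ValueError as err:
    # int('') fails, so the whole line is treated as invalid,
    # even though the filter c6=='N' never looks at columns 4 or 5.
    print("Skipped invalid line:", err)
```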

It looks like Galaxy has decided that my columns 4 and 5 are integers
(based on the first row), and therefore filters.py blindly tries to
use int(...) on all these entries, which fails on the empty cells.
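A rough sketch of what more robust detection could look like, assuming (as I suspect) that the current detection only looks at the first data row. The helpers guess_type and guess_column_types are hypothetical, not Galaxy's real datatype code. Scanning every row and also recording which columns contain empty cells would tell a tool that columns 4 and 5 are optional integers:

```python
# Hypothetical sketch, NOT Galaxy's actual column type detection.
# Guesses a type per column, widening int -> float -> str, and records
# whether each column ever contains an empty cell.

def guess_type(value):
    """Return 'int', 'float' or 'str' for a single non-empty cell."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "str"

def guess_column_types(rows):
    """Return a (type, has_empty) pair for each column."""
    order = ["int", "float", "str"]
    types = [None] * len(rows[0])
    has_empty = [False] * len(rows[0])
    for row in rows:
        for i, value in enumerate(row):
            if value == "":
                has_empty[i] = True  # optional column; says nothing about type
                continue
            t = guess_type(value)
            if types[i] is None or order.index(t) > order.index(types[i]):
                types[i] = t
    # Columns that were empty in every row default to 'str'.
    return [(t or "str", empty) for t, empty in zip(types, has_empty)]

rows = [
    ["gi|301087619|ref|XP_002894699.1|", "0.990", "21", "54", "64", "Y"],
    ["gi|301087623|ref|XP_002894700.1|", "0.997", "23", "", "", "N"],
    ["gi|301087628|ref|XP_002894701.1|", "0.000", "24", "", "", "N"],
]
print(guess_column_types(rows[:1]))  # first row only: no hint that columns 4 and 5 can be empty
print(guess_column_types(rows))      # all rows: columns 4 and 5 flagged as having empty cells
```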

I see several issues:

(a) The filters.py tool only really needs to cast the columns actually
used in the filter (fairly easy to fix).
(b) Galaxy's column type detection seems a bit fragile (hard to
really fix without looking at all the data).
(c) Are there other tools that would break in a similar way to filters.py?
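For (a), something along these lines might work. This is a hypothetical sketch, not a patch against the real filters.py: filter_lines and referenced_columns are made-up names. It picks out the cN names with a regular expression, casts only those columns, and maps empty cells to None. Unlike the real tool, it also silently skips # header lines rather than counting them as invalid:

```python
import re

# Hypothetical sketch of fix (a), NOT the actual filters.py code.
CASTS = {"int": int, "float": float, "str": str}

def referenced_columns(expression):
    """Return the sorted 1-based column numbers mentioned as c1, c2, ..."""
    return sorted({int(n) for n in re.findall(r"\bc(\d+)\b", expression)})

def filter_lines(lines, expression, column_types):
    """Keep lines matching expression, casting only the columns it uses."""
    wanted = referenced_columns(expression)
    code = compile(expression, "<filter>", "eval")
    kept, skipped = [], 0
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment line, not data
        fields = line.rstrip("\n").split("\t")
        try:
            env = {}
            for n in wanted:
                value = fields[n - 1]
                # Empty cell -> None instead of calling int('') or float('').
                env["c%d" % n] = None if value == "" else CASTS[column_types[n - 1]](value)
            if eval(code, {"__builtins__": {}}, env):
                kept.append(line)
        except (IndexError, ValueError, TypeError):
            skipped += 1  # line genuinely unusable for this expression
    return kept, skipped

column_types = ["str", "float", "int", "int", "int", "str"]
lines = [
    "#ID\tHMM_Sprob_score\tSP_len\tRXLR_start\tEER_start\tRXLR?",
    "gi|301087619|ref|XP_002894699.1|\t0.990\t21\t54\t64\tY",
    "gi|301087623|ref|XP_002894700.1|\t0.997\t23\t\t\tN",
    "gi|301087628|ref|XP_002894701.1|\t0.000\t24\t\t\tN",
]
kept, skipped = filter_lines(lines, "c6=='N'", column_types)
print(len(kept), skipped)
```

With this, c6=='N' keeps the two N rows in the sample above. An expression such as c4 > 50 would still skip rows whose column 4 is empty, since comparing None with an int raises TypeError, which seems a reasonable outcome for a missing value.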

Peter