[galaxy-dev] Bug when reading in tab separated files

Christian Brenninkmeijer Tue, 17 Nov 2015 02:57:08 -0800

Hi All,

I noticed there is a bug when you read in tab separated files and leave them as 
type auto.

These are then identified by
https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/tabular.py
as "CSV", because the CSV type uses the Python module "csv", which can also 
read tab separated files.

Fine so far EXCEPT that CSV's set_meta method does not read the columns 
correctly if the file is tab separated.

    def set_meta( self, dataset, **kwd ):
        ...
        reader = csv.reader(csvfile)    # line 920

The default delimiter for Python's csv module is a comma, so a tab separated 
file will appear to have only 1 column.
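
To illustrate the point, here is a minimal sketch (Python 3, not Galaxy 
code) of what the default reader does with a tab separated line. The sample 
line is made up for the example.

```python
import csv
import io

# Hypothetical one-row, three-field tab separated sample.
line = "chr1\t100\t200\n"

# csv.reader with no arguments uses the "excel" dialect, whose delimiter is ",".
default_row = next(csv.reader(io.StringIO(line)))

# Splitting on the tab character shows the fields the file actually has.
tab_fields = line.rstrip("\n").split("\t")

print(len(default_row), len(tab_fields))  # prints: 1 3
```

The whole line comes back as a single field, which is the one column that 
set_meta ends up recording.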

As a result, especially in Planemo, parameters of type="data_column" will 
not work, as the system thinks there is only one column in the data.

==

The CSV data type needs to be fixed or, to protect backward compatibility, 
replaced.

There are then several options for comma separated files.

1. Use the sniff method of Python's csv module to detect the delimiter in 
set_meta.
This will result in a slowdown and affect backward compatibility.

2. Make CSV handle only comma separated files.
Improve the def sniff( self, filename ): method (line 907) to make sure the 
file really is comma separated.
There are various clean ways of doing this.

3. Create a new True_CSV type that sniffs only comma separated files but leave 
the old one for backward compatibility.
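
Options 1 and 2 could be sketched roughly as below, using csv.Sniffer from 
the standard library. This is only an illustration in Python 3, not the 
Galaxy code: detect_delimiter and is_comma_separated are hypothetical helper 
names, and limiting the candidates to comma and tab is my assumption.

```python
import csv

def detect_delimiter(sample_text, candidates=",\t"):
    """Guess the delimiter of sample_text, or return None if sniffing fails."""
    try:
        # Only the characters in `candidates` are considered as delimiters.
        dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
    except csv.Error:
        # Raised when the Sniffer cannot determine a delimiter.
        return None
    return dialect.delimiter

def is_comma_separated(sample_text):
    """Stricter check for option 2: accept only a detected comma."""
    return detect_delimiter(sample_text) == ","

# Hypothetical samples for illustration.
comma_sample = "a,b,c\n1,2,3\n4,5,6\n"
tab_sample = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"
print(is_comma_separated(comma_sample), is_comma_separated(tab_sample))
```

The Sniffer is a heuristic, so it would only be given the first few lines 
of a file, and a None result needs a fallback (for example plain tabular).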


For tab separated files:

1. Option 1 above works here too.

4. Then allow the default tabular type to handle tab separated files.

5. Add a new type which extends True_CSV to sniff for tab separation and 
set the metadata correctly with tabs.

===
I have code that works for True_CSV and the new TSV type if that is the best 
option.

Christian
University of Manchester

3b. Add one or more new types to handle tab separated files using Python's 
csv module, but informing the csv reader of the new delimiter or dialect.
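
For 3b, a minimal sketch (Python 3) of telling the reader which dialect to 
use. The built-in "excel-tab" dialect is part of the csv module. The 
column_count helper and the sample data are hypothetical and only stand in 
for whatever set_meta would actually compute.

```python
import csv
import io

def column_count(text, dialect="excel"):
    """Return the widest row length seen when parsing text with the dialect."""
    reader = csv.reader(io.StringIO(text), dialect=dialect)
    return max((len(row) for row in reader), default=0)

# Hypothetical tab separated sample.
sample = "gene\tstart\tend\nBRCA1\t100\t200\n"

comma_view = column_count(sample)                     # default, comma delimited
tab_view = column_count(sample, dialect="excel-tab")  # built-in tab dialect

print(comma_view, tab_view)  # prints: 1 3
```

Passing delimiter="\t" to csv.reader instead of a dialect name gives the 
same column split for this sample.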


regards
Christian

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
