-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/17/2014 02:11 PM, Peter Cock wrote:
> You could do something like that, and we already have
> Biopython packages in the ToolShed which can be listed
> as dependencies :)
> 

If my module depends on the Biopython package from the ToolShed, will it
be importable from within a datatype? Would it be as simple as "from Bio
import X"? Most of what I've seen of dependencies (and please forgive my
lack of knowledge about them) consists of env.sh being sourced, with
paths to binaries, before the tool runs.
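
One way to find out might be a small probe dropped into the datatype
module, along the lines below. This is only a sketch: module_available is
a name made up for illustration (not a Galaxy or Biopython function), it
assumes Python 3.4+ for importlib.util.find_spec, and it says nothing
about how Galaxy actually sets up paths for datatypes.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be located for import."""
    try:
        # find_spec returns None when no importable module of that name exists.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for some malformed or partially initialised modules.
        return False


# Reports whether Biopython is visible to the current interpreter.
print("Bio importable:", module_available("Bio"))
```

If this reports False when run from the datatype code, the ToolShed
dependency is presumably not on the import path of that process.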

> However, some things like GenBank are tricky - in order
> to tolerate NCBI dumps the Biopython parser will ignore
> any free text before the first LOCUS line. A confusing
> side effect is most text files are then treated as a
> GenBank file with zero records. But if it came back
> with some records it is probably OK :)

Interesting, very good to know.

> 
> Basically Biopython also does not care to offer file
> format detection simply because it is a can of worms.
> 
> Zen of Python - explicit is better than implicit.
> 
> We want you to tell us which format you want to try
> parsing it as.

Yes! Exactly! Which is why it's perfectly fine here:

SeqIO.parse( dataset.file_name, "genbank" )

All I want to know is whether or not this parses as a GenBank file (and
has one or more records). Biopython may not do automatic format detection
(yuck, agreed), but since I already know I'm looking for a GenBank file,
simply being able to parse it or not is "good enough".
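
Roughly, the check could be sketched like this. The names has_records and
parse are hypothetical, introduced for illustration; the parse hook is
there only so the logic can be exercised without Biopython installed, and
by default it falls back to SeqIO.parse.

```python
def has_records(filename, fmt="genbank", parse=None):
    """Return True if `filename` parses as `fmt` and yields at least one record."""
    if parse is None:
        # Imported lazily so this module loads even without Biopython.
        from Bio import SeqIO
        parse = SeqIO.parse
    try:
        # Pull only the first record rather than loading the whole file.
        first = next(iter(parse(filename, fmt)), None)
    except Exception:
        # Any parser error means "not this format" for sniffing purposes.
        return False
    # Zero records is treated as a failed sniff, since plain text can
    # parse as an empty GenBank file.
    return first is not None
```

A sniff method would then just return has_records(filename), which covers
both the "does it parse" and the "1 or more records" conditions.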

> 
> Sorry,
> 
> Peter
> (Speaking as the Bio.SeqIO maintainer for Biopython)
> 
> 
> On Thu, Jul 17, 2014 at 7:45 PM, Eric Rasche <rasche.e...@yandex.ru> wrote:
> Let's pretend for a second that I'm rather lazy (oh...wait), and I have
> ZERO interest in writing datatype parsers to sniff and validate whether
> or not a specific file is a specific datatype. I'm a sysadmin and
> bioinformatician, and I've worked with dozens of libraries that exist to
> parse file formats, and they all die in flames when I feed them bad data.
> 
> Would it be possible to somehow define requirements for datatypes?
> 
> I don't want to take on the burden of code I write saying "yes, I've
> sniffed+validated this and it is absolutely a genbank file". That's a
> lot of responsibility, especially if people have malformed genbank files
> and their tools fail as a result.
> 
> I would like to do this with Biopython and turf the validation to
> another library that exists to parse GenBank files and that will raise
> an exception if they're invalid.
> 
>>>> def sniff(self, filename):
>>>>   from Bio import SeqIO
>>>>   # Let Biopython decide: any parse error means "not GenBank".
>>>>   try:
>>>>     self.records = list(SeqIO.parse(filename, "genbank"))
>>>>     return True
>>>>   except Exception:
>>>>     self.records = None
>>>>     return False
>>>>
>>>> def validate(self, dataset):
>>>>   from Bio import SeqIO
>>>>   # Collect parse errors instead of raising them.
>>>>   errors = list()
>>>>   try:
>>>>     self.records = list(SeqIO.parse(dataset.file_name, "genbank"))
>>>>   except Exception as e:
>>>>     errors.append(e)
>>>>   return errors
>>>>
>>>> def set_meta(self, dataset, **kwd):
>>>>   # records may be unset if sniff() and validate() never ran.
>>>>   records = getattr(self, "records", None)
>>>>   if records is not None:
>>>>     dataset.metadata.number_of_sequences = len(records)
> 
> So much easier! And I can shift the burden of validation and sniffing
> upstream, rather than any failures being my fault and requiring
> maintenance of a complex sniffer.
> 
> Cheers,
> Eric
> 
>> ___________________________________________________________
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>   http://lists.bx.psu.edu/
>>
>> To search Galaxy mailing lists use the unified search at:
>>   http://galaxyproject.org/search/mailinglists/
- -- 
Eric Rasche
Programmer II
Center for Phage Technology
Texas A&M University
College Station, TX 77843
404-692-2048
e...@tamu.edu
rasche.e...@yandex.ru
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJTyCHwAAoJEMqDXdrsMcpVU6wP/26X1OOvvsF8kWvV7daA7ilh
7fpfh6uCKJ4aShyVbXmSvwvXP0i7lYvmoGWeNot46SZb/A9aZyd+05stMpn/Aqcm
q8SDpQop/sg8VUZBo6SerFpn8xQ1s3kT3hfFUHmAq25ity+bT58kPnpAmdQocuRg
V7F5CPGW3y1L4NMUHcBXockieGJgnnP4cEKWp++G/SUrExTYSBw2DmaYCC2Q0CIV
7XGbV3CoNTDXOsVZGvHQHXkYK6uL9yCN1R4xMc8UMkFN+bjlKbsU9aVgs6s2lImP
nazK6pD2z9EDz7VpVeDKYJiAa8cVpYQN/Ua3mNaMxa59gYh59AVQ1A5JMXBCpwQ5
Zm2o2roMbyeuWtB22pt5Dddim2qyYcie5A9t2hEJfBnMWOBCpPzEw34h2sm/5173
FC1etrltTMjdRsBl7SGE9WqAz5SRffgF3CE5JuFS9tqpCsSsuP2b0wIvY56Oixc9
VEF/tTNV05jG7O45QWoHr43CqqtiyXRZvqr7f8HaJkDjrtsNeMcWim6Wk4/fsNip
dw/jCCyMdanEGTn9oGqs8L1UfWmzLjut+UcOnFQM0R2f+xuD6gxW5PQYmjRFIf7i
cvFZ7XiGwd/6p5sI/3CYt7BnMMwaaIRZqnZd2NXK60R515OBx3nniG22rmUuviGh
uTNHT5Jt7m2mYZYMlCUk
=murl
-----END PGP SIGNATURE-----