Let's pretend for a second that I'm rather lazy (oh...wait), and I have
ZERO interest in writing datatype parsers to sniff and validate whether
or not a specific file is a specific datatype. I'm a sysadmin and
bioinformatician, and I've worked with dozens of libraries that exist to
parse file formats, and they all die in flames when I feed them bad data.

Would it be possible to somehow define requirements for datatypes, such
as a dependency on an external Python library?
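
To make the question concrete, here is a rough sketch of what declaring
requirements could look like. The `requirements` attribute and the
`missing_requirements` helper are made-up names for illustration, not
anything that exists in Galaxy today, and the `Datatype` base class is a
stand-in rather than the real one.

```python
import importlib.util


class Datatype:
    # Hypothetical attribute: top-level modules this datatype needs.
    requirements = []

    @classmethod
    def missing_requirements(cls):
        """Return the declared modules that cannot be imported."""
        return [name for name in cls.requirements
                if importlib.util.find_spec(name) is None]


class GenBank(Datatype):
    # "Bio" is BioPython's top-level package name.
    requirements = ["Bio"]
```

The framework could then skip (or warn about) a datatype whose
`missing_requirements()` list is non-empty, instead of failing at sniff
time.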

I don't want to take on the burden of code I write saying "yes, I've
sniffed+validated this and it is absolutely a genbank file". That's a
lot of responsibility, especially if people have malformed genbank files
and their tools fail as a result.

I would like to do this with BioPython and turf the validation to
another library that exists to parse genbank files and that will raise
an exception if they're invalid.

> def sniff(self, filename):
>   from Bio import SeqIO
>   try:
>     self.records = list(SeqIO.parse( filename, "genbank" ))
>     return True
>   except Exception:
>     self.records = None
>     return False
> 
> def validate(self, dataset):
>   from Bio import SeqIO
>   errors = list()
>   try:
>     self.records = list(SeqIO.parse( dataset.file_name, "genbank" ))
>   except Exception as e:
>     errors.append(e)
>   return errors
> 
> def set_meta(self, dataset, **kwd):
>   if self.records is not None:
>     dataset.metadata.number_of_sequences = len(self.records)

So much easier! And I can shift the burden of validation and sniffing
upstream, rather than having any failures be my fault and having to
maintain a complex sniffer.
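
The same idea, written generically so it is not tied to one format, might
look like the sketch below. `DelegatingDatatype` and its `parse` argument
are hypothetical names of mine; the only assumption about the upstream
parser is that it takes a filename, returns an iterable of records, and
raises on malformed input.

```python
class DelegatingDatatype:
    """Sniff/validate by handing the file to an upstream parser."""

    def __init__(self, parse):
        # `parse`: callable(filename) -> iterable of records,
        # expected to raise an exception on malformed input.
        self.parse = parse
        self.records = None

    def validate(self, filename):
        """Return a list of errors raised by the upstream parser."""
        errors = []
        try:
            # list() forces lazy parsers to read the whole file.
            self.records = list(self.parse(filename))
        except Exception as e:
            self.records = None
            errors.append(e)
        return errors

    def sniff(self, filename):
        """True if the upstream parser accepted the file."""
        return not self.validate(filename)
```

For genbank, the parser passed in would be something like
`lambda f: SeqIO.parse(f, "genbank")`, so all of the format knowledge
stays in BioPython.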

Cheers,
Eric

- -- 
Eric Rasche
Programmer II
Center for Phage Technology
Texas A&M University
College Station, TX 77843
404-692-2048
e...@tamu.edu
rasche.e...@yandex.ru