-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I did a bit of poking around in our code and internally BioJava represents all the default alphabet names (Protein, DNA, etc.) in upper case. It also allows for mixed case alphabet names.

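To make the mixed-case point concrete, here is a minimal sketch (in Python, purely illustrative) of what happens if alphabet names are case-folded on the way into and out of a database. The names REGISTERED, store_name, load_name and round_trips are hypothetical and are not part of BioJava, Biopython or BioSQL; the custom alphabet name is invented for the example.

```python
# Hypothetical illustration only: none of these names are BioJava,
# Biopython or BioSQL APIs.

# Alphabet names as an application might register them: upper-case
# defaults plus one user-defined, mixed-case name (assumed for the example).
REGISTERED = {"DNA", "RNA", "PROTEIN", "MyCustomAlphabet"}


def store_name(name: str) -> str:
    """Fold the name to lower case before writing it to the database."""
    return name.lower()


def load_name(stored: str) -> str:
    """Fold the stored name back to upper case when reading it."""
    return stored.upper()


def round_trips(name: str) -> bool:
    """Return True if the name survives a store/load cycle unchanged."""
    return load_name(store_name(name)) == name


if __name__ == "__main__":
    for name in sorted(REGISTERED):
        print(name, round_trips(name))
```

The all-upper-case defaults survive the cycle, but the mixed-case name does not, so blanket case folding is only safe if mixed-case names are ruled out.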
It's not quite as easy as I thought to change these to lower case, as they are often referenced by text name, meaning other people's code might break if I change them. Also, as it allows for mixed-case alphabet names, I can't do a toUpper/toLower fudge on persistence to BioSQL, as I wouldn't necessarily get out what I put in!

So, I think I'll add this as a point on the recently announced BioJava 3 proposal, that BioSQL interaction must be compliant with standards laid down by the BioSQL project, and that our code will be able to cope with this internally.

That brings us back to BioSQL standards - the idea of a mini-hackathon to solve this once and for all is a very good one. Our previous attempts between BioPerl and BioJava in Singapore were good, but still there are niggles, as seen in this thread of discussion. It seems that a schema on its own just isn't enough to make the various projects play nicely, and instructions are needed on exactly how to use that schema if they are truly all going to be able to use it without caring who or what wrote the data that is being read.

cheers,
Richard

Hilmar Lapp wrote:
> It seems BioPerl and Biopython both want (and have traditionally used)
> lowercase - do you mind going with that for Biojava as well, or
> alternatively, simply map upon insert/update and retrieve?
>
>     -hilmar
>
> On Nov 8, 2007, at 11:18 AM, Richard Holland wrote:
>
> we do need a consensus here.
>
> I'm happy to go with whatever value is chosen, as the BioJava code can
> easily be modified to suit.
>
> cheers,
> Richard
>
> Hilmar Lapp wrote:
>>>> Indeed Biojava uses uppercase for alphabet. In Bioperl-db, we
>>>> explicitly lowercase the value found for alphabet, and the comment
>>>> says why:
>>>>
>>>>         # Note: Biojava uses upper-case terms for alphabet, so we
>>>>         # need to change to all-lower in case the sequence was
>>>>         # manipulated by Biojava.
>>>>         $obj->alphabet(lc($rows->[3])) if $rows->[3];
>>>>
>>>> However, when inserting sequences, we leave the value as is in
>>>> BioPerl (which is lowercase), leading to a potential problem for
>>>> Biojava upon retrieval. Do the Biojava folks deal with that? Should
>>>> this maybe be harmonized across the board?
>>>>
>>>>     -hilmar
>>>>
>>>> On Nov 8, 2007, at 6:49 AM, Eric Gibert wrote:
>>>>
>>>>> Dear Peter,
>>>>>
>>>>> All the alphabets are "DNA" (upper case) in my database. The
>>>>> sequences are taken from NCBI by a BioJava application.
>>>>> Thus it should be that BioJava inserts the records with "DNA". Thus
>>>>> no potential "hidden bug" in BioPython.
>>>>>
>>>>> Maybe a point to share with the Open-Bio committee.
>>>>>
>>>>> Eric
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Peter <[EMAIL PROTECTED]>
>>>>> To: Eric Gibert <[EMAIL PROTECTED]>
>>>>> Cc: [EMAIL PROTECTED]
>>>>> Sent: Thursday, 8 November 2007, 19:40:00
>>>>> Subject: Re: [BioPython] small "bug" correction in package BioSql
>>>>>
>>>>> Eric Gibert wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> In BioSeq/BioSeq.py, in the class DBSeq definition, we have the
>>>>>> function:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> please note my correction: force moltype to be turned into lower case, as
>>>>>> my database has upper case values! This raises the "Unknown moltype"
>>>>>> error.
>>>>> Hi Eric, I've made your suggested change in CVS,
>>>>> biopython/BioSQL/BioSeq.py revision 1.13, thank you.
>>>>>
>>>>> I would encourage you to investigate why some of the "alphabet" fields
>>>>> in the biosequence table are in upper case. There could be a bug
>>>>> elsewhere which is writing these entries with the wrong alphabet. Is
>>>>> this affecting all entries, or just some?
>>>>>
>>>>> Peter
>>>>>
>>>>> _______________________________________________
>>>>> BioPython mailing list  -  [EMAIL PROTECTED]
>>>>> http://lists.open-bio.org/mailman/listinfo/biopython
>>>>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHNFW84C5LeMEKA/QRApBiAJ41WqCDKOJhee5NxIsquYaR/ImBRgCfb7zM
LX75HHvCUC/v4n3okmUQ+ME=
=d6QO
-----END PGP SIGNATURE-----
_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
