Hi Len,
 
Glad to hear you are finding BioJava and BJIA useful. I will put up a tutorial on 
converting characters to Symbols shortly. In the meantime, have a look at the 
forSymbol() and dnaToken() methods of DNATools, which are convenience methods for 
tokenizing DNA.
 
Biologists tend to use lowercase for DNA and uppercase for protein. BioJava is case 
insensitive when reading (at least for DNA and RNA and, I think, protein). You could 
modify your AlphabetManager.xml and it would probably work (because DNA tokenization is 
case insensitive), but I wouldn't recommend it: strange things may happen, if not now 
then possibly later, especially if you try to use it across a remote connection. The 
best thing to do might be to write your own tokenizer and use that when writing DNA. 
The only downside is that you won't be able to use some of the convenience methods 
from the tools classes, as they use the default tokenizers. You could always write 
your own convenience methods though, in a MySeqIOTools class for example.
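A rough sketch of the "write your own tokenizer" idea, in plain Java so it runs without BioJava on the classpath. The Base enum and every method name below are hypothetical stand-ins for BioJava's DNA Symbols and tokenization classes, not its API; in real code you would map to the DNATools symbols instead. It only handles a, c, g and t (no ambiguity codes such as n), purely to keep it short. Like BioJava, it reads either case; unlike the default tokenizer, it always writes uppercase.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a DNA tokenizer that accepts either case when
 * reading but always writes uppercase. Base stands in for BioJava's
 * DNA Symbol objects and is NOT part of BioJava.
 */
public class UpperCaseDnaTokenizer {

    /** Stand-in for the four DNA symbols (hypothetical). */
    public enum Base { A, C, G, T }

    /** Parse one character, ignoring case; reject anything but a, c, g, t. */
    public static Base parseToken(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return Base.A;
            case 'c': return Base.C;
            case 'g': return Base.G;
            case 't': return Base.T;
            default:
                throw new IllegalArgumentException("Illegal DNA token: '" + c + "'");
        }
    }

    /** Write one symbol as an uppercase character (enum names are uppercase). */
    public static char writeToken(Base b) {
        return b.name().charAt(0);
    }

    /** Tokenize a whole string into symbols, case insensitively. */
    public static List<Base> parse(String dna) {
        List<Base> symbols = new ArrayList<>(dna.length());
        for (char c : dna.toCharArray()) {
            symbols.add(parseToken(c));
        }
        return symbols;
    }

    /** Convenience method of the kind a home-grown MySeqIOTools might hold. */
    public static String toUpperString(List<Base> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (Base b : symbols) {
            sb.append(writeToken(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Base> seq = parse("gattACA");
        System.out.println(toUpperString(seq)); // prints GATTACA
    }
}
```

Keeping reading case insensitive and fixing the output case in a single writeToken() method means only the writing side differs from the defaults, which is why the library's own convenience methods would still emit lowercase.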
 
The BioSQL schema in its latest incarnation (BioSQL 1.0, or the Singapore schema) 
should be able to handle taxonomy data. This schema is supported in biojava-live; the 
older schema is supported by BioJava 1.30, and I don't know how well that handled 
taxon data (not well, as I recall).
 
- Mark
 

        -----Original Message----- 
        From: Len Trigg [mailto:[EMAIL PROTECTED] 
        Sent: Tue 8/07/2003 9:17 a.m. 
        To: [EMAIL PROTECTED] 
        Cc: 
        Subject: [Biojava-l] Re: [Biojava-dev] Initial impressions...
        
        


        Matthew Pocock wrote: 
        > We need to make this process much easier. Unfortunately, getAsChar() 
        > doesn't really work for us because we can have symbols for things that 
        > don't have a single char representation, such as codons. However, you 
        > shouldn't have to end up going through 20 function calls either. 
        > 
        > Is there a biojava in anger example of getting letters from symbols? 

        Nope, not that I could see. BTW, the BioJava in Anger is a very 
        helpful document, I've been consulting it often :-). Sounds like this 
        would make a good addition to the "how do I get between strings and 
        symbols" section. 

        On a related note, biojava seems to always use lowercase when writing 
        out DNA sequences. Is there an officially endorsed method for 
        switching to upper case? Should I modify my AlphabetManager.xml, or 
        should I reregister a new CharacterTokenization with the name "token" 
        so that it overrides the default one and gets picked up by the various 
        output formats? 


        > > Parsing a BLAST output file was also easy, however, I had to use 
        > > "lazy" mode to work with our files (from NCBI BLAST 2.2.1), and I have 
        > > not yet figured out how to extract a) the length of the query 
        > > sequence, and b) the frame of the hits. Any suggestions here? 
        > 
        > Is that information in the annotation attached to the 
        > SeqSimilaritySearchSubHit or the SeqSimilaritySearchResult? 

        When I print out all the annotations (basically using the BIA example 
        BlastParser.java, modified to include sub hit information), I see that 
        the queryFrame is present, but the query length information is not. 



        > Good luck with BioSQL and GFF. These are parts of the library that I use 
        > daily. Oh, and for the GFF, start off by using GFFTools. 

        I've written some sequences, annotated from GFF files, to a MySQL 
        database using BioSQL, and it worked great! Does the BioJava code 
        support writing taxonomy information to the database, so I can link my 
        sequences to species? 


        (I've moved this to biojava-l, since this seems more of a biojava-l 
        question than biojava-dev question, although with open source class 
        libraries, the line often seems to get blurred :-)) 

        Cheers, 
        Len. 
        _______________________________________________ 
        Biojava-l mailing list  -  [EMAIL PROTECTED] 
        http://biojava.org/mailman/listinfo/biojava-l 


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
