Hi Len,

Glad to hear you are finding BioJava and BJIA useful. I will put up a
tutorial on converting characters to Symbols shortly. In the meantime,
have a look at the forSymbol() and dnaToken() methods of DNATools, which
are convenience methods for tokenizing DNA.

Biologists tend to use lower case for DNA and upper case for protein.
BioJava is case insensitive (at least for DNA and RNA and, I think,
protein). You could modify your AlphabetManager.xml and it would
probably work (because DNA tokenization is case insensitive), but I
wouldn't recommend it: strange things may happen, if not now then
possibly later, especially if you try to play across a remote
connection.

The best thing to do might be to write your own tokenizer and use that
when writing DNA. The only downside is that you won't be able to use
some of the convenience methods from the tools classes, as they use the
default tokenizers. You could always write your own convenience methods
though, in a MySeqIOTools class for example.

The BioSQL schema in its latest incarnation (BioSQL 1.0, or the
Singapore schema) should be able to handle taxonomy data. This schema is
supported in biojava-live; the older schema is supported by biojava
1.30, and I don't know how well it handled Taxon data (not well, as I
recall).

- Mark
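
[Note: the "write your own convenience methods" suggestion above could
look roughly like the plain-Java sketch below. It is an illustration
only and does not use BioJava at all: the class name MySeqIOTools is
borrowed from the message, but its methods (dnaTokenUpper,
toUpperCaseDna, toFasta), the accepted characters (a, c, g, t, n) and
the FASTA-style layout are all assumptions made for this example. A real
solution inside BioJava would plug a custom tokenization into the
library's own writers instead.]

```java
// Hypothetical helper, NOT part of BioJava: a plain-Java sketch of
// custom convenience methods that always emit DNA in upper case.
final class MySeqIOTools {
    private MySeqIOTools() {}

    /**
     * Maps one DNA character to its upper-case token, accepting either
     * case on input. Only a, c, g, t and n are accepted (an assumption
     * made for this sketch).
     */
    static char dnaTokenUpper(char c) {
        char u = Character.toUpperCase(c);
        switch (u) {
            case 'A': case 'C': case 'G': case 'T': case 'N':
                return u;
            default:
                throw new IllegalArgumentException("Not a DNA token: " + c);
        }
    }

    /** Converts a whole DNA string to upper case, validating each character. */
    static String toUpperCaseDna(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            sb.append(dnaTokenUpper(seq.charAt(i)));
        }
        return sb.toString();
    }

    /** Writes a FASTA-style record with upper-case residues, lineWidth per line. */
    static String toFasta(String name, String seq, int lineWidth) {
        if (lineWidth <= 0) {
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        String upper = toUpperCaseDna(seq);
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(name).append('\n');
        for (int i = 0; i < upper.length(); i += lineWidth) {
            int end = Math.min(i + lineWidth, upper.length());
            sb.append(upper, i, end).append('\n');
        }
        return sb.toString();
    }
}
```

[Calling MySeqIOTools.toFasta("seq1", "acgtac", 4) would give a header
line ">seq1" followed by the lines "ACGT" and "AC". Keeping the case
conversion in a separate helper leaves the default tokenizers and
AlphabetManager.xml untouched, which is the point of the advice above.]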
-----Original Message-----
From: Len Trigg [mailto:[EMAIL PROTECTED]
Sent: Tue 8/07/2003 9:17 a.m.
To: [EMAIL PROTECTED]
Cc:
Subject: [Biojava-l] Re: [Biojava-dev] Initial impressions...

Matthew Pocock wrote:
> We need to make this process much easier. Unfortunately, getAsChar()
> doesn't really work for us because we can have symbols for things that
> don't have a single char representation, such as codons. However, you
> shouldn't have to end up going through 20 function calls either.
>
> Is there a biojava in anger example of getting letters from symbols?

Nope, not that I could see. BTW, the BioJava in Anger is a very helpful
document, I've been consulting it often :-). Sounds like this would make
a good addition to the "how do I get between strings and symbols"
section.

On a related note, biojava seems to always use lowercase when writing
out DNA sequences. Is there an officially endorsed method for switching
to upper case? Should I modify my AlphabetManager.xml, or should I
re-register a new CharacterTokenization with the name "token" so that it
overrides the default one and gets picked by the various output formats?

> > Parsing a BLAST output file was also easy, however, I had to use
> > "lazy" mode to work with our files (from NCBI BLAST 2.2.1), and I have
> > not yet figured out how to extract a) the length of the query
> > sequence, and b) the frame of the hits. Any suggestions here?
>
> Is that information in the annotation attached to the
> SeqSimilaritySearchSubHit or the SeqSimilaritySearchResult?

When I print out all the annotations (basically using the BIA example
BlastParser.java, modified to include sub hit information), I see that
the queryFrame is present, but the query length information is not.

> Good luck with BioSQL and GFF. These are parts of the library that I use
> daily. Oh, and for the GFF, start off by using GFFTools.

I've written some sequences, annotated from GFF files to a mysql
database using BioSQL, and it worked great!
Does the BioJava code support writing taxonomy information to the
database, so I can link my sequences to species?

(I've moved this to biojava-l, since this seems more of a biojava-l
question than biojava-dev question, although with open source class
libraries, the line often seems to get blurred :-))

Cheers,
Len.
_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l