Well, maybe not a bug, but a potential danger. SimpleDistribution.Trainer will accept counts for ambiguity symbols (N) and the gap symbol. However, when it comes to train, it uses the AlphabetIndex for DNA, which does not include indices for these symbols, and this leads to some very odd results.
A number of solutions might exist:

1) Prevent the addition of ambiguous symbols to SimpleDistribution.Trainer. Safe, but an unexpected N in your sequence could cause an unexpected exception, so not very user friendly.

2) Refactor SimpleDistribution.Trainer to add equal numbers of counts to the ambiguity subsymbols, i.e. if N is added then add 1 count to each of a, c, g, t. However, this will not work for gap symbols.

3) Extend the DNA AlphabetIndex to include the IUPAC ambiguities (m, r, w, s, y, k, v, h, d, b, n) and the gap symbol. This solves the gap problem, but maybe N should still be added as one count to each of its subsymbols.

Not sure I like any of these. Any other suggestions?

Mark Schreiber
Bioinformatics
AgResearch Invermay
PO Box 50034
Mosgiel
New Zealand
PH: +64 3 489 9175
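
P.S. To make option 2 concrete, here is a rough sketch in plain Java. It deliberately does not use the real BioJava classes: AmbiguityAwareCounts, addCount and getCount are hypothetical names invented for illustration, and symbols are plain chars rather than Symbol objects. It only shows the idea of spreading a count for an ambiguity code over its subsymbols, and why the gap symbol is still a problem under this scheme.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a count trainer; this is NOT the BioJava API. */
class AmbiguityAwareCounts {
    // IUPAC DNA codes mapped to the atomic symbols they can stand for.
    private static final Map<Character, String> SUBSYMBOLS = new LinkedHashMap<>();
    static {
        SUBSYMBOLS.put('a', "a");
        SUBSYMBOLS.put('c', "c");
        SUBSYMBOLS.put('g', "g");
        SUBSYMBOLS.put('t', "t");
        SUBSYMBOLS.put('m', "ac");
        SUBSYMBOLS.put('r', "ag");
        SUBSYMBOLS.put('w', "at");
        SUBSYMBOLS.put('s', "cg");
        SUBSYMBOLS.put('y', "ct");
        SUBSYMBOLS.put('k', "gt");
        SUBSYMBOLS.put('v', "acg");
        SUBSYMBOLS.put('h', "act");
        SUBSYMBOLS.put('d', "agt");
        SUBSYMBOLS.put('b', "cgt");
        SUBSYMBOLS.put('n', "acgt");
    }

    // Counts are only ever stored for the four atomic symbols.
    private final Map<Character, Double> counts = new LinkedHashMap<>();

    AmbiguityAwareCounts() {
        for (char c : "acgt".toCharArray()) {
            counts.put(c, 0.0);
        }
    }

    /** Adds the given count to every subsymbol of the given symbol. */
    void addCount(char symbol, double count) {
        String subs = SUBSYMBOLS.get(Character.toLowerCase(symbol));
        if (subs == null) {
            // The gap symbol ends up here: it has no subsymbols to share a count.
            throw new IllegalArgumentException(
                "No subsymbols known for '" + symbol + "'");
        }
        for (char s : subs.toCharArray()) {
            counts.merge(s, count, Double::sum);
        }
    }

    /** Returns the accumulated count for one of a, c, g, t. */
    double getCount(char atomicSymbol) {
        Double value = counts.get(Character.toLowerCase(atomicSymbol));
        if (value == null) {
            throw new IllegalArgumentException(
                "Not an atomic DNA symbol: '" + atomicSymbol + "'");
        }
        return value;
    }
}
```

With this sketch, adding one count for N raises each of a, c, g and t by one, while adding a count for '-' throws, which is exactly the limitation noted for option 2.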