Hi All,

We should look at easy ways to make finite subsets of the common infinite alphabets play well, e.g. "give me the alphabet Integer[1..100]", ensuring that it implements FiniteAlphabet and therefore behaves efficiently in cross-products.
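To make the request concrete, here is a rough sketch of what a bounded integer range buys you. It deliberately uses no BioJava types: BoundedIntegerRange and crossWith are hypothetical names invented for illustration, and characters and strings stand in for symbols. The point is only that a bounded range knows its size and can be enumerated, which is what a cross-product needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a finite view over the integers; not a BioJava class.
final class BoundedIntegerRange {
    private final int min;
    private final int max;

    BoundedIntegerRange(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    // Number of symbols; only a bounded range can answer this.
    int size() {
        return max - min + 1;
    }

    // Membership test against the inclusive bounds.
    boolean contains(int value) {
        return value >= min && value <= max;
    }

    // Enumerate every (base, integer) pair, as a cross-product alphabet would need to.
    List<String> crossWith(char[] bases) {
        List<String> pairs = new ArrayList<>(bases.length * size());
        for (char base : bases) {
            for (int v = min; v <= max; v++) {
                pairs.add("(" + base + " " + v + ")");
            }
        }
        return pairs;
    }
}
```

With the range 1..100 and the four DNA bases, crossWith yields 400 pairs; over the unbounded integers there is nothing to enumerate.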

I think for integers, it would be a fairly trivial addition (just one public method on IntegerAlphabet and one private static class).

Matthew

Schreiber, Mark wrote:
> I like the API.
>
> I am also intrigued by the idea of a QualitativeAlignment. I assume you would use it for EST assemblies. In spite of it being an assembly, it may well be better represented as an alignment. Therefore, if it is an Alignment, the QualitativeAlignment could be a sub-interface of UnequalLengthAlignment. There is also the question of what should be aligned. For example, the PhredSequence holds two symbol lists, so do you align the quality symbol list, the sequence, or both?
>
> The problem is caused by the fact that the quality information is represented as an Integer alphabet, which is infinite, and a DNA alphabet, which is finite. The equation to calculate the phred score is QV = -10 * log_10(P_e), where P_e is the probability that the base call is an error. Hence the lower bound is 0, where P_e is 1, while the upper bound is infinite. However, realistically a sequencer could never approach a P_e of 0.00001, which is a phred score of 50 (a very conservative estimate). Thus a finite alphabet could be made and a cross-product alphabet used instead? Can anyone see a reason why this might be a bad thing?
>
> Do people have views on whether an EST contig assembly is best represented as an Alignment or an Assembly?
>
> Mark
>
>> -----Original Message-----
>> From: David Waring [mailto:[EMAIL PROTECTED]]
>> Sent: Saturday, 27 April 2002 8:43 a.m.
>> To: biojava
>> Subject: RE: [Biojava-l] Functions Requirement...
>>
>> Funny that this comes up now. I am currently working on some new Alignment classes. I will be supporting alignments of unequal length. I think this might be a time to discuss additions to the API.
>>
>> In addition to the functions Matthew mentioned, include support for UnequalLengthAlignments as I am working on.
>> I see at least 3 new methods:
>>
>>   /**
>>    * The location of an individual SymbolList relative to the overall Alignment
>>    */
>>   public Location locInAlignment(Object label);
>>
>>   /**
>>    * Returns a list of the labels of all seqs that cover that column
>>    */
>>   public List labelsAt(int column);
>>
>>   /**
>>    * Returns a list of all the labels that intersect that range
>>    */
>>   public List labelsInRange(Location loc);
>>
>> Another is support for QualitativeSymbolLists. That would have
>>
>>   /**
>>    * Returns a quality score for label/position
>>    */
>>   public List qualityAt(Object label, int column);
>>
>> I think that the unequal length methods should be added to the Alignment interface; they would be simple to implement in SimpleAlignment. One question: what should be the behavior of symbolAt() when the column is in range of the total alignment but not within the individual sequence? I suggest it should return null rather than throwing an error. Another possibility would be to have a new Symbol (NullSymbol or SpaceSymbol) similar to GappedSymbol. I think this would be better than having to always check that it is in range before calling symbolAt().
>>
>> Perhaps we could add new interfaces.
>>
>> QualitativeAlignment
>>
>> SequenceAlignment
>>   several possibilities, including making it implement FeatureHolder, and/or allowing individual sequences to be Sequences, perhaps with a method featuresAt(Object label, Location range);
>>
>> EditableAlignment
>>   remove (Object label)
>>   add (Object label, SymbolList seq, Location referenceLocation) -- and perhaps other sigs
>>   addGap (List labels, Location range, int length)
>>   removeGap (List labels, Location range, int length)
>>   shiftBase (List labels, Location range, int length)
>>
>> Any other suggestions?
>>
>> David
>>
>> Bug note: There is currently a problem with SimpleAlignment.
>> seqString() does not work, perhaps due to changes a few months ago with tokenization:
>>
>> Exception in thread "main" java.util.NoSuchElementException: There is no tokenization 'token' defined in alphabet (DNA x DNA)
>>   at org.biojava.bio.symbol.AbstractAlphabet.getTokenization(AbstractAlphabet.java:96)
>>   at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:80)
>>   at SimpleAlignmentTest.main(SimpleAlignmentTest.java:33)
>>
>> Does Alignment need to use a CrossProduct alphabet?
>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Matthew Pocock
>>> Sent: Friday, April 26, 2002 8:18 AM
>>> To: 阿俗
>>> Cc: [EMAIL PROTECTED]
>>> Subject: Re: [Biojava-l] Functions Requirement...
>>>
>>> 阿俗 wrote:
>>>
>>>> Dear Sir,
>>>>
>>>> How to implement "Multiple Sequence Alignment" or "Phylogenetic tree" in BioJava?
>>>> I cannot find any related function in online documents....
>>>>
>>>> Jim
>>>
>>> Hi Jim,
>>>
>>> There is no direct support for phylogenetic trees currently in BioJava. It would be a great thing to see added. We do have some support for alignments, via the org.biojava.bio.symbol.Alignment class. However, there are no well-developed utilities or support code for making alignments really easy to work with. In particular, Alignment needs modifying to allow easy addition/removal of sequences from the alignment, and we need to add an easy-to-use AlignmentSequence class so that you can annotate columns of an alignment as features.
>>>
>>> You can insert gaps into a view of an underlying ungapped sequence/symbol list using the GappedSymbolList and GappedSequence classes. You can then build an alignment object from these gapped views to get gapped alignments.
>>>
>>> The org.biojava.bio.dp package is a starting point for developing alignment algorithms.
>>> So far it only has alignments of one and two sequences to a model implemented, but the APIs do support simultaneous alignment of arbitrarily many sequences to a model.
>>>
>>> This is an area that needs work and documentation. Does anybody else on the list make alignments as part of their daily work?
>>>
>>> Matthew
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - [EMAIL PROTECTED]
>>> http://biojava.org/mailman/listinfo/biojava-l
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
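
Mark's phred argument in the thread can be checked numerically. The sketch below applies the formula he quotes, QV = -10 * log_10(P_e), and then clamps the rounded score into a finite range. The class and method names are hypothetical, and the cap of 50 is simply the figure from his message, passed in as a parameter rather than assumed to be a property of any sequencer.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class PhredScore {
    private PhredScore() {
    }

    // QV = -10 * log10(P_e), where P_e is the probability of a wrong base call.
    static double fromErrorProbability(double pError) {
        if (pError <= 0.0 || pError > 1.0) {
            throw new IllegalArgumentException("P_e must be in (0, 1]");
        }
        return -10.0 * Math.log10(pError);
    }

    // Inverse of the formula above: P_e = 10^(-QV / 10).
    static double toErrorProbability(double qv) {
        return Math.pow(10.0, -qv / 10.0);
    }

    // Round to the nearest integer score and clamp into [0, maxScore],
    // so that every score falls inside a finite integer range.
    static int toBoundedScore(double pError, int maxScore) {
        long rounded = Math.round(fromErrorProbability(pError));
        return (int) Math.max(0, Math.min(maxScore, rounded));
    }
}
```

A P_e of 1 maps to a score of 0 and a P_e of 0.00001 maps to 50, so with a cap of 50 the scores occupy the 51 integers from 0 to 50. Anything more confident than the cap is flattened to it, which is the information a finite alphabet would give up.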

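
David's question about symbolAt() on a column that lies inside the alignment but outside an individual sequence can also be sketched. The version below takes the "dedicated placeholder symbol" option rather than returning null. It is only an illustration: plain strings stand in for SymbolLists, a start column stands in for a Location, and UnequalLengthAlignmentSketch, NO_SYMBOL and the '~' placeholder are made-up names, not BioJava API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Illustrative only: strings stand in for SymbolLists; none of this is BioJava API.
final class UnequalLengthAlignmentSketch {
    // Placeholder returned where a sequence does not cover a column.
    // Assumes '~' never occurs in the sequences themselves.
    static final char NO_SYMBOL = '~';

    private final Map<String, Integer> starts = new LinkedHashMap<>();
    private final Map<String, String> sequences = new LinkedHashMap<>();

    // start is the 1-based alignment column of the sequence's first symbol.
    void add(String label, String sequence, int start) {
        if (start < 1 || sequence.isEmpty()) {
            throw new IllegalArgumentException("need start >= 1 and a non-empty sequence");
        }
        starts.put(label, start);
        sequences.put(label, sequence);
    }

    // Total alignment length: the right-most column covered by any sequence.
    int length() {
        int end = 0;
        for (String label : sequences.keySet()) {
            end = Math.max(end, starts.get(label) + sequences.get(label).length() - 1);
        }
        return end;
    }

    // Returns the symbol, or NO_SYMBOL when the column is inside the
    // alignment but outside this particular sequence.
    char symbolAt(String label, int column) {
        String seq = sequences.get(label);
        if (seq == null) {
            throw new NoSuchElementException("Unknown label: " + label);
        }
        if (column < 1 || column > length()) {
            throw new IndexOutOfBoundsException("Column outside alignment: " + column);
        }
        int index = column - starts.get(label);
        return (index >= 0 && index < seq.length()) ? seq.charAt(index) : NO_SYMBOL;
    }

    // Labels of all sequences that cover the given column, in insertion order.
    List<String> labelsAt(int column) {
        List<String> covering = new ArrayList<>();
        for (String label : sequences.keySet()) {
            if (symbolAt(label, column) != NO_SYMBOL) {
                covering.add(label);
            }
        }
        return covering;
    }
}
```

Callers can then walk every column for every label without a range check first, which is the convenience David was after. A column outside the whole alignment still throws, since that is a caller error rather than an uncovered position.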