Hi All,

We should look at easy ways to make finite subsets of the common
infinite alphabets play well, e.g. "give me the alphabet Integer[1..100]",
ensuring that the result implements FiniteAlphabet and therefore behaves
efficiently in cross-products.

I think for integers, it would be a fairly trivial addition (just one
public method on IntegerAlphabet and one private static class).
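
A minimal sketch of the idea, written as plain Java with no BioJava
dependency. The class name FiniteIntegerRange and the helper
crossProductSize are hypothetical and only illustrate why a bounded range
is cheap to work with: membership is a constant-time check and the size of
a cross-product is known up front.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical finite integer range; a stand-in for the idea, not BioJava API. */
final class FiniteIntegerRange extends AbstractSet<Integer> {
    private final int min;
    private final int max;

    FiniteIntegerRange(int min, int max) {
        // Reject empty ranges and ranges too large for size() to report as an int.
        if (min > max || (long) max - (long) min + 1L > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("invalid range " + min + ".." + max);
        }
        this.min = min;
        this.max = max;
    }

    @Override
    public int size() {
        return max - min + 1;
    }

    @Override
    public boolean contains(Object o) {
        // Constant-time membership test; no need to enumerate the members.
        return o instanceof Integer && (Integer) o >= min && (Integer) o <= max;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private long next = min; // long cursor avoids overflow at Integer.MAX_VALUE

            @Override
            public boolean hasNext() {
                return next <= max;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return (int) next++;
            }
        };
    }

    /** Number of symbols in a cross-product of finite alphabets with the given sizes. */
    static long crossProductSize(int... sizes) {
        long product = 1L;
        for (int size : sizes) {
            product = Math.multiplyExact(product, (long) size);
        }
        return product;
    }

    public static void main(String[] args) {
        FiniteIntegerRange quality = new FiniteIntegerRange(1, 100);
        System.out.println(quality.contains(50));
        // Four DNA symbols crossed with the 100 integers in the range.
        System.out.println(crossProductSize(4, quality.size()));
    }
}
```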

Matthew

Schreiber, Mark wrote:
> I like the API.
> 
> I am also intrigued by the idea of a QualitativeAlignment. I assume you would use it
> for EST assemblies. In spite of it being an assembly, it may well be better represented
> as an alignment. If it is an Alignment, the QualitativeAlignment could therefore be a
> sub-interface of UnequalLengthAlignment. There is also the question of what should be
> aligned. For example, the PhredSequence holds two symbol lists, so do you align the
> quality symbol list, the sequence, or both?
> 
> The problem is caused by the fact that the quality information is represented as an
> Integer alphabet, which is infinite, alongside a DNA alphabet, which is finite. The equation
> to calculate the phred score is QV = -10 * log_10(P_e), where P_e is the
> probability that the base call is an error. Hence the lower bound is 0, where P_e is 1,
> while the upper bound is infinite. Realistically, however, a sequencer could never
> achieve a P_e below 0.00001, which corresponds to a phred score of 50 (a very conservative
> estimate). Thus a finite alphabet could be made and a cross-product alphabet used
> instead. Can anyone see a reason why this might be a bad thing?
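
To make the arithmetic concrete, here is a small sketch of the formula
quoted above and its inverse. The class and method names are hypothetical,
and the cap of 50 is simply the bound proposed in the message, passed in as
a parameter rather than taken as a property of any sequencer.

```java
/** Hypothetical helper for the phred formula QV = -10 * log_10(P_e). */
final class PhredScore {
    private PhredScore() {}

    /** Quality value for an error probability in (0, 1]. */
    static double qualityFromErrorProbability(double pError) {
        if (!(pError > 0.0 && pError <= 1.0)) {
            throw new IllegalArgumentException("P_e must be in (0, 1]: " + pError);
        }
        return -10.0 * Math.log10(pError);
    }

    /** Inverse of the formula: P_e = 10^(-QV / 10). */
    static double errorProbabilityFromQuality(double qv) {
        return Math.pow(10.0, -qv / 10.0);
    }

    /** Rounds to an integer score and clamps it into the finite range 0..maxScore. */
    static int cappedScore(double pError, int maxScore) {
        long rounded = Math.round(qualityFromErrorProbability(pError));
        return (int) Math.max(0L, Math.min((long) maxScore, rounded));
    }

    public static void main(String[] args) {
        // P_e = 0.00001 sits at the proposed upper bound of the finite alphabet.
        System.out.println(cappedScore(0.00001, 50));
        // A smaller P_e would exceed the bound and is clamped to it.
        System.out.println(cappedScore(0.0000001, 50));
    }
}
```
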
> 
> Do people have views on whether an EST contig assembly is best represented as an
> Alignment or an Assembly?
> 
> Mark
> 
> 
> 
> 
>>-----Original Message-----
>>From: David Waring [mailto:[EMAIL PROTECTED]] 
>>Sent: Saturday, 27 April 2002 8:43 a.m.
>>To: biojava
>>Subject: RE: [Biojava-l] Functions Requirement...
>>
>>
>>Funny that this comes up now. I am currently working on some 
>>new Alignment classes. I will be supporting alignments of 
>>unequal length. I think this might be a good time to discuss 
>>additions to the API.
>>
>>In addition to the functions Matthew mentioned, I would include 
>>support for UnequalLengthAlignments, which I am working on. I see 
>>at least 3 new methods:
>>
>>        /**
>>        * The location of an individual SymbolList relative
>>        * to the overall Alignment
>>        */
>>    public Location locInAlignment(Object label);
>>
>>        /**
>>        * Returns a list of the labels of all sequences that cover that column
>>        */
>>    public List labelsAt(int column);
>>
>>        /**
>>        * Returns a list of all the labels that intersect that range
>>        */
>>    public List labelsInRange(Location loc);
>>
>>Another is support for QualitativeSymbolLists. That would have
>>
>>        /**
>>        * Returns a quality score for label/position
>>        */
>>        public List qualityAt(Object label,int column);
>>
>>I think that the unequal-length methods should be added to 
>>the Alignment interface; they would be simple to implement in 
>>SimpleAlignment. One question: what should be the behavior of 
>>symbolAt() when the column is in range of the total alignment 
>>but not within the individual sequence? I suggest it should 
>>return null rather than throwing an error. Another possibility 
>>would be to have a new Symbol (NullSymbol or SpaceSymbol), 
>>similar to GappedSymbol. I think this would be better than 
>>always having to check that the column is in range before 
>>calling symbolAt().
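
A rough sketch of how these methods could fit together, using plain
strings and int pairs instead of SymbolList and Location so that it runs
without BioJava. Everything here (the class name, the SPACE placeholder,
the int[] stand-in for a Location) is an assumption for illustration. It
takes the second option above: symbolAt() returns a placeholder for columns
a row does not cover, and only throws when the column is outside the whole
alignment. Columns are 1-based.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an unequal-length alignment; not the BioJava Alignment interface. */
final class UnequalLengthAlignmentSketch {
    /** Placeholder returned for columns a row does not cover (the "SpaceSymbol" option). */
    static final char SPACE = '~';

    private static final class Row {
        final String symbols;
        final int start; // 1-based first column covered by this row

        Row(String symbols, int start) {
            this.symbols = symbols;
            this.start = start;
        }

        int end() {
            return start + symbols.length() - 1;
        }
    }

    private final Map<String, Row> rows = new LinkedHashMap<>();

    void add(String label, String symbols, int startColumn) {
        if (symbols.isEmpty() || startColumn < 1) {
            throw new IllegalArgumentException("need a non-empty row starting at column >= 1");
        }
        rows.put(label, new Row(symbols, startColumn));
    }

    /** Total number of columns: the right-most column covered by any row. */
    int length() {
        int max = 0;
        for (Row row : rows.values()) {
            max = Math.max(max, row.end());
        }
        return max;
    }

    /** Inclusive {start, end} columns of one row, standing in for a Location. */
    int[] locInAlignment(String label) {
        Row row = requireRow(label);
        return new int[] {row.start, row.end()};
    }

    /** Labels of all rows that cover the given column. */
    List<String> labelsAt(int column) {
        return labelsInRange(column, column);
    }

    /** Labels of all rows that intersect the inclusive column range. */
    List<String> labelsInRange(int from, int to) {
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Row> entry : rows.entrySet()) {
            Row row = entry.getValue();
            if (row.start <= to && row.end() >= from) {
                labels.add(entry.getKey());
            }
        }
        return labels;
    }

    char symbolAt(String label, int column) {
        if (column < 1 || column > length()) {
            throw new IndexOutOfBoundsException("column " + column);
        }
        Row row = requireRow(label);
        if (column < row.start || column > row.end()) {
            return SPACE; // inside the alignment, outside this row
        }
        return row.symbols.charAt(column - row.start);
    }

    private Row requireRow(String label) {
        Row row = rows.get(label);
        if (row == null) {
            throw new IllegalArgumentException("unknown label " + label);
        }
        return row;
    }

    public static void main(String[] args) {
        UnequalLengthAlignmentSketch alignment = new UnequalLengthAlignmentSketch();
        alignment.add("est1", "ACGTAC", 1);
        alignment.add("est2", "GTACGG", 3);
        System.out.println(alignment.labelsAt(4));
        System.out.println(alignment.symbolAt("est1", 8));
    }
}
```
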
>>
>>Perhaps we could add new interfaces.
>>
>>QualitativeAlignment
>>
>>SequenceAlignment
>>several possibilities, including making it implement 
>>FeatureHolder and/or allowing individual sequences to be 
>>Sequences, perhaps with a method featuresAt(Object label, 
>>Location range);
>>
>>EditableAlignment
>>      remove (Object label)
>>      add (Object label,SymbolList seq, Location 
>>referenceLocation) -- and perhaps other sigs
>>      addGap (List labels, Location range, int length)
>>      removeGap (List labels, Location range, int length)
>>      shiftBase (List labels, Location range, int length)
>>
>>Any other suggestions?
>>
>>    David
>>
>>Bug note: There is currently a problem with SimpleAlignment. 
>>seqString() does not work, perhaps due to changes a few 
>>months ago with tokenization
>>
>>Exception in thread "main" java.util.NoSuchElementException: There is no tokenization 'token' defined in alphabet (DNA x DNA)
>>        at org.biojava.bio.symbol.AbstractAlphabet.getTokenization(AbstractAlphabet.java:96)
>>        at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:80)
>>        at SimpleAlignmentTest.main(SimpleAlignmentTest.java:33)
>>
>>
>>Does Alignment need to use a CrossProduct alphabet?
>>
>>
>>
>>
>>
>>>-----Original Message-----
>>>From: [EMAIL PROTECTED] 
>>>[mailto:[EMAIL PROTECTED]]On Behalf Of Matthew Pocock
>>>Sent: Friday, April 26, 2002 8:18 AM
>>>To: 阿俗
>>>Cc: [EMAIL PROTECTED]
>>>Subject: Re: [Biojava-l] Functions Requirement...
>>>
>>>
>>>阿俗 wrote:
>>>
>>>>Dear Sir,
>>>>
>>>>     How to implement "Multiple Sequence Alignment" or "Phylogenetic 
>>>>tree" in BioJava?
>>>>     I cannot find any related function in online documents....
>>>>
>>>>      Jim
>>>
>>>Hi Jim,
>>>
>>>There is no direct support for phylogenetic trees currently in 
>>>BioJava. It would be a great thing to see added. We do have some 
>>>support for alignments, via the org.biojava.bio.symbol.Alignment 
>>>interface. However, there are no well developed utilities or support code 
>>>for making alignments really easy to work with. In particular, 
>>>Alignment needs modifying to allow easy addition/removal of sequences 
>>>from the alignment, and we need to add an easy to use 
>>>AlignmentSequence class so that you can annotate columns of an 
>>>alignment as features.
>>>
>>>You can insert gaps into a view of an underlying ungapped 
>>>sequence/symbol list using the GappedSymbolList and GappedSequence 
>>>classes. You can then build an alignment object from these gapped 
>>>views to get gapped alignments.
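
The "gapped view" idea can be sketched without BioJava as a coordinate map
laid over an unmodified source string. The class below is a hypothetical
illustration and not the GappedSymbolList implementation; its method names
and its use of '-' as the gap character are assumptions. Positions are
1-based.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical gapped view over an ungapped string; the source is never modified. */
final class GappedViewSketch {
    private static final int GAP = -1;

    private final String source;
    // For each view position, the 0-based source index, or GAP for an inserted gap.
    private final List<Integer> viewToSource = new ArrayList<>();

    GappedViewSketch(String source) {
        this.source = source;
        for (int i = 0; i < source.length(); i++) {
            viewToSource.add(i);
        }
    }

    /** Inserts `length` gap positions so that the first one lands at view position `pos`. */
    void addGapsInView(int pos, int length) {
        if (pos < 1 || pos > viewToSource.size() + 1 || length < 0) {
            throw new IllegalArgumentException("bad gap request: pos=" + pos + ", length=" + length);
        }
        for (int i = 0; i < length; i++) {
            viewToSource.add(pos - 1, GAP);
        }
    }

    int length() {
        return viewToSource.size();
    }

    /** Symbol at a view position: a gap character or the underlying source symbol. */
    char symbolAt(int pos) {
        int index = viewToSource.get(pos - 1);
        return index == GAP ? '-' : source.charAt(index);
    }

    String seqString() {
        StringBuilder text = new StringBuilder(length());
        for (int pos = 1; pos <= length(); pos++) {
            text.append(symbolAt(pos));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        GappedViewSketch view = new GappedViewSketch("ACGT");
        view.addGapsInView(3, 2);
        System.out.println(view.seqString());
    }
}
```

Several such views of equal length could then be stacked row by row to form
a gapped alignment.
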
>>>
>>>The org.biojava.bio.dp package is a starting point for developing 
>>>alignment algorithms. So far it only has alignments of one and two 
>>>sequences to a model implemented, but the APIs do support simultaneous 
>>>alignment of arbitrarily many sequences to a model.
>>>
>>>This is an area that needs work and documentation. Does anybody else 
>>>on the list make alignments as part of their daily work?
>>>
>>>Matthew
>>>
>>>_______________________________________________
>>>Biojava-l mailing list  -  [EMAIL PROTECTED] 
>>>http://biojava.org/mailman/listinfo/biojava-l
>>
> 
> 
> 



