Singh, Nimesh wrote: >Given any implementation of an alignment, there is no guarantee that the alignment >will have links to the original underlying sequences. >
I know this might be a bit finicky, and that you are most likely already aware of this. But, An alignment doesnt necesarily hold Sequences, so there might be no underlying sequences. An Alignment is built up from SymbolLists and is itself a SymbolList. Basicly, the thing i am no comfortable with is the addition of the SequenceIterator sequenceIterator() method to the Alignment Interface. If you take a look at som other interfaces, say SymbolList, Sequence etc you will see that they only contain a small sett of methods and properties. The methods and properties in the interface is what makes an object into an object of that particular type. It could be seen as the smallest common set of functionality needed for being a Sequences, a SymbolLists etc. The trick is to keep these "core" intefaces as simple and as generic as possible, so no one gets restricted by them, and can write their own implementations that then nicely interacts with the rest of biojava. The functionalily of the SequenceInterator sequenceIterator() method can be broken down into two parts as i see it. 1) Allow access to SymbolLists in the alignment via an Iterator object. 2) Construct SimpleSequence objects from SymbolLists. The first point is a nice addition imho. You can of course first get all the labels, and then iterate over those and call SymbolList getSymbolListForLabel( Object label ) but this is simpler, and in many situations you just want an Iterator anyway. This doesnt realy add anything to what an Alignment is, it just adds a method to let you access the properties of an alignment, in a slightly different way. The second part is the thing i dont like. To me, the core properties that defines what an Alignment is, doesnt contain the functionality of being able to make SimpleSequences out of SymbolLists. And should therefore not be included in the Alignment interface ( as that should only define the most basal basic things that all Alignments have in common ). Biojava contains many types of SymbolLists, that all can be used for alignments, things that i remember out of my head are SimpleSymbolList, GappedSymbolList, SimpleSequence, GappedSequence, SimpleAlignment and the list will grow more. So then when people that dont work with SimpleSequences but with Say GappedSeqeunces or SimpleAlignments need the same type of conversion functionality, you are in for a problem, either you add yet more to the "core" Alignment interface, forcing every alignment implementation to contain lots of conversion code between different types of SymbolLists. This is also nasty because you need to use a different API for different types of Symbollists ( you need to call a different Iterator function that gives you a GappedSequenceIterator, or a SimpleAlignmentIterator ) so it becomes much harder to switch from one type of SymbolList to another. Of course you can refuse to have any more SequenceIterator code in the "core" Alignment interface, then that type of code will have to go somwhere else, and in the end you have code basicly doing the same things, but spread over different locations in the source tree, making it harder to maintain and keep track off. To summarise the whole rant. 1) I do think its appropriate to add a method to the Alignment interface that allows you to get hold of an iterator, that you can sure to iterate over the SymbolLists in the alignment. 2) I dont think its appropriate to add a SequenceIterator to the "core" Alignment interface. As conversion/ construction functionality doesnt realy belong in the simple "core" interfaces. I wouldnt think that adding toSequence() in the SymbolList interface would be appropriate either, or adding toGappedSequence in the Sequence interfacce. I understand that you find SimpleSequence construction functionality nice, so what i would suggest would be somthing like. 1) change the sequenceIterator method into a symbolListIterator method 2) then just have your implementations of the AlignmentInterface implement the sequenceIterator method or, add that sequenceIterator method to some of the other, "more advanced" alignment interfaces, just dont put it in the "core" Alignment interface. Or you could make something like SimpleSequenceAlignment that extends Alignment and contains the extra functionality you want. > I think that the AlignmentSequenceIterator can only use public functions from the >Alignment interface. This ensures that it will work for any implementation. We can >replace the sequenceIterator implementation in individual Alignment implemenations to >return the underlying Sequences. I'll look into that. > > >The reason that the new method returns a SequenceIterator is because that is an >already existing commonly used iterator. There are other methods (usually >symbolListForLabel()) already in the interface if you use SymbolLists that don't >readily translate into Sequences. > Yes, but by adding the sequenceIterator() method to the basic Alignment interface, EVERYONE writing an Alignment class needs to implement this method even if the SymbolLists it contains doesnt readily translate into Sequences. So they will then either return some strange types of sequences, or they might just be forced to throw an Exception or just dont return a thing. And to me, that signals, that the method dont realy belong in the simple "core" Alignment interface. As it should only contain the absolute smallest set of methods that are needed to define the core capabilities of an Alignment, and all those "core"methods should be able to return valid results from all types of Alignments. > >So, I'll look at making more specific implementations for sequenceIterator that >return underlying sequences. > >Nimesh > > > > > > >Here is the cod for AlignmentSequenceIterator: > > > >public class AlignmentSequenceIterator implements SequenceIterator { > > private Alignment align; > > private Iterator labels; > > private SequenceFactory sf; > > public AlignmentSequenceIterator(Alignment align) { > > this.align = align; > > labels = align.getLabels().iterator(); > > sf = new SimpleSequenceFactory(); > > } > > public boolean hasNext() { > > return labels.hasNext(); > > } > > public Sequence nextSequence() throws NoSuchElementException, >BioException { > > if (!hasNext()) { > > throw new NoSuchElementException("No more sequences in the >alignment."); > > } > > else { > > try { > > Object label = labels.next(); > > SymbolList symList = align.symbolListForLabel(label); > > Sequence seq = sf.createSequence(symList, label.toString(), >label.toString(), null); > > return seq; > > } catch (Exception e) { > > throw new BioException(e, "Could not read sequence"); > > } > > } > > } > >} > >_______________________________________________ > >Biojava-l mailing list - [EMAIL PROTECTED] > >http://biojava.org/mailman/listinfo/biojava-l > > > > > > > > > > > > > _______________________________________________ Biojava-l mailing list - [EMAIL PROTECTED] http://biojava.org/mailman/listinfo/biojava-l