----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, 27 July, 2002 7:00 G.Sake
Subject: Biojava-l digest, Vol 1 #715 - 3 msgs

> Send Biojava-l mailing list submissions to
> [EMAIL PROTECTED]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://biojava.org/mailman/listinfo/biojava-l
> or, via email, send a message with subject or body 'help' to
> [EMAIL PROTECTED]
>
> You can reach the person managing the list at
> [EMAIL PROTECTED]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Biojava-l digest..."
>
>
> Today's Topics:
>
> 1. SeqIOTools.readXXXXFields() method?? (Roy Park)
> 2. Re: adding toSequenceIterator method for Alignment (Kalle Näslund)
> 3. Re: SeqIOTools.readXXXXFields() method?? (Mark Fortner)
>
> --__--__--
>
> Message: 1
> From: Roy Park <[EMAIL PROTECTED]>
> To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> Date: Fri, 26 Jul 2002 11:16:06 -0500
> Subject: [Biojava-l] SeqIOTools.readXXXXFields() method??
>
> Hello everyone.
>
> I deal with a number of pseudo EMBL/GenBank formatted sequences, and it
> would be extremely nice (?) to have methods that only attempt to parse out
> specified fields.
>
> The primary reason for this is that, right now, format.readSequence()
> throws BioException way too frequently for my purpose, i.e. although I only
> need the fields X, Y and Z from each sequence definition, readSequence()
> throws an exception when it finds the field W to be malformed, etc.
>
> I see that modified versions of the StreamReader class, the SequenceFormat
> implementing classes, etc. have to be written, which I can do. I'm wondering
> if anyone could suggest a preferred way of passing the desired fields to be
> read.
>
> readXXXXFields(BufferedReader _br, ArrayList(of String) _fieldsToBeParsed)..
> or
> readXXXXFields(BufferedReader _br, String[] _fieldsToBeParsed)..etc.
>
> (I think the readXXXXX(BufferedReader) should be called if the second
> argument is null.)
>
> Any input would be greatly appreciated. (What about the naming of the
> methods - readXXXXPartial()??)
>
> Roy K. Park
> Bioinformatics Data Analyst
> Lexicon Genetics Incorporated
>
>
> ***************************************************************************
> The contents of this communication are intended only for the addressee and
> may contain confidential and/or privileged material. If you are not the
> intended recipient, please do not read, copy, use or disclose this
> communication and notify the sender. Opinions, conclusions and other
> information in this communication that do not relate to the official
> business of my company shall be understood as neither given nor endorsed by
> it.
> ***************************************************************************
>
>
> --__--__--
>
> Message: 2
> Date: Fri, 26 Jul 2002 20:52:56 +0200
> From: Kalle Näslund <[EMAIL PROTECTED]>
> To: "Singh, Nimesh" <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Biojava-l] adding toSequenceIterator method for Alignment
>
> Singh, Nimesh wrote:
>
> > I've created a class called AlignmentSequenceIterator that I intend to
> > put in the org.biojava.bio.seq package. It will do the real work. I've
> > also added
> >     public SequenceIterator sequenceIterator() {
> >         return new AlignmentSequenceIterator(this);
> >     }
> > to each alignment class. It should work fine in every alignment, because
> > AlignmentSequenceIterator uses the getLabels and symbolListForLabel
> > methods from the Alignment interface.
> >
> > If this is fine, then I'll upload everything later today. If you have
> > any suggestions for changes, then let me know.
> >
> > Nimesh
> >
>
> Well, there is one big problem with this piece of code: you treat all
> objects in the alignment as being SymbolLists only, which in reality isn't
> true, as you can insert any object that implements the SymbolList interface
> into an alignment.
>
> For example, I am currently populating my alignment objects with custom
> Sequence objects.
> If I called this code, it would create new Sequence objects of the type
> SimpleSequence and, as I understand it from a quick look at the
> SequenceFactory code, they won't have any of the annotations, features etc.
> that the original Sequence objects I added to the alignment had. So, instead
> of getting my custom Sequence objects back containing features etc., I would
> get nearly "empty" SimpleSequence objects back, which makes it unusable.
> Other problems should pop up if you insert other objects into Alignments,
> say other alignments: instead of getting them back as alignments when you
> iterate over the SymbolLists in the alignment, you get them back as
> SimpleSequences.
>
> But I do agree that adding a method to the Alignment interface that gives
> you an iterator over the SymbolLists in the alignment is a good thing to add.
>
> My suggestion would just be to have it iterate over the SymbolLists that are
> inserted into the Alignment and avoid altering the objects in any way. That
> way you get back what you inserted, and the method will work for everyone,
> not just people using SimpleSequences.
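
A minimal sketch of the pass-through iteration Kalle describes, kept free of BioJava so that it compiles on its own. MemberSource is a hypothetical stand-in for the two Alignment methods named in this thread (getLabels and symbolListForLabel); memberIterator and fromMap are likewise illustrative names, not BioJava API. All it is meant to show is that the iterator hands back the stored object itself, so a custom Sequence would keep its features and annotations:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PassThroughIteratorSketch {

    /** Hypothetical stand-in for the two Alignment methods used in the thread. */
    public interface MemberSource<T> {
        List<?> getLabels();
        T symbolListForLabel(Object label);
    }

    /** Iterates over the stored members without wrapping or copying them. */
    public static <T> Iterator<T> memberIterator(MemberSource<T> source) {
        Iterator<?> labels = source.getLabels().iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return labels.hasNext();
            }

            @Override
            public T next() {
                // labels.next() throws NoSuchElementException when exhausted.
                return source.symbolListForLabel(labels.next());
            }
        };
    }

    /** Illustrative helper: backs a MemberSource with a map of label to member. */
    public static <T> MemberSource<T> fromMap(Map<?, T> members) {
        List<Object> labels = new ArrayList<>(members.keySet());
        return new MemberSource<T>() {
            @Override
            public List<?> getLabels() {
                return labels;
            }

            @Override
            public T symbolListForLabel(Object label) {
                return members.get(label);
            }
        };
    }

    public static void main(String[] args) {
        Map<String, StringBuilder> members = new LinkedHashMap<>();
        StringBuilder custom = new StringBuilder("ACGT"); // plays the custom Sequence
        members.put("seq1", custom);

        Iterator<StringBuilder> it = memberIterator(fromMap(members));
        // Identity check: the same instance that was inserted comes back.
        System.out.println(it.next() == custom);
    }
}
```

Whether the real method should return a SequenceIterator or a plain Iterator over SymbolLists is left open here; the sketch only covers the "return what was inserted" behaviour.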
>
> regards Kalle
>
> >
> >Here is the code for AlignmentSequenceIterator:
> >
> >public class AlignmentSequenceIterator implements SequenceIterator {
> >    private Alignment align;
> >    private Iterator labels;
> >    private SequenceFactory sf;
> >
> >    public AlignmentSequenceIterator(Alignment align) {
> >        this.align = align;
> >        labels = align.getLabels().iterator();
> >        sf = new SimpleSequenceFactory();
> >    }
> >
> >    public boolean hasNext() {
> >        return labels.hasNext();
> >    }
> >
> >    public Sequence nextSequence() throws NoSuchElementException, BioException {
> >        if (!hasNext()) {
> >            throw new NoSuchElementException("No more sequences in the alignment.");
> >        }
> >        else {
> >            try {
> >                Object label = labels.next();
> >                SymbolList symList = align.symbolListForLabel(label);
> >                // Builds a new Sequence from the symbols; the label is used as both URI and name.
> >                Sequence seq = sf.createSequence(symList, label.toString(), label.toString(), null);
> >                return seq;
> >            } catch (Exception e) {
> >                throw new BioException(e, "Could not read sequence");
> >            }
> >        }
> >    }
> >}
> >_______________________________________________
> >Biojava-l mailing list - [EMAIL PROTECTED]
> >http://biojava.org/mailman/listinfo/biojava-l
> >
>
>
> --__--__--
>
> Message: 3
> Date: Fri, 26 Jul 2002 22:26:30 -0500
> From: Mark Fortner <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Subject: Re: [Biojava-l] SeqIOTools.readXXXXFields() method??
>
> I wonder if it would be worthwhile to have an alphabet-like approach to
> this, where the alphabets are actually field tokens/field names that are
> either statically defined, or are defined in XML files? For example,
> you might have entries like
> <field-list name="swissprot">
>   <field name="accession" token="AC"/>
>   <field name="id" token="ID"/>
>   ....
> </field-list>
>
> You could save subsets of these field lists (alphabets) and pass the
> file name to your code at run-time.
> If you want more separation of the layers of your code, you could keep the
> file handling code in another class, and simply accept an ArrayList of Field
> objects as the parameter to your method.
>
> Mark
>
> Roy Park wrote:
>
> >Hello everyone.
> >
> >I deal with a number of pseudo EMBL/GenBank formatted sequences, and it
> >would be extremely nice (?) to have methods that only attempt to parse out
> >specified fields.
> >
> >The primary reason for this is that, right now, format.readSequence()
> >throws BioException way too frequently for my purpose, i.e. although I only
> >need the fields X, Y and Z from each sequence definition, readSequence()
> >throws an exception when it finds the field W to be malformed, etc.
> >
> >I see that modified versions of the StreamReader class, the SequenceFormat
> >implementing classes, etc. have to be written, which I can do. I'm wondering
> >if anyone could suggest a preferred way of passing the desired fields to be
> >read.
> >
> >readXXXXFields(BufferedReader _br, ArrayList(of String) _fieldsToBeParsed)..
> >or
> >readXXXXFields(BufferedReader _br, String[] _fieldsToBeParsed)..etc.
> >
> >(I think the readXXXXX(BufferedReader) should be called if the second
> >argument is null.)
> >
> >Any input would be greatly appreciated. (What about the naming of the
> >methods - readXXXXPartial()??)
> >
> >Roy K. Park
> >Bioinformatics Data Analyst
> >Lexicon Genetics Incorporated
> >
>
>
> --__--__--
>
> _______________________________________________
> Biojava-l mailing list - [EMAIL PROTECTED]
> http://biojava.org/mailman/listinfo/biojava-l
>
>
> End of Biojava-l Digest
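
For the field question in Messages 1 and 3, a rough sketch of reading only the requested fields, assuming an EMBL/SwissProt-style flat file in which each line starts with a two-letter line code and a record ends with "//". readFields is a hypothetical name, not an existing SeqIOTools method. It takes the wanted codes as a Set<String> (my assumption; Roy proposed an ArrayList or a String[]) and treats null as "read everything", as Roy suggested. Lines with other codes are skipped without being parsed, so a malformed field that was not requested cannot raise an error:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartialFieldReaderSketch {

    /**
     * Reads one record and returns the raw text of the requested line codes,
     * in order of first appearance. A null set means "keep every field".
     */
    public static Map<String, List<String>> readFields(BufferedReader reader,
                                                       Set<String> wanted) throws IOException {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("//")) {
                break; // end-of-record marker
            }
            if (line.length() < 2) {
                continue; // too short to carry a line code
            }
            String code = line.substring(0, 2);
            if (wanted != null && !wanted.contains(code)) {
                continue; // not requested: skipped, never parsed
            }
            String value = line.substring(2).trim();
            fields.computeIfAbsent(code, k -> new ArrayList<>()).add(value);
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Made-up record; the FT line stands in for a malformed field.
        String record = "ID   TEST_SEQ\n"
                + "AC   X00001;\n"
                + "FT   ((( malformed\n"
                + "//\n";
        Set<String> wanted = new HashSet<>(Arrays.asList("ID", "AC"));
        System.out.println(readFields(new BufferedReader(new StringReader(record)), wanted));
    }
}
```

The set of codes could equally be loaded from a field-list XML file of the kind Mark describes and then passed in, which keeps the file handling out of the reader.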
