I have given Mathieu my Phred classes, which may be able to form the basis of a
BioJava API that your .ace parser could use.

I attach it here for your interest


Mark Schreiber
Bioinformatics
AgResearch Invermay
PO Box 50034
Mosgiel
New Zealand

PH: +64 3 489 9175

 

> -----Original Message-----
> From: David Waring [mailto:[EMAIL PROTECTED]]
> Sent: Friday, June 15, 2001 11:11 AM
> To: Wiepert, Mathieu; [EMAIL PROTECTED]
> Subject: RE: [Biojava-l] Reading consensus sequence from phred/phrap ace files
> 
> 
> We have a full .ace parser. It was not written to the BioJava API, so
> sequences are strings. Our parser (which I have not worked with) uses JLex
> and CUP, and parses the entire .ace file into one very large object that
> holds all the data in a structure mirroring the .ace file itself. For this
> reason it is not particularly fast. Anyone familiar with JLex and CUP
> should be able to modify it to ignore the parts they are not interested
> in.
> 
> While you may not want everything in the file (and there is a lot),
> perhaps a more complete data structure is in order. In fact, if I am not
> mistaken, there really is no such thing as a consensus sequence in an .ace
> file. The file consists of a list of contig sequences, the individual
> reads, and a bunch more data. In a finished assembly project the
> "consensus sequence" is just the longest contig; the other contigs may be
> junk. In an assembly project that is not complete there are many "good"
> contigs and some potential junk.
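To make the point about structure concrete, here is a minimal sketch of a scanner that keeps only the contig records and skips everything else, which is roughly what trimming the JLex/CUP grammar down would buy. It is a hand-written line scanner, not Will's parser, and it assumes the newer .ace layout in which each contig starts with a line of the form `CO <name> <bases> <reads> <segments> <U|C>`, followed by the padded consensus (pads written as `*`) up to the next blank line. `AceContigScanner` and `readContigs` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch only: collect contig name -> padded consensus from a phrap .ace file.
 * Assumes "CO <name> ..." lines are followed by sequence lines and a blank line.
 * All other record types (AS, BQ, AF, RD, QA, ...) are ignored.
 */
public class AceContigScanner {

    public static Map<String, String> readContigs(BufferedReader in) {
        Map<String, String> contigs = new LinkedHashMap<>(); // keeps file order
        String name = null;                 // contig being read, or null when outside one
        StringBuilder seq = new StringBuilder();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (name == null) {
                    if (line.startsWith("CO ")) {
                        name = line.trim().split("\\s+")[1]; // second field is the contig name
                        seq.setLength(0);
                    }
                    // any other line outside a contig block is skipped
                } else if (line.trim().isEmpty()) {
                    contigs.put(name, seq.toString());      // blank line ends the sequence
                    name = null;
                } else {
                    seq.append(line.trim());                // pads ('*') are kept
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (name != null) {
            contigs.put(name, seq.toString());              // file ended inside a contig
        }
        return contigs;
    }

    public static void main(String[] args) {
        String ace = String.join("\n",
                "AS 2 3", "",
                "CO Contig1 12 2 1 U", "ACGT*ACGT", "ACG", "",
                "BQ", "20 20 20", "",
                "CO Contig2 4 1 1 U", "TTGA", "");
        Map<String, String> contigs = readContigs(new BufferedReader(new StringReader(ace)));
        // prints "Contig1 ACGT*ACGTACG" then "Contig2 TTGA"
        contigs.forEach((n, s) -> System.out.println(n + " " + s));
    }
}
```

Pads are kept so positions stay in the padded coordinates the rest of the file uses; strip the `*` characters before building a sequence object from the result.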
> 
> I would think that a structure containing a collection of all the contigs
> would be in order. Methods could then allow getting the largest, removing
> sequences by size limits, etc.
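The contig collection suggested here could look something like the following. This is only a sketch: `ContigSet`, `getLargest` and `removeShorterThan` are made-up names, lengths are counted with the `*` pads removed, and nothing in it is part of BioJava.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical container for every contig in an assembly. */
public class ContigSet {

    /** One contig: its name and padded consensus sequence. */
    public static final class Contig {
        final String name;
        final String paddedSequence;

        Contig(String name, String paddedSequence) {
            this.name = name;
            this.paddedSequence = paddedSequence;
        }

        /** Length with '*' pads removed. */
        int length() {
            return paddedSequence.replace("*", "").length();
        }
    }

    private final List<Contig> contigs = new ArrayList<>();

    public void add(String name, String paddedSequence) {
        contigs.add(new Contig(name, paddedSequence));
    }

    public int size() {
        return contigs.size();
    }

    /** Longest contig (the "consensus" of a finished project); empty if no contigs. */
    public Optional<Contig> getLargest() {
        return contigs.stream().max(Comparator.comparingInt(Contig::length));
    }

    /** Drops contigs shorter than minLength and returns how many were removed. */
    public int removeShorterThan(int minLength) {
        int before = contigs.size();
        contigs.removeIf(c -> c.length() < minLength);
        return before - contigs.size();
    }

    public static void main(String[] args) {
        ContigSet set = new ContigSet();
        set.add("Contig1", "ACGT*ACGTACG"); // 11 bases once pads are removed
        set.add("Contig2", "TTGA");         // 4 bases
        set.add("Contig3", "ACGTAC");       // 6 bases
        System.out.println(set.getLargest().get().name); // Contig1
        System.out.println(set.removeShorterThan(5));    // 1 (Contig2 dropped)
        System.out.println(set.size());                  // 2
    }
}
```

Returning an `Optional` from `getLargest` covers an assembly with no contigs at all; for an unfinished project the size filter is the more useful of the two operations.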
> 
> Will Gillett is the author. He says he has been thinking about modifying
> it to fit the BioJava API. If you could define a spec for the output data
> structure, he would be willing to modify his code to parse the .ace file
> into it. Otherwise we would gladly send you the source code.
> 
> David
> University of Washington Genome Center
> 
> 
> > Has anyone written something that fills out a sequence from the
> > consensus sequence found in the .ace files of phred/phrap?
> >
> > If not, I will be writing one. I was thinking of making it possible to do
> >
> > BufferedReader reader = new BufferedReader(new FileReader(phredFile));
> > SequenceIterator si = SeqIOTools.readPhred(reader);
> > Sequence sequence = si.getConsensusSequence();
> >
> > I don't really need a sequence iterator, I suppose: there is only one
> > consensus in the file, though all the sample sequences are in the file
> > too. And I don't want to add a method to the sequence iterator either. So
> > perhaps some sort of SequenceBuilder child or factory method? Anyway,
> > please advise...
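For the factory-method route, one possibility is a static method that returns the longest contig, pads removed, as the consensus, so nothing has to be added to SequenceIterator. This is a sketch under the same assumption about `CO` records (name line, sequence lines, blank line); `AceConsensus.readConsensus` is a hypothetical name, and it returns a plain String rather than a BioJava Sequence.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical factory: longest contig of a phrap .ace file, pads removed. */
public final class AceConsensus {

    private AceConsensus() {} // static factory only

    public static String readConsensus(BufferedReader in) {
        String best = "";
        StringBuilder current = null; // non-null while inside a CO sequence block
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (current == null) {
                    if (line.startsWith("CO ")) {
                        current = new StringBuilder(); // start of a contig record
                    }
                } else if (line.trim().isEmpty()) {
                    best = longer(best, current);      // blank line ends the contig
                    current = null;
                } else {
                    current.append(line.trim().replace("*", "")); // drop pads
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            best = longer(best, current); // file ended inside a contig
        }
        return best;
    }

    /** Keeps the earlier contig on a tie. */
    private static String longer(String best, StringBuilder candidate) {
        return candidate.length() > best.length() ? candidate.toString() : best;
    }

    public static void main(String[] args) {
        String ace = "CO Contig1 12 2 1 U\nACGT*ACGT\nACG\n\nCO Contig2 4 1 1 U\nTTGA\n\n";
        // prints ACGTACGTACG
        System.out.println(readConsensus(new BufferedReader(new StringReader(ace))));
    }
}
```

With BioJava 1.x the returned String should then be wrappable with `DNATools.createDNASequence(consensus, name)`, which throws IllegalSymbolException if anything that is not a DNA symbol is left in it.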
> >
> > -Mat
> >
> >
> > _______________________________________________
> > Biojava-l mailing list  -  [EMAIL PROTECTED]
> > http://biojava.org/mailman/listinfo/biojava-l
> 

Phred.zip
