RE: [Biojava-l] Re: genbank contig stuff

Schreiber, Mark Mon, 07 Jul 2003 16:18:28 -0700

There seem to be lots of ways to think about contigs. One nice way is as a Markov chain
(although that is more of a consensus). An alternative is to treat the contig as a
collection of sequences plus some associated information about the locations of the
sequences in the contig and what the consensus should look like. We do this with an
XML description of the contig.
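To make the "collection of sequences plus locations" idea concrete, here is a minimal Java sketch. The class names `Placement` and `ContigLayout` are hypothetical and not part of BioJava. The sketch ignores the XML description and the consensus. It only records where each component sits, so that the contig length and per-position coverage can be derived.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not BioJava API): where one component sequence sits in a contig.
final class Placement {
    final String seqId;    // identifier of the component sequence (e.g. a read)
    final int start;       // 1-based start of the component within the contig
    final int length;      // number of residues the component contributes
    final boolean forward; // strand of the component relative to the contig

    Placement(String seqId, int start, int length, boolean forward) {
        if (start < 1 || length < 0) throw new IllegalArgumentException("bad placement");
        this.seqId = seqId; this.start = start; this.length = length; this.forward = forward;
    }

    int end() { return start + length - 1; } // inclusive 1-based end
}

// Hypothetical sketch: a contig held only as a list of placements.
final class ContigLayout {
    private final List<Placement> placements = new ArrayList<>();

    void add(Placement p) { placements.add(p); }

    // Contig length is the furthest right end of any placement.
    int length() {
        int max = 0;
        for (Placement p : placements) max = Math.max(max, p.end());
        return max;
    }

    // Identifiers of the components covering a 1-based contig position.
    List<String> coveringAt(int pos) {
        List<String> ids = new ArrayList<>();
        for (Placement p : placements)
            if (p.start <= pos && pos <= p.end()) ids.add(p.seqId);
        return ids;
    }
}
```

With `read1` placed at 1..10 and `read2` at 8..12, the layout is 12 long, and position 9 is covered by both reads.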
 
I feel that all the parts needed are in BioJava, and it would be good to have a fairly
abstract Contig object that holds the required information. When the needed sequenceDB
is available, a view could be made onto the consensus (a Sequence object), onto the
Alignment, or even onto a Markov chain. When quality information is available, a
Sequence over the Phred alphabet could be produced. In this way a Contig object is not
a Sequence, an Alignment, or a Markov chain, but the information in it could be used to
produce all three.
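Here is a minimal sketch of the "view" idea. It assumes the components are already aligned as gapped strings at known column offsets. `ContigSketch` is a hypothetical name, not a BioJava class. Plain majority voting stands in for whatever consensus rule a real implementation would use. The point is that the consensus is computed from the contig's contents rather than stored as the contig itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not BioJava API): a contig that is not itself a sequence
// but can produce a consensus view on demand from pre-aligned components.
final class ContigSketch {
    private static final class Aligned {
        final int offset;      // 0-based column where the component starts
        final String residues; // gapped residues, '-' marks an alignment gap
        Aligned(int offset, String residues) { this.offset = offset; this.residues = residues; }
    }

    private final List<Aligned> components = new ArrayList<>();

    void addAligned(int offset, String gappedResidues) {
        components.add(new Aligned(offset, gappedResidues));
    }

    int width() {
        int w = 0;
        for (Aligned a : components) w = Math.max(w, a.offset + a.residues.length());
        return w;
    }

    // Majority-vote consensus: gaps are ignored; columns with no residues or
    // a tie for first place become 'n'.
    String consensus() {
        int w = width();
        StringBuilder sb = new StringBuilder(w);
        for (int col = 0; col < w; col++) {
            Map<Character, Integer> votes = new HashMap<>();
            for (Aligned a : components) {
                int i = col - a.offset;
                if (i < 0 || i >= a.residues.length()) continue;
                char c = Character.toLowerCase(a.residues.charAt(i));
                if (c != '-') votes.merge(c, 1, Integer::sum);
            }
            char best = 'n';
            int bestCount = 0;
            boolean tie = false;
            for (Map.Entry<Character, Integer> e : votes.entrySet()) {
                if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); tie = false; }
                else if (e.getValue() == bestCount) tie = true;
            }
            sb.append(tie || bestCount == 0 ? 'n' : best);
        }
        return sb.toString();
    }
}
```

Other views, such as a quality-weighted consensus or a Markov chain, could be added in the same way. Each would be another method that reads `components`, leaving the stored data unchanged.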
 
Does anyone want to code that up? :)
 
- Mark


        -----Original Message----- 
        From: Greg Cox [mailto:[EMAIL PROTECTED] 
        Sent: Tue 8/07/2003 3:19 a.m. 
        To: Matthew Pocock 
        Cc: biojava-l 
        Subject: RE: [Biojava-l] Re: genbank contig stuff
        
        

        We looked at this a while back, and I suspect this isn't a problem BioJava can 
solve.   
        
        If we treat it as a sequence, one option is to try to assemble it.  If BioJava 
assembles the sequence, it has to know where to get the component sequences.  This 
implies some sort of database backing to resolve the contig's component sequences, which 
seems a bit excessive.  If all you want is the features, we could create a dummy sequence 
of ambiguous nucleotides of the proper length and attach the features to that.  At that 
point, though, I think it makes more sense to create a feature holder instead of 
pretending it's a real sequence.  Which segues into...
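Here is a minimal sketch of the placeholder option, assuming only the contig length is known. `PlaceholderSequence` is hypothetical. It does not use BioJava's `Sequence` or `FeatureHolder` interfaces. It only shows that an all-`n` sequence of the right length gives features a valid 1..length range to sit in. A zero-length one rejects every feature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch (not BioJava API): a "dummy" sequence whose residues are
// all 'n' (the IUPAC code for any nucleotide); only its length is meaningful.
final class PlaceholderSequence {
    static final class Feature {
        final String type;
        final int start, end; // 1-based, inclusive
        Feature(String type, int start, int end) { this.type = type; this.start = start; this.end = end; }
    }

    private final int length;
    private final List<Feature> features = new ArrayList<>();

    PlaceholderSequence(int length) {
        if (length < 0) throw new IllegalArgumentException("negative length");
        this.length = length;
    }

    int length() { return length; }

    // Every valid position reads as the ambiguity code 'n'.
    char residueAt(int pos) {
        if (pos < 1 || pos > length) throw new IndexOutOfBoundsException("position " + pos);
        return 'n';
    }

    // Rejects features outside 1..length; with length 0 every feature is rejected.
    void addFeature(Feature f) {
        if (f.start < 1 || f.end > length || f.start > f.end)
            throw new IllegalArgumentException(
                f.type + " " + f.start + ".." + f.end + " not within 1.." + length);
        features.add(f);
    }

    List<Feature> features() { return Collections.unmodifiableList(features); }
}
```

This design keeps the length check in one place. The price is that callers can read residues that carry no information, which is the "pretending it's a real sequence" concern raised above.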
        
        The other option is to treat a contig as a new kind of beast, not a sequence.  
I don't know what this beast would look like; it has to be a feature holder, probably 
annotatable, and then what?  Aesthetically I'm not sure this makes sense either; after 
all, a contig sequence is still a sequence.
        
        The ray of light is that most (all?) contigs are also available in an expanded 
form.  That's been enough for us to avoid grappling with this bull so far. 
        
        Greg
        
        -----Original Message-----
        From: [EMAIL PROTECTED]
        [mailto:[EMAIL PROTECTED] Behalf Of Matthew Pocock
        Sent: Thursday, June 26, 2003 2:58 PM
        To: Matthew Pocock
        Cc: biojava-l
        Subject: [Biojava-l] Re: genbank contig stuff
        
        
        Sorry - I fired that off without thinking much.
        
        I just downloaded the GenBank file NT_010783 from the NCBI. Our parsers
        spewed lots of errors about features not being within the range 1..0,
        and after a little poking around in the code, I found that a zero-length
        sequence was being generated. In desperation, I looked at the
        physical GenBank file. Instead of sequence data, it contains a CONTIG
        section with a single big join() describing how to build it from other
        entries.
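A CONTIG line typically has the form `join(ACC.V:start..end,gap(N),complement(ACC.V:start..end))`. That syntax is stated from general familiarity with GenBank flat files, so check it against NCBI's documentation. The hypothetical `ContigJoin` class below parses only that flat form and sums the piece lengths. A parser would need roughly this much before it could build a placeholder of the right length instead of a zero-length sequence. The accessions in the test and usage example are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not BioJava API): parse a flat CONTIG join() into pieces.
// Supports "ACC.V:start..end", "complement(ACC.V:start..end)", "gap(N)",
// "gap(unkN)" (treated as length N) and "gap()"; nested joins are not handled.
final class ContigJoin {
    static final class Piece {
        final String accession;   // null for a gap
        final int start, end;     // 1-based inclusive range in the source entry
        final boolean complement; // true if the piece is reverse-complemented
        final int gapLength;      // -1 for a gap of unknown length

        Piece(String accession, int start, int end, boolean complement, int gapLength) {
            this.accession = accession; this.start = start; this.end = end;
            this.complement = complement; this.gapLength = gapLength;
        }

        boolean isGap() { return accession == null; }

        // Residues contributed; unknown-length gaps are counted as zero here.
        int length() { return isGap() ? Math.max(gapLength, 0) : end - start + 1; }
    }

    private static final Pattern RANGE =
        Pattern.compile("([A-Za-z0-9_]+(?:\\.\\d+)?):(\\d+)\\.\\.(\\d+)");
    private static final Pattern GAP = Pattern.compile("gap\\((unk)?(\\d*)\\)");

    static List<Piece> parse(String contigText) {
        // Drop all whitespace so wrapped continuation lines join up.
        String s = contigText.replaceAll("\\s+", "");
        if (s.startsWith("CONTIG")) s = s.substring("CONTIG".length());
        if (!s.startsWith("join(") || !s.endsWith(")"))
            throw new IllegalArgumentException("expected join(...): " + contigText);
        List<Piece> pieces = new ArrayList<>();
        for (String tok : s.substring(5, s.length() - 1).split(",")) {
            Matcher g = GAP.matcher(tok);
            if (g.matches()) {
                int n = g.group(2).isEmpty() ? -1 : Integer.parseInt(g.group(2));
                pieces.add(new Piece(null, 0, 0, false, n));
                continue;
            }
            boolean comp = tok.startsWith("complement(") && tok.endsWith(")");
            String body = comp ? tok.substring("complement(".length(), tok.length() - 1) : tok;
            Matcher r = RANGE.matcher(body);
            if (!r.matches()) throw new IllegalArgumentException("unrecognised component: " + tok);
            pieces.add(new Piece(r.group(1), Integer.parseInt(r.group(2)),
                                 Integer.parseInt(r.group(3)), comp, 0));
        }
        return pieces;
    }

    // Total assembled length, counting known-length gaps.
    static int totalLength(List<Piece> pieces) {
        int total = 0;
        for (Piece p : pieces) total += p.length();
        return total;
    }
}
```

For example, parsing `join(AC000001.1:1..1000,gap(100),complement(AC000002.3:501..1500))` yields three pieces with a total length of 2100. That is two 1000-residue pieces plus a 100-residue gap.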
        
        Has anybody modified our genbank parser to process entries like this? To
        be honest, I'm not quite sure where to start.
        
        Matthew
        
        _______________________________________________
        Biojava-l mailing list  -  [EMAIL PROTECTED]
        http://biojava.org/mailman/listinfo/biojava-l
        
        
        


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

