Andreas,

I've been using BioJava to gather sequence data from structure files for an internal project. My intent was to test the limitations of my work (hence files similar to 470D), but I came across this behavior in BioJava.
It is not critical to obtain this particular mapping, since it can be derived from the ATOM records. However, I didn't understand why the SEQRES list would be empty and was looking for clarification. Is it because the chain is RNA, and the empty list prevents the unsupported alignment of RNA records?

Regards,
Steve

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Andreas Prlic
Sent: Monday, November 29, 2010 6:36 PM
To: Steve Darnell
Cc: [email protected]
Subject: Re: [Biojava-l] PDBFileParser question using PDBID 470D

Hi Steve,

As you say, this is an "exotic" sequence, in the sense that it is RNA. Alignment of the SEQRES records for RNA is not currently supported.

Can you explain a bit more about what you are doing and why you need this mapping in this case?

Thanks,
Andreas

On Mon, Nov 29, 2010 at 12:51 PM, Steve Darnell <[email protected]> wrote:
> Greetings,
>
> After parsing PDBID 470D with biojava-3.0-alpha5, Chain A returns an
> empty SEQRES sequence (Chain.getSeqResSequence) and empty SEQRES group
> list (Chain.getSeqResGroups) but the one-letter ATOM sequence is
> properly translated and the ATOM group list contains the appropriate
> number of groups (LoadChemCompInfo set to true).
>
> This is an exotic sequence, but my expectation is that the SEQRES group
> list would have members in it (and one-letter sequence translated if
> LoadChemCompInfo is true). Am I mistaken and the current behavior is
> the intended result?
>
> Best regards,
> Steve Darnell
>
> --
> SEQRES records exist in 470D:
>
> SEQRES 1 A 12 C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48
>
> SEQRES 1 B 12 C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48
>
>
>
> Sample println output (ln 1 record type, ln 2 get${TYPE}Sequence, ln 3
> get${TYPE}Groups):
>
> SEQRES
> ''
> []
>
> ATOM
> 'CGCGAAUUCGCG'
> [PDB: C43 1 trueatoms: 21, PDB: G48 2 trueatoms: 27, PDB: C43 3
> trueatoms: 24, ...]
>
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------
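
For readers who need the SEQRES residue list while RNA alignment is unsupported, one possible workaround is to read the SEQRES records directly. The following is only a minimal sketch using the Java standard library, not BioJava code: the class `SeqresSketch`, its methods, and the `ONE_LETTER` table are hypothetical names, and the one-letter mapping is inferred from the ATOM sequence 'CGCGAAUUCGCG' quoted in this thread. It also assumes whitespace-separated fields as shown in the message above; real PDB files use fixed columns, so a robust parser would slice by column instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SeqresSketch {

    // Hypothetical lookup for the modified nucleotides in 470D, inferred from
    // the ATOM sequence quoted in the thread; this is not a BioJava table.
    static final Map<String, String> ONE_LETTER = Map.of(
            "C43", "C", "G48", "G", "A44", "A", "U36", "U");

    /** Collects residue names per chain from whitespace-separated SEQRES lines. */
    static Map<String, List<String>> parseSeqres(List<String> lines) {
        Map<String, List<String>> byChain = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Fields: SEQRES, serial number, chain ID, residue count, residue names...
            if (f.length < 5 || !f[0].equals("SEQRES")) {
                continue; // skip anything that is not a complete SEQRES record
            }
            List<String> residues = byChain.computeIfAbsent(f[2], k -> new ArrayList<>());
            for (int i = 4; i < f.length; i++) {
                residues.add(f[i]);
            }
        }
        return byChain;
    }

    /** Translates residue names to one-letter codes; unknown names become "X". */
    static String toOneLetter(List<String> residues) {
        StringBuilder sb = new StringBuilder();
        for (String r : residues) {
            sb.append(ONE_LETTER.getOrDefault(r, "X"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "SEQRES 1 A 12 C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48",
                "SEQRES 1 B 12 C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48");
        Map<String, List<String>> byChain = parseSeqres(lines);
        for (Map.Entry<String, List<String>> e : byChain.entrySet()) {
            System.out.println(e.getKey() + " " + toOneLetter(e.getValue()));
        }
    }
}
```

Run against the two SEQRES lines quoted above, this prints `A CGCGAAUUCGCG` and `B CGCGAAUUCGCG`, matching the ATOM sequence reported for chain A. It does not align SEQRES residues to ATOM groups, which is the part the thread identifies as unsupported for RNA.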
