Re: [Biojava-l] FASTA header, loc attribute question

Richard Holland Mon, 11 May 2009 04:52:36 -0700

Short answer - no, not directly.

Longer answer - if you write some code to snip the loc string out of
the FASTA description line, there is existing code that can convert the
snipped string into a RichLocation, which you can then apply to the
parsed FASTA sequence in order to extract the required region. The loc
string parser is GenbankLocationParser, part of the biojavax packages.
This assumes that the loc string conforms to the GenBank location
format (join, complement, etc.).
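
A rough sketch of the snipping step, in plain Java with no BioJava
dependency. The class and method names (LocSnipper, extractLoc,
stripArm) are made up for illustration, and it assumes the description
line has the FlyBase-style "key=value;" layout shown in the quoted
headers below. The "2L:" arm prefix is probably not something the
GenBank location grammar expects in front of join(...), so the sketch
drops it before the string would be handed to GenbankLocationParser
(check the javadoc for that class's exact method signature).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocSnipper {

    // Assumption: attributes look like "key=value;" and the loc value
    // itself never contains a semicolon.
    private static final Pattern LOC =
            Pattern.compile("(?:^|[\\s;])loc=([^;]+);");

    /** Returns the raw loc value, e.g. "2L:join(1..5,8..9)", if present. */
    public static Optional<String> extractLoc(String description) {
        Matcher m = LOC.matcher(description);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    /**
     * Drops everything up to and including the first colon, e.g. the
     * "2L:" chromosome-arm prefix. Assumption: the first colon always
     * belongs to that prefix.
     */
    public static String stripArm(String loc) {
        int colon = loc.indexOf(':');
        return colon >= 0 ? loc.substring(colon + 1) : loc;
    }

    public static void main(String[] args) {
        String description = "FBpp0077713 type=protein; "
                + "loc=2L:join(384551..384894,385701..385746,"
                + "386308..386576,386703..387270); ID=FBpp0077713;";

        // Print the GenBank-style part of the location, if one was found.
        extractLoc(description)
                .map(LocSnipper::stripArm)
                .ifPresent(System.out::println);
    }
}
```

One thing to watch: the coordinates in loc are relative to the 2L arm,
whereas the gene record in the DNA file starts at 378116 (its own loc
is 2L:378116..387439, length 9324). So the location would need shifting
by 378115 before being applied to that record's sequence.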


cheers,
Richard

On Mon, 2009-05-11 at 11:05 +0100, JP wrote:
> Hi there at Biojava,
> 
> I have two FASTA files - one containing amino acid sequences and the other
> containing DNA sequences.
> 
> In the AA FASTA file I have something like :
> 
> >FBpp0077713 type=protein;
> loc=2L:join(384551..384894,385701..385746,386308..386576,386703..387270);
> ID=FBpp0077713; name=al-PA; parent=FBgn0000061,FBtr0078053;
> dbxref=FlyBase:FBpp0077713,GB_protein:AAF51505.1,GB_protein:AAF51505,FlyBase_Annotation_IDs:CG3935-PA,REFSEQ:NP_722629;
> MD5=64a866db3e2913b97a2158c2de9d02f6; length=408; release=r5.9;
> species=Dmel;
> MGISEEIKLEELPQEAKLAHPDAVVLVDRAPGSSAASAGAALTVSMSVSG
> GAPSGASGASGGTNSPVSDGNSDCEADEYAPKRKQRRYRTTFTSFQLEEL...
> etc etc etc
> 
> I would like to parse this header line, in particular the loc attribute, and
> extract the corresponding region from the entry in the DNA FASTA file (so I
> get the genomic data for the protein):
> 
> >FBgn0000061 type=gene; loc=2L:378116..387439; ID=FBgn0000061; name=al;
> dbxref=FlyBase:FBgn0000061,FlyBase:FBan0003935,FlyBase_Annotation_IDs:CG3935,GB:AE003589,GB_protein:AAF51505,GB:AY121696,GB_protein:AAM52023,GB:BI485174,GB:CZ486795,GB:L08401,GB_protein:AAA28840,UniProt/Swiss-Prot:Q06453,INTERPRO:IPR000047,INTERPRO:IPR001356,INTERPRO:IPR003654,INTERPRO:IPR009057,INTERPRO:IPR012287,bdgpinsituexpr:al,dedb:5830,drsc:FBgn0000061,flight:FBgn0000061,flyatlas:FBgn0000061,flyexpress:FBgn0000061,flygrid:59464,flymine:FBgn0000061,geo:FBgn0000061,hdri:FBgn0000061,if:/gene/aristal.htm,orthologs:ensANOGA:ENSANGP00000011877,orthologs:ensBOSTA:ENSBTAP00000015907,orthologs:ensCANFA:ENSCAFP00000009888,orthologs:ensGALGA:ENSGALP00000005255,orthologs:ensHOMSA:ENSP00000298420,orthologs:ensMACMU:ENSMMUP00000007349,orthologs:ensMONDO:ENSMODP00000008388,orthologs:ensPANTR:ENSPTRP00000004281,orthologs:ensRATNO:ENSRNOP00000027186,orthologs:ensTETNI:GSTENP00015517001,orthologs:graORYSA:Q6YYB8,orthologs:graORYSA:Q8W0T5,orthologs:modCAEEL:WBGene00044330,orthologs:modDANRE:ZDB-GENE-990415-15,orthologs:modMUSMU:MGI:1097716,panther:FBgn0000061;
> cyto_range=21C1-21C1; gbunit=AE014134; MD5=0f5568cf13aeb2c7076f11b1ce3d6b2f;
> length=9324; release=r5.9; species=Dmel;
> GTAGTTTGCTGCCGGCTCTGGAACAGCCCGGTCATCTCGTCGCGTTCGGT
> TCCGATTCCGATTCGAATAGTCGAGCTGGGGATACATTGTTGTTTCCGGG
> etc etc etc
> 
> I understand this is not exactly conventional, but does BioJava support
> parsing the loc attribute (join, complement, etc.)?
> 
> Many Thanks
> JP
> 
> _______________________________________________
> Biojava-l mailing list  -  [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: [email protected]
http://www.eaglegenomics.com/

