Short answer: no, not directly.

Longer answer: if you write some code to snip the loc string out of the FASTA description line, there is existing code that can convert that string into a RichLocation, which you can then apply to the parsed FASTA sequence to extract the required region. The loc string parser is GenbankLocationParser, part of the biojavax packages. This assumes that the loc string conforms to the GenBank location format.
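As a rough illustration of the snipping step, here is a minimal sketch in plain Java (no BioJava calls). The class name LocSnipper, the method snipLoc and the regular expression are hypothetical, and the sketch assumes the header looks like the FlyBase examples quoted in this thread, i.e. a "loc=<arm>:<location>;" attribute terminated by a semicolon. Splitting off the "2L:" prefix is my own assumption about what the location parser would want to see.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: pulls the loc attribute out of a
// FlyBase-style FASTA description line.
class LocSnipper {

    // Matches "loc=<seqid>:<location>;" where the attribute starts the line or
    // follows whitespace or ';', and the location runs up to the next ';'.
    private static final Pattern LOC =
            Pattern.compile("(?:^|[\\s;])loc=([^:;\\s]+):([^;\\s]+);");

    /** Returns {seqId, locationString}, or null if the header has no loc attribute. */
    static String[] snipLoc(String header) {
        Matcher m = LOC.matcher(header);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String header = ">FBpp0077713 type=protein; "
                + "loc=2L:join(384551..384894,385701..385746,386308..386576,386703..387270); "
                + "ID=FBpp0077713; name=al-PA;";
        String[] loc = snipLoc(header);
        System.out.println("arm      = " + loc[0]);
        System.out.println("location = " + loc[1]);
        // loc[1] is the string one would then hand to the location parser.
    }
}
```

The second element of the returned array is the part to pass on to GenbankLocationParser. Note that both loc values in the example headers appear to be in chromosome-arm coordinates (the gene entry spans 378116..387439, which matches its length of 9324), so the parsed location would presumably need shifting by the gene start before it is applied to the gene's FASTA sequence.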
cheers,
Richard

On Mon, 2009-05-11 at 11:05 +0100, JP wrote:
> Hi there at Biojava,
>
> I have two FASTA files - one containing amino acid sequences and the other
> containing dna sequences.
>
> In the AA FASTA file I have something like :
>
> >FBpp0077713 type=protein;
> loc=2L:join(384551..384894,385701..385746,386308..386576,386703..387270);
> ID=FBpp0077713; name=al-PA; parent=FBgn0000061,FBtr0078053;
> dbxref=FlyBase:FBpp0077713,GB_protein:AAF51505.1,GB_protein:AAF51505,FlyBase_Annotation_IDs:CG3935-PA,REFSEQ:NP_722629;
> MD5=64a866db3e2913b97a2158c2de9d02f6; length=408; release=r5.9;
> species=Dmel;
> MGISEEIKLEELPQEAKLAHPDAVVLVDRAPGSSAASAGAALTVSMSVSG
> GAPSGASGASGGTNSPVSDGNSDCEADEYAPKRKQRRYRTTFTSFQLEEL...
> etc etc etc
>
> I would like to parse this header line in particular the loc attribute and
> extract it from the entry in the DNA FASTA file (so I get the genomic data
> for the protein)
>
> >FBgn0000061 type=gene; loc=2L:378116..387439; ID=FBgn0000061; name=al;
> dbxref=FlyBase:FBgn0000061,FlyBase:FBan0003935,FlyBase_Annotation_IDs:CG3935,GB:AE003589,GB_protein:AAF51505,GB:AY121696,GB_protein:AAM52023,GB:BI485174,GB:CZ486795,GB:L08401,GB_protein:AAA28840,UniProt/Swiss-Prot:Q06453,INTERPRO:IPR000047,INTERPRO:IPR001356,INTERPRO:IPR003654,INTERPRO:IPR009057,INTERPRO:IPR012287,bdgpinsituexpr:al,dedb:5830,drsc:FBgn0000061,flight:FBgn0000061,flyatlas:FBgn0000061,flyexpress:FBgn0000061,flygrid:59464,flymine:FBgn0000061,geo:FBgn0000061,hdri:FBgn0000061,if:/gene/aristal.htm,orthologs:ensANOGA:ENSANGP00000011877,orthologs:ensBOSTA:ENSBTAP00000015907,orthologs:ensCANFA:ENSCAFP00000009888,orthologs:ensGALGA:ENSGALP00000005255,orthologs:ensHOMSA:ENSP00000298420,orthologs:ensMACMU:ENSMMUP00000007349,orthologs:ensMONDO:ENSMODP00000008388,orthologs:ensPANTR:ENSPTRP00000004281,orthologs:ensRATNO:ENSRNOP00000027186,orthologs:ensTETNI:GSTENP00015517001,orthologs:graORYSA:Q6YYB8,orthologs:graORYSA:Q8W0T5,orthologs:modCAEEL:WBGene00044330,orthologs:modDANRE:ZDB-GENE-990415-15,orthologs:modMUSMU:MGI:1097716,panther:FBgn0000061;
> cyto_range=21C1-21C1; gbunit=AE014134; MD5=0f5568cf13aeb2c7076f11b1ce3d6b2f;
> length=9324; release=r5.9; species=Dmel;
> GTAGTTTGCTGCCGGCTCTGGAACAGCCCGGTCATCTCGTCGCGTTCGGT
> TCCGATTCCGATTCGAATAGTCGAGCTGGGGATACATTGTTGTTTCCGGG
> etc etc etc
>
> I understand this is not exactly conventional, but does biojava support the
> parsing of the loc attribute ? (join, complement etc.)
>
> Many Thanks
> JP
>
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: [email protected]
http://www.eaglegenomics.com/
