Hi,

Has anyone used BioJava to parse a Swiss-Prot record? I've attached an example below -- its format looks quite different from either GenBank or RefSeq.
thx,
Dave

ID   100K_RAT       STANDARD;      PRT;   889 AA.
AC   Q62671;
DT   01-NOV-1997 (Rel. 35, Created)
DT   01-NOV-1997 (Rel. 35, Last sequence update)
DT   16-OCT-2001 (Rel. 40, Last annotation update)
DE   100 kDa protein (ENZYME: 6.3.2.-).
OS   Rattus norvegicus (Rat).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
OX   NCBI_TaxID=10116;
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=WISTAR; TISSUE=Testis;
RX   MEDLINE=92253337: PubMed=1533713;
RA   Mueller D., Rehbein M., Baumeister H., Richter D.;
RT   "Molecular characterization of a novel rat protein structurally
RT   related to poly(A) binding proteins and the 70K protein of the U1
RT   small nuclear ribonucleoprotein particle (snRNP).";
RL   Nucleic Acids Res. 20:1471-1475(1992).
RN   [2]
RP   ERRATUM.
RA   Mueller D., Rehbein M., Baumeister H., Richter D.;
RL   Nucleic Acids Res. 20:2624-2624(1992).
CC   -!- FUNCTION: E3 UBIQUITIN-PROTEIN LIGASE WHICH ACCEPTS UBIQUITIN FROM
CC       AN E2 UBIQUITIN-CONJUGATING ENZYME IN THE FORM OF A THIOESTER AND
CC       THEN DIRECTLY TRANSFERS THE UBIQUITIN TO TARGETED SUBSTRATES (BY
CC       SIMILARITY). THIS PROTEIN MAY BE INVOLVED IN MATURATION AND/OR
CC       POST-TRANSCRIPTIONAL REGULATION OF MRNA.
CC   -!- TISSUE SPECIFICITY: HIGHEST LEVELS FOUND IN TESTIS. ALSO PRESENT
CC       IN LIVER, KIDNEY, LUNG AND BRAIN.
CC   -!- DEVELOPMENTAL STAGE: IN EARLY POST-NATAL LIFE, EXPRESSION IN
CC       THE TESTIS INCREASES TO REACH A MAXIMUM AROUND DAY 28.
CC   -!- MISCELLANEOUS: A CYSTEINE RESIDUE IS REQUIRED FOR
CC       UBIQUITIN-THIOLESTER FORMATION.
CC   -!- SIMILARITY: A CENTRAL REGION (AA 485-514) IS SIMILAR TO THE
CC       C-TERMINAL DOMAINS OF MAMMALIAN AND YEAST POLY (A) RNA BINDING
CC       PROTEINS (PABP).
CC   -!- SIMILARITY: CONTAINS MIXED-CHARGE DOMAINS SIMILAR TO RNA-BINDING
CC       PROTEINS.
CC   -!- SIMILARITY: CONTAINS 1 HECT-TYPE E3 UBIQUITIN-PROTEIN LIGASE
CC       DOMAIN.
CC   --------------------------------------------------------------------------
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to [EMAIL PROTECTED]).
CC   --------------------------------------------------------------------------
DR   EMBL: X64411
DR   InterPro: IPR000569
DR   IPR002004
DR   Pfam: PF00632
DR   PF00658
DR   SMART: SM00119
DR   SM00517
DR   PROSITE: PS50237
KW   Ubiquitin conjugation; Ligase.
FT   DOMAIN       77     88       ASP/GLU-RICH (ACIDIC).
FT   DOMAIN      127    150       PRO-RICH.
FT   DOMAIN      420    439       ARG/GLU-RICH (MIXED CHARGE).
FT   DOMAIN      448    457       ARG/ASP-RICH (MIXED CHARGE).
FT   DOMAIN      485    514       PABP-LIKE.
FT   DOMAIN      579    590       ASP/GLU-RICH (ACIDIC).
FT   DOMAIN      786    889       HECT.
FT   DOMAIN      827    847       PRO-RICH.
FT   BINDING     858    858       UBIQUITIN (BY SIMILARITY).
SQ   SEQUENCE   889 AA;  100368 MW;  ABD7E3CD53961B78 CRC64;
     MMSARGDFLN YALSLMRSHN DEHSDVLPVL DVCSLKHVAY VFQALIYWIK AMNQQTTLDT
     PQLERKRTRE LLELGIDNED SEHENDDDTS QSATLNDKDD ESLPAETGQN HPFFRRSDSM
     TFLGCIPPNP FEVPLAEAIP LADQPHLLQP NARKEDLFGR PSQGLYSSSA GSGKCLVEVT
     MDRNCLEVLP TKMSYAANLK NVMNMQNRQK KAGEDQSMLA EEADSSKPGP SAHDVAAQLK
     SSLLAEIGLT ESEGPPLTSF RPQCSFMGMV ISHDMLLGRW RLSLELFGRV FMEDVGAEPG
     SILTELGGFE VKESKFRREM EKLRNQQSRD LSLEVDRDRD LLIQQTMRQL NNHFGRRCAT
     TPMAVHRVKV TFKDEPGEGS GVARSFYTAI AQAFLSNEKL PNLDCIQNAN KGTHTSLMQR
     LRNRGERDRE REREREMRRS SGLRAGSRRD RDRDFRRQLS IDTRPFRPAS EGNPSDDPDP
     LPAHRQALGE RLYPRVQAMQ PAFASKITGM LLELSPAQLL LLLASEDSLR ARVEEAMELI
     VAHGRENGAD SILDLGLLDS SEKVQENRKR HGSSRSVVDM DLDDTDDGDD NAPLFYQPGK
     RGFYTPRPGK NTEARLNCFR NIGRILGLCL LQNELCPITL NRHVIKVLLG RKVNWHDFAF
     FDPVMYESLR QLILASQSSD ADAVFSAMDL AFAVDLCKEE GGGQVELIPN GVNIPVTPQN
     VYEYVRKYAE HRMLVVAEQP LHAMRKGLLD VLPKNSLEDL TAEDFRLLVN GCGEVNVQML
     ISFTSFNDES GENAEKLLQF KRWFWSIVER MSMTERQDLV YFWTSSPSLP ASEEGFQPMP
     SITIRPPDDQ HLPTANTCIS RLYVPLYSSK QILKQKLLLA IKTKNFGFV
//

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
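
For anyone comparing this layout with GenBank: every line of the record above opens with a two-letter line code (ID, AC, DE, CC, FT, SQ, ...), the residues follow the SQ line on lines that begin with blanks, and `//` ends the entry. The sketch below is plain Java that only illustrates that structure. It does not use BioJava, and the class and method names (`SwissProtSketch`, `groupByLineCode`, `sequence`) are made up for this example. It assumes the record has one field per line, as in the original flat file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of BioJava: shows how the flat file is laid out.
class SwissProtSketch {

    /** Collects the content of each line under its two-letter line code, in file order. */
    static Map<String, List<String>> groupByLineCode(String record) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String line : record.split("\\R")) {
            if (line.startsWith("//")) {
                break; // end-of-entry terminator
            }
            // Sequence data lines start with blanks and carry no line code.
            if (line.length() < 2 || Character.isWhitespace(line.charAt(0))) {
                continue;
            }
            String code = line.substring(0, 2);
            String content = line.substring(2).trim();
            fields.computeIfAbsent(code, k -> new ArrayList<>()).add(content);
        }
        return fields;
    }

    /** Joins the residue blocks that follow the SQ line into one string. */
    static String sequence(String record) {
        StringBuilder residues = new StringBuilder();
        boolean inSequence = false;
        for (String line : record.split("\\R")) {
            if (line.startsWith("//")) {
                break;
            }
            if (line.startsWith("SQ")) {
                inSequence = true; // residues begin on the next line
                continue;
            }
            if (inSequence) {
                residues.append(line.replaceAll("\\s+", ""));
            }
        }
        return residues.toString();
    }

    public static void main(String[] args) {
        // Abridged copy of the record above: a few fields and the first 20 residues only.
        String sample = String.join("\n",
                "ID   100K_RAT       STANDARD;      PRT;   889 AA.",
                "AC   Q62671;",
                "OS   Rattus norvegicus (Rat).",
                "SQ   SEQUENCE   889 AA;  100368 MW;  ABD7E3CD53961B78 CRC64;",
                "     MMSARGDFLN YALSLMRSHN",
                "//");
        System.out.println(groupByLineCode(sample).get("AC"));
        System.out.println(sequence(sample));
    }
}
```

Grouping by line code keeps repeated fields (the two OC lines, the many CC and FT lines) together in file order. A real parser would go further and split the RN/RP/RA/RT/RL groups into separate references and the FT lines into feature key, start, end and description.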
