Hi. Anyone who writes parsers should take a look at the org.biojava.bio.program.tagvalue package. It has cleaner support for this kind of parsing problem. We should be able to refactor sequenceIO to re-use a lot of this API, giving a much more modular framework for handling nasty things like reference entries. The package assumes that a file can be broken into tags with zero or more values, and that a given value stream may itself represent a sub-document of tag-value pairs. For example, feature tables are a sub-document of both EMBL and GenBank entries, and features are sub-documents of feature tables, so the handlers for these can be re-used easily.
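
To make the sub-document idea concrete, here is a minimal, self-contained sketch in plain Java. It does not use the org.biojava.bio.program.tagvalue API: the names TagValueSketch, Listener, RecordCollector, parse and group are all hypothetical, and the input is a made-up fragment in which each REFERENCE line is assumed to open a new sub-document. Continuation lines are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in BioJava. */
final class TagValueSketch {

    /** Callbacks fired by the parser; a handler like this can be reused per sub-document. */
    interface Listener {
        void startRecord();
        void tagValue(String tag, String value);
        void endRecord();
    }

    /** Collects every sub-document into its own ordered map of tag -> values. */
    static final class RecordCollector implements Listener {
        final List<Map<String, List<String>>> records = new ArrayList<>();
        private Map<String, List<String>> current;

        @Override public void startRecord() {
            current = new LinkedHashMap<>();
        }

        @Override public void tagValue(String tag, String value) {
            current.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
        }

        @Override public void endRecord() {
            records.add(current);
            current = null;
        }
    }

    /**
     * Splits each non-empty line into a tag and a value at the first run of
     * whitespace. A line whose tag equals recordTag closes the open
     * sub-document (if any) and starts a new one. Lines seen before the
     * first recordTag are ignored.
     */
    static void parse(List<String> lines, String recordTag, Listener listener) {
        boolean open = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\s+", 2);
            String tag = parts[0];
            String value = parts.length > 1 ? parts[1] : "";
            if (tag.equals(recordTag)) {
                if (open) {
                    listener.endRecord();
                }
                listener.startRecord();
                listener.tagValue(tag, value);
                open = true;
            } else if (open) {
                listener.tagValue(tag, value);
            }
        }
        if (open) {
            listener.endRecord();
        }
    }

    /** Convenience wrapper: parse the lines and return one map per sub-document. */
    static List<Map<String, List<String>>> group(List<String> lines, String recordTag) {
        RecordCollector collector = new RecordCollector();
        parse(lines, recordTag, collector);
        return collector.records;
    }

    public static void main(String[] args) {
        // Made-up input: the first reference has no AUTHOR line.
        List<String> lines = Arrays.asList(
            "REFERENCE 1",
            "  TITLE foo",
            "REFERENCE 2",
            "  TITLE bar",
            "  AUTHOR wanner");
        for (Map<String, List<String>> record : group(lines, "REFERENCE")) {
            System.out.println(record);
        }
    }
}
```

With the sample input in main, the first record holds only REFERENCE and TITLE while the second also holds AUTHOR, so a field missing from one reference cannot shift values into another.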

There is currently drop-in support for EMBL- and GenBank-like file formats, and for those of you on JDK 1.4, there is an implementation that processes lines into tag/value pairs, or splits a value into a list of values, based upon a regular expression (very handy). Coupled with the new annotation property objects, it provides a very easy way to build object-trees from text. If anyone does take a look and gets lost, please contact me and I will attempt to make the documentation more explicit.

Matthew

Cox, Greg wrote:
> Unfortunately, not. This is probably the weakest point in BioJava's parsing
> right now.
>
> As you may have noticed, there's a more serious problem with the reference
> information. If a reference doesn't contain a field that others do, nothing
> is added under that key, causing them to get out of sync. For example:
>
> REFERENCE
> TITLE foo
> TITLE bar
> AUTHOR wanner
>
> When this gets turned into a biojava sequence, TITLE has [foo, bar] and
> AUTHOR has [wanner] but there's no way to tell which one wanner goes with.
> Good luck
>
> Greg
>
>
>
>>-----Original Message-----
>>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>>Sent: Tuesday, May 14, 2002 11:41 AM
>>To: [EMAIL PROTECTED]
>>Subject: [Biojava-l] RefSeq bioJava parser problem
>>
>>
>>Hi,
>>
>>Appreciate the responses to the refSeq question. We've been
>>able to put togther
>>a reliable parser using the example in TestRefSeqPrt.
>>
>>Have an additional question now. Are there any utility
>>methods within bioJava
>>that can be used to handle parsed values that are returned by
>>bioJava in list
>>form.
>>
>>For example the following value was returned from bioJava for
>>a sequence
>>annotation with key MEDLINE:
>>
>> [98127055, 99357812]
>>
>>
>>Another example is the value that was returned from bioJava
>>for a feature annotation with key db_xref:
>>
>> [LocusID:946, MIM:604405]
>>
>>bioJava does good work in accumulating the information
>>together and placing it under a specific annotation, does
>>anyone know if there are method to extract listMembers or
>>parameter/value pairs already available in bioJava?
>>
>>thx,
>>Dave
>>
>>
>>>>LOCUS NP_000221 167 aa
>>>
>>>linear PRI 29-JAN-2002
>>>
>>>>DEFINITION leptin precursor; leptin (murine obesity
>>>
>>>homolog); obesity; obesity
>>>
>>>> (murine homolog, leptin) [Homo sapiens].
>>>>ACCESSION NP_000221
>>>>PID g4557715
>>>>VERSION NP_000221.1 GI:4557715
>>>>DBSOURCE REFSEQ: accession NM_000230.1
>>>>KEYWORDS .
>>>>SOURCE human.
>>>> ORGANISM Homo sapiens
>>>> Eukaryota; Metazoa; Chordata; Craniata;
>>>
>>>Vertebrata; Euteleostomi;
>>>
>>>> Mammalia; Eutheria; Primates; Catarrhini;
>>>
>>>Hominidae; Homo.
>>>
>>>>REFERENCE 1 (residues 1 to 167)
>>>> AUTHORS Friedman JM, Leibel RL, Siegel DS, Walsh J
>>>
>>and Bahary N.
>>
>>>> TITLE Molecular mapping of the mouse ob mutation
>>>> JOURNAL Genomics 11 (4), 1054-1062 (1991)
>>>> MEDLINE 92147101
>>>> PUBMED 1686014
>>>>REFERENCE 2 (residues 1 to 167)
>>>> AUTHORS Zhang Y, Proenca R, Maffei M, Barone M, Leopold
>>>
>>>L and Friedman JM.
>>>
>>>> TITLE Positional cloning of the mouse obese gene and
>>>
>>>its human homologue
>>>
>>>> JOURNAL Nature 372 (6505), 425-432 (1994)
>>>> MEDLINE 95075453
>>>> PUBMED 7984236
>>>> REMARK Erratum:[[published erratum appears in Nature 1995 Mar
>>>> 30;374(6521):479]]
>>>>REFERENCE 3 (residues 1 to 167)
>>>> AUTHORS Masuzaki H, Ogawa Y, Isse N, Satoh N, Okazaki
>>>
>>>T, Shigemoto M, Mori
>>>
>>>> K, Tamura N, Hosoda K, Yoshimasa Y et al.
>>>> TITLE Human obese gene expression.
Adipocyte-specific
>>>
>>>expression and
>>>
>>>> regional differences in the adipose tissue
>>>> JOURNAL Diabetes 44 (7), 855-858 (1995)
>>>> MEDLINE 95309556
>>>> PUBMED 7789654
>>>>REFERENCE 4 (residues 1 to 167)
>>>> AUTHORS Green ED, Maffei M, Braden VV, Proenca R,
>>>
>>>DeSilva U, Zhang Y, Chua
>>>
>>>> SC Jr, Leibel RL, Weissenbach J and Friedman JM.
>>>> TITLE The human obese (OB) gene: RNA expression
>>>
>>>pattern and mapping on
>>>
>>>> the physical, cytogenetic, and genetic maps of
>>>
>>>chromosome 7
>>>
>>>> JOURNAL Genome Res. 5 (1), 5-12 (1995)
>>>> MEDLINE 96352898
>>>> PUBMED 8717050
>>>>REFERENCE 5 (residues 1 to 167)
>>>> AUTHORS Isse N, Ogawa Y, Tamura N, Masuzaki H, Mori K,
>>>
>>>Okazaki T, Satoh N,
>>>
>>>> Shigemoto M, Yoshimasa Y, Nishi S et al.
>>>> TITLE Structural organization and chromosomal
>>>
>>>assignment of the human
>>>
>>>> obese gene
>>>> JOURNAL J. Biol. Chem. 270 (46), 27728-27733 (1995)
>>>> MEDLINE 96070903
>>>> PUBMED 7499240
>>>>REFERENCE 6 (residues 1 to 167)
>>>> AUTHORS Gong,D.W., Bi,S., Pratley,R.E. and Weintraub,B.D.
>>>> TITLE Genomic structure and promoter analysis of the
>>>
>>>human obese gene
>>>
>>>> JOURNAL J. Biol. Chem. 271 (8), 3971-3974 (1996)
>>>> MEDLINE 96223958
>>>>REFERENCE 7 (residues 1 to 167)
>>>> AUTHORS Niki T, Mori H, Tamori Y, Kishimoto-Hashirmoto
>>>
>>>M, Ueno H, Araki S,
>>>
>>>> Masugi J, Sawant N, Majithia HR, Rais N et al.
>>>> TITLE Human obese gene: molecular screening in
>>>
>>>Japanese and Asian Indian
>>>
>>>> NIDDM patients associated with obesity
>>>> JOURNAL Diabetes 45 (5), 675-678 (1996)
>>>> MEDLINE 96198511
>>>> PUBMED 8621021
>>>>REFERENCE 8 (residues 1 to 167)
>>>> AUTHORS Comuzzie,A.G., Hixson,J.E., Almasy,L.,
>>>
>>>Mitchell,B.D., Mahaney,M.C.,
>>>
>>>> Dyer,T.D., Stern,M.P., MacCluer,J.W. and Blangero,J.
>>>> TITLE A major quantitative trait locus determining
>>>
>>>serum leptin levels
>>>
>>>> and fat mass is located on human chromosome 2
>>>> JOURNAL Nat. Genet.
15 (3), 273-276 (1997)
>>>> MEDLINE 97207647
>>>> PUBMED 9054940
>>>>REFERENCE 9 (residues 1 to 167)
>>>> AUTHORS Clement,K., Vaisse,C., Lahlou,N., Cabrol,S.,
>>>
>>Pelloux,V.,
>>
>>>> Cassuto,D., Gourmelen,M., Dina,C., Chambaz,J.,
>>>
>>>Lacorte,J.M.,
>>>
>>>> Basdevant,A., Bougneres,P., Lebouc,Y.,
>>>
>>>Froguel,P. and Guy-Grand,B.
>>>
>>>> TITLE A mutation in the human leptin receptor gene
>>>
>>>causes obesity and
>>>
>>>> pituitary dysfunction
>>>> JOURNAL Nature 392 (6674), 398-401 (1998)
>>>> MEDLINE 98196670
>>>> PUBMED 9537324
>>>>REFERENCE 10 (residues 1 to 167)
>>>> AUTHORS Friedman,J.M. and Halaas,J.L.
>>>> TITLE Leptin and the regulation of body weight in mammals
>>>> JOURNAL Nature 395 (6704), 763-770 (1998)
>>>> MEDLINE 99010835
>>>>COMMENT REVIEWED REFSEQ: This record has been curated
>>>
>>>by NCBI staff. The
>>>
>>>> reference sequence was derived from U43653.1.
>>>> Summary: This gene is similar to the mouse
>>>
>>>obesity gene (ob). The
>>>
>>>> protein encoded by this gene is secreted by
>>>
>>>white adipocytes. In
>>>
>>>> the mouse study, mutations in this gene are
>>>
>>>linked to severe and
>>>
>>>> morbid obesity.
>>>>FEATURES Location/Qualifiers
>>>> source 1..167
>>>> /organism="Homo sapiens"
>>>> /db_xref="taxon:9606"
>>>> /chromosome="7"
>>>> /map="7q31.3"
>>>> Protein 1..167
>>>> /product="leptin precursor"
>>>> /note="leptin (murine obesity
>>>
>>>homolog); obesity (murine
>>>
>>>> homolog, leptin)"
>>>> sig_peptide 1..21
>>>> Region 22..167
>>>> /region_name="Leptin"
>>>> /note="Leptin"
>>>> /db_xref="CDD:pfam02024"
>>>> mat_peptide 22..167
>>>> /product="leptin"
>>>> CDS 1..167
>>>> /gene="LEP"
>>>> /coded_by="NM_000230.1:57..560"
>>>> /db_xref="LocusID:3952"
>>>> /db_xref="MIM:164160"
>>>>ORIGIN
>>>> 1 mhwgtlcgfl wlwpylfyvq avpiqkvqdd tktliktivt
>>>
>>>rindishtqs vsskqkvtgl
>>>
>>>> 61 dfipglhpil tlskmdqtla vyqqiltsmp srnviqisnd
>>>
>>>lenlrdllhv lafskschlp
>>>
>>>> 121 wasgletlds lggvleasgy stevvalsrl qgslqdmlwq ldlspgc
>>>>//
>>>>
>>>>_______________________________________________
>>>>Biojava-l mailing list - [EMAIL PROTECTED]
>>>>http://biojava.org/mailman/listinfo/biojava-l
