The BioJavaX parser uses regular expressions to parse these lines. I will need to check what needs changing in these regexes to allow parsing of these files.
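As a rough illustration of the kind of change involved, here is a minimal sketch of an ID-line pattern that tolerates a multi-word molecule type such as "genomic DNA". The class name, field names and the pattern itself are hypothetical and are not taken from EMBLFormat; the assumed layout is "ID <name> <dataclass>; [circular] <molecule type>; <division>; <length> BP.", as in the ID line reported below.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; this is not the EMBLFormat code.
final class EmblIdLineSketch {

    // Assumed layout: ID <name> <dataclass>; [circular] <molecule type>; <division>; <length> BP.
    // The molecule type is matched as "everything up to the next semicolon",
    // so values with more than one word (e.g. "genomic DNA") are accepted.
    private static final Pattern ID_LINE = Pattern.compile(
        "^ID\\s+(\\S+)\\s+(\\S+);\\s+(?:(circular)\\s+)?([^;]+);\\s+(\\S+);\\s+(\\d+)\\s+BP\\.\\s*$");

    final String name;
    final String dataClass;
    final boolean circular;
    final String moleculeType;
    final String division;
    final int length;

    private EmblIdLineSketch(Matcher m) {
        name = m.group(1);
        dataClass = m.group(2);
        circular = m.group(3) != null;   // the optional "circular" keyword
        moleculeType = m.group(4).trim();
        division = m.group(5);
        length = Integer.parseInt(m.group(6));
    }

    // Parses one ID line, or throws if the line does not fit the assumed layout.
    static EmblIdLineSketch parse(String line) {
        Matcher m = ID_LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad ID line found: " + line);
        }
        return new EmblIdLineSketch(m);
    }

    public static void main(String[] args) {
        String[] lines = {
            "ID   U00096 standard; circular genomic DNA; PRO; 4639675 BP.",
            "ID   U00096 standard; circular DNA; PRO; 4639675 BP."
        };
        for (String line : lines) {
            EmblIdLineSketch id = parse(line);
            System.out.println(id.name + " circular=" + id.circular
                + " molecule=" + id.moleculeType + " length=" + id.length);
        }
    }
}
```

Capturing the molecule type with a "not a semicolon" class rather than a fixed list of words is only one option; a stricter pattern listing the allowed molecule types would reject malformed lines earlier but would need updating whenever the EMBL format gains a new value.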
Thanks for your testing!

- Mark

"Jolyon Holdstock" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
03/08/2006 06:47 PM

To: <biojava-l@biojava.org>
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] BiojavaX EmblFormat

Hi,

I am using the new format parsers in BioJavaX. GenbankFormat is great, but I am having some trouble with the EMBLFormat class. I have downloaded a sequence file (ID:U00096) from the EBI in EMBL format but I don't believe it is parsing properly. My code is as follows:

    String fileName = "path to file";
    try {
        RichSequenceIterator rsi = RichSequence.IOTools.readEMBLDNA(
                new BufferedReader(new FileReader(fileName)), null);
        while (rsi.hasNext()) {
            RichSequence seq = rsi.nextRichSequence();
            System.out.println(seq.getURN());
            System.out.println(seq.length());
            System.out.println(seq.getAccession());
        }
    } catch (IOException IOE) {
        System.out.println("BioJava IOException " + IOE);
    } catch (BioException BIOE) {
        System.out.println("BioJavaX BioException " + BIOE);
        BIOE.printStackTrace();
    }

The BioJava parser will read it.

    seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence(); //works

I checked the web CVS and the EMBLFormat class is 3 months old so I am using the most recent version. I have pasted a snippet of the sequence file that retains the problems below.

The errors are:

The ID line isn't parsed because of 'genomic' being there - deleting it removes the problem

    org.biojava.bio.BioException: Could not read sequence
    Caused by: org.biojava.bio.seq.io.ParseException: Bad ID line found: U00096 standard; circular genomic DNA; PRO; 4639675 BP.

    ID U00096 standard; circular genomic DNA; PRO; 4639675 BP. //fails
    ID U00096 standard; circular DNA; PRO; 4639675 BP. //works

There is a problem with the RX tag which fails with output:

    org.biojava.bio.BioException: Could not read sequence
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:352)

Replacing RX DOI; 10.1126/science.277.5331.1453. with removes the error XX RX DOI; 10.1126/science.277.5331.1453.

There is an error with parsing the authors

    org.biojava.bio.BioException: Could not read sequence
    Caused by: java.lang.IllegalArgumentException: Authors string cannot be null
        at org.biojavax.DocRefAuthor$Tools.parseAuthorString(DocRefAuthor.java:75)
        at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:395)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:100)

I am looking at the code trying to see where the problems are but suspect that it may be beyond me. So if anybody has some experience of this I would welcome their input.

Thanks,
Jolyon

This is a snippet of the code that reproduces the errors in my hands.

ID U00096 standard; circular genomic DNA; PRO; 4639675 BP. XX AC U00096; AE000111-AE000510; XX SV U00096.2 XX DT 23-FEB-2006 (Rel. 86, Created) DT 06-MAR-2006 (Rel. 87, Last updated, Version 3) XX DE Escherichia coli K-12 MG1655, complete genome. XX KW . XX OS Escherichia coli K12 OC Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; OC Enterobacteriaceae; Escherichia. XX RN [1] RP 1-4639675 RX DOI; 10.1126/science.277.5331.1453. RX PUBMED; 9278503. RA Blattner F.R., Plunkett G., Bloch C.A., Perna N.T., Burland V., Riley M., RA Collado-Vides J., Glasner J.D., Rode C.K., Mayhew G.F., Gregor J., RA Davis N.W., Kirkpatrick H.A., Goeden M.A., Rose D.J., Mau B., Shao Y.; RT "The complete genome sequence of Escherichia coli K-12"; RL Science 277(5331):1453-1474(1997). XX RN [2] RP 1-4639675 RX DOI; 10.1093/nar/gkj150. RX PUBMED; 16397293.
RA Riley M., Abe T., Arnaud M.B., Berlyn M.K., Blattner F.R., Chaudhuri R.R., RA Glasner J.D., Horiuchi T., Keseler I.M., Kosuge T., Mori H., Perna N.T., RA Plunkett G. III, Rudd K.E., Serres M.H., Thomas G.H., Thomson N.R., RA Wishart D., Wanner B.L.; RT "Escherichia coli K-12: a cooperatively developed annotation RT snapshot--2005"; RL (er) Nucleic Acids Res. 34 (1), 1-9 (2006) XX RN [3] RC Woods Hole, Mass., on 14-18 November 2003 (sequence corrections) RP 1-4639675 RA Arnaud M., Berlyn M.K.B., Blattner F.R., Galperin M.Y., Glasner J.D., RA Horiuchi T., Kosuge T., Mori H., Perna N.T., Plunkett G. III, Riley M., RA Rudd K.E., Serres M.H., Thomas G.H., Wanner B.L.; RT "Workshop on Annotation of Escherichia coli K-12"; RL Unpublished. XX RN [4] RC ASAP download 10 June 2004 (annotation updates) RP 1-4639675 RA Glasner J.D., Perna N.T., Plunkett G. III, Anderson B.D., Bockhorst J., RA Hu J.C., Riley M., Rudd K.E., Serres M.H.; RT "ASAP: Escherichia coli K-12 strain MG1655 version m56"; RL Unpublished. XX RN [5] RC GenBank accessions AG613214 to AG613378 (sequence corrections) RP 1-4639675 RA Hayashi K., Morooka N., Mori H., Horiuchi T.; RT "A more accurate sequence comparison between genomes of Escherichia coli RT K12 W3110 and MG1655 strains"; RL Unpublished. XX RN [6] RC GenBank accession AY605712 (sequence corrections) RP 1-4639675 RA Perna N.T.; RT "Escherichia coli K-12 MG1655 yqiK-rfaE intergenic region, genomic sequence RT correction"; RL Unpublished. XX RN [7] RP 1-4639675 RA Rudd K.E.; RT "A manual approach to accurate translation start site annotation: an E. RT coli K-12 case study"; RL Unpublished. XX RN [8] RP 1-4639675 RA Blattner F.R., Plunkett G. III.; RT ; RL Submitted (16-JAN-1997) to the EMBL/GenBank/DDBJ databases. RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison, RL WI 53706-1580, USA XX RN [9] RP 1-4639675 RA Blattner F.R., Plunkett G. III.; RT ; RL Submitted (02-SEP-1997) to the EMBL/GenBank/DDBJ databases. 
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison, RL WI 53706-1580, USA XX RN [10] RP 1-4639675 RA Plunkett G. III.; RT ; RL Submitted (13-OCT-1998) to the EMBL/GenBank/DDBJ databases. RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison, RL WI 53706-1580, USA XX RN [11] RC Sequence update by submitter RP 1-4639675 RA Plunkett G. III.; RT ; RL Submitted (10-JUN-2004) to the EMBL/GenBank/DDBJ databases. RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison, RL WI 53706-1580, USA XX RN [12] RC Protein updates by submitter RP 1-4639675 RA Plunkett G. III.; RT ; RL Submitted (07-FEB-2006) to the EMBL/GenBank/DDBJ databases. RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison, RL WI 53706-1580, USA XX DR EMBL-TPA; BR000242. XX FH Key Location/Qualifiers FH FT source 1..4639675 FT /organism="Escherichia coli K12" FT /strain="K-12" FT /sub_strain="MG1655" FT /mol_type="genomic DNA" FT /db_xref="taxon:83333" FT gene 190..255 FT /gene="thrL" FT /locus_tag="b0001" FT /note="synonyms: ECK0001, JW4367" FT CDS 190..255 FT /codon_start=1 FT /transl_table=11 FT /gene="thrL" FT /locus_tag="b0001" FT /product="thr operon leader peptide" FT /function="1.5.1.8 metabolism; building block biosynthesis; FT amino acids; threonine" FT /function="leader; Amino acid biosynthesis: Threonine" FT /note="go_process: threonine biosynthesis [goid 0009088]" FT /protein_id="AAC73112.1" FT /translation="MKRISTTITTTITITTGNGAG" FT gene 337..2799 FT /gene="thrA" FT /locus_tag="b0002" FT /note="synonyms: Hs, thrD, ECK0002, JW0001" FT CDS 337..2799 FT /codon_start=1 FT /transl_table=11 FT /gene="thrA" FT /locus_tag="b0002" FT /product="fused aspartokinase I and homoserine FT dehydrogenase I" FT /function="1.5.1.21 metabolism; building block FT biosynthesis; amino acids; homoserine" FT /function="1.5.1.8 metabolism; building block biosynthesis; FT amino acids; threonine" FT /function="7.1 location of gene 
products; cytoplasm" FT /function="enzyme; Amino acid biosynthesis: Threonine" FT /EC_number="1.1.1.3" FT /EC_number="2.7.2.4" FT /note="bifunctional: aspartokinase I (N-terminal); FT homoserine dehydrogenase I (C-terminal); go_component: FT cytoplasm [goid 0005737]; go_process: threonine FT biosynthesis [goid 0009088]; go_process: homoserine FT biosynthesis [goid 0009090]" FT /protein_id="AAC73113.1" FT /translation="MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITN FT HLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHV FT LHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLES FT TVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADC FT CEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCL FT IKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMS FT RARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAII FT SVVGDGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQM FT LFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGVANSKALLTNVHGLN FT LENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVT FT PNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGI FT LSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGREL FT ELADIEIEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDG FT VCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLR FT TLSWKLGV" FT gene 2801..3733 FT /gene="thrB" FT /locus_tag="b0003" FT /note="synonyms: ECK0003, JW0002" FT CDS 2801..3733 FT /codon_start=1 FT /transl_table=11 FT /gene="thrB" FT /locus_tag="b0003" FT /product="homoserine kinase" FT /function="1.5.1.8 metabolism; building block biosynthesis; FT amino acids; threonine" FT /function="7.1 location of gene products; cytoplasm" FT /function="enzyme; Amino acid biosynthesis: Threonine" FT /EC_number="2.7.1.39" FT /note="go_component: cytoplasm [goid 0005737]; go_process: FT threonine biosynthesis [goid 0009088]" FT /protein_id="AAC73114.1" FT /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFS FT 
LNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVV FT AALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQ FT QVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQPELA FT AKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVAD FT WLGKNYLQNQEGFVHICRLDTAGARVLEN" FT gene 3734..5020 FT /gene="thrC" FT /locus_tag="b0004" FT /note="synonyms: ECK0004, JW0003" FT CDS 3734..5020 FT /codon_start=1 FT /transl_table=11 FT /gene="thrC" FT /locus_tag="b0004" FT /product="threonine synthase" FT /function="1.5.1.8 metabolism; building block biosynthesis; FT amino acids; threonine" FT /function="7.1 location of gene products; cytoplasm" FT /function="enzyme; Amino acid biosynthesis: Threonine" FT /EC_number="4.2.3.1" FT /note="go_component: cytoplasm [goid 0005737]; go_process: FT threonine biosynthesis [goid 0009088]" FT /protein_id="AAC73115.1" FT /translation="MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDE FT MLKLDFVTRSAKILSAFIGDEIPQEILEERVRAAFAFPAPVANVESDVGCLELFHGPTL FT AFKDFGGRFMAQMLTHIAGDKPVTILTATSGDTGAAVAHAFYGLPNVKVVILYPRGKIS FT PLQEKLFCTLGGNIETVAIDGDFDACQALVKQAFDDEELKVALGLNSANSINISRLLAQ FT ICYYFEAVAQLPQETRNQLVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVPR FT FLHDGQWSPKATQATLSNAMDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTM FT RELKELGYTSEPHAAVAYRALRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKE FT LAERADLPLLSHNLPADFAALRKLMMNHQ" XX SQ Sequence 4639675 BP; 1142228 A; 1179554 C; 1176923 G; 1140970 T; 0 other; agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc 60 tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg 120 tcactaaata ctttaaccaa tataggcata gcgcacagac agataaaaat tacagagtac 180 acaacatcca tgaaacgcat tagcaccacc attaccacca ccatcaccat taccacaggt 240 aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg acagtgcggg 300 cttttttttt cgaccaaagg taacgaggta acaaccatgc gagtgttgaa gttcggcggt 360 acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc 420 aggcaggggc aggtggccac cgtcctctct gcccccgcca aaatcaccaa ccacctggtg 480 gcgatgattg aaaaaaccat 
tagcggccag gatgctttac ccaatatcag cgatgccgaa 540 cgtatttttg ccgaactttt gacgggactc gccgccgccc agccggggtt cccgctggcg 600 caattgaaaa ctttcgtcga tcaggaattt gcccaaataa aacatgtcct gcatggcatt 660 agtttgttgg ggcagtgccc ggatagcatc aacgctgcgc tgatttgccg tggcgagaaa 720 atgtcgatcg ccattatggc cggcgtatta gaagcgcgcg gtcacaacgt tactgttatc 780 gatccggtcg aaaaactgct ggcagtgggg cattacctcg aatctaccgt cgatattgct 840 gagtccaccc gccgtattgc ggcaagccgc attccggctg atcacatggt gctgatggca 900 ggtttcaccg ccggtaatga aaaaggcgaa ctggtggtgc ttggacgcaa cggttccgac 960 tactctgctg cggtgctggc tgcctgttta cgcgccgatt gttgcgagat ttggacggac 1020 gttgacgggg tctatacctg cgacccgcgt caggtgcccg atgcgaggtt gttgaagtcg 1080 atgtcctacc aggaagcgat ggagctttcc tacttcggcg ctaaagttct tcacccccgc 1140 accattaccc ccatcgccca gttccagatc ccttgcctga ttaaaaatac cggaaatcct 1200 caagcaccag gtacgctcat tggtgccagc cgtgatgaag acgaattacc ggtcaagggc 1260 atttccaatc tgaataacat ggcaatgttc agcgtttctg gtccggggat gaaagggatg 1320 gtcggcatgg cggcgcgcgt ctttgcagcg atgtcacgcg cccgtatttc cgtggtgctg 1380 attacgcaat catcttccga atacagcatc agtttctgcg ttccacaaag cgactgtgtg 1440 cgagctgaac gggcaatgca ggaagagttc tacctggaac tgaaagaagg cttactggag 1500 ccgctggcag tgacggaacg gctggccatt atctcggtgg taggtgatgg tatgcgcacc 1560 ttgcgtggga tctcggcgaa attctttgcc gcactggccc gcgccaatat caacattgtc 1620 gccattgctc agggatcttc tgaacgctca atctctgtcg tggtaaataa cgatgatgcg 1680 accactggcg tgcgcgttac tcatcagatg ctgttcaata ccgatcaggt tatcgaagtg 1740 tttgtgattg gcgtcggtgg cgttggcggt gcgctgctgg agcaactgaa gcgtcagcaa 1800 agctggctga agaataaaca tatcgactta cgtgtctgcg gtgttgccaa ctcgaaggct 1860 ctgctcacca atgtacatgg ccttaatctg gaaaactggc aggaagaact ggcgcaagcc 1920 aaagagccgt ttaatctcgg gcgcttaatt cgcctcgtga aagaatatca tctgctgaac 1980 ccggtcattg ttgactgcac ttccagccag gcagtggcgg atcaatatgc cgacttcctg 2040 cgcgaaggtt tccacgttgt cacgccgaac aaaaaggcca acacctcgtc gatggattac 2100 taccatcagt tgcgttatgc ggcggaaaaa tcgcggcgta aattcctcta tgacaccaac 2160 gttggggctg gattaccggt tattgagaac 
ctgcaaaatc tgctcaatgc aggtgatgaa 2220 ttgatgaagt tctccggcat tctttctggt tcgctttctt atatcttcgg caagttagac 2280 gaaggcatga gtttctccga ggcgaccacg ctggcgcggg aaatgggtta taccgaaccg 2340 gacccgcgag atgatctttc tggtatggat gtggcgcgta aactattgat tctcgctcgt 2400 gaaacgggac gtgaactgga gctggcggat attgaaattg aacctgtgct gcccgcagag 2460 tttaacgccg agggtgatgt tgccgctttt atggcgaatc tgtcacaact cgacgatctc 2520 tttgccgcgc gcgtggcgaa ggcccgtgat gaaggaaaag ttttgcgcta tgttggcaat 2580 attgatgaag atggcgtctg ccgcgtgaag attgccgaag tggatggtaa tgatccgctg 2640 ttcaaagtga aaaatggcga aaacgccctg gccttctata gccactatta tcagccgctg 2700 ccgttggtac tgcgcggata tggtgcgggc aatgacgtta cagctgccgg tgtctttgct 2760 gatctgctac gtaccctctc atggaagtta ggagtctgac atggttaaag tttatgcccc 2820 ggcttccagt gccaatatga gcgtcgggtt tgatgtgctc ggggcggcgg tgacacctgt 2880 tgatggtgca ttgctcggag atgtagtcac ggttgaggcg gcagagacat tcagtctcaa 2940 caacctcgga cgctttgccg ataagctgcc gtcagaacca cgggaaaata tcgtttatca 3000 gtgctgggag cgtttttgcc aggaactggg taagcaaatt ccagtggcga tgaccctgga 3060 aaagaatatg ccgatcggtt cgggcttagg ctccagtgcc tgttcggtgg tcgcggcgct 3120 gatggcgatg aatgaacact gcggcaagcc gcttaatgac actcgtttgc tggctttgat 3180 gggcgagctg gaaggccgta tctccggcag cattcattac gacaacgtgg caccgtgttt 3240 tctcggtggt atgcagttga tgatcgaaga aaacgacatc atcagccagc aagtgccagg 3300 gtttgatgag tggctgtggg tgctggcgta tccggggatt aaagtctcga cggcagaagc 3360 cagggctatt ttaccggcgc agtatcgccg ccaggattgc attgcgcacg ggcgacatct 3420 ggcaggcttc attcacgcct gctattcccg tcagcctgag cttgccgcga agctgatgaa 3480 agatgttatc gctgaaccct accgtgaacg gttactgcca ggcttccggc aggcgcggca 3540 ggcggtcgcg gaaatcggcg cggtagcgag cggtatctcc ggctccggcc cgaccttgtt 3600 cgctctgtgt gacaagccgg aaaccgccca gcgcgttgcc gactggttgg gtaagaacta 3660 cctgcaaaat caggaaggtt ttgttcatat ttgccggctg gatacggcgg gcgcacgagt 3720 actggaaaac taaatgaaac tctacaatct gaaagatcac aacgagcagg tcagctttgc 3780 gcaagccgta acccaggggt tgggcaaaaa tcaggggctg ttttttccgc acgacctgcc 3840 ggaattcagc ctgactgaaa ttgatgagat gctgaagctg 
gattttgtca cccgcagtgc 3900 gaagatcctc tcggcgttta ttggtgatga aatcccacag gaaatcctgg aagagcgcgt 3960 gcgcgcggcg tttgccttcc cggctccggt cgccaatgtt gaaagcgatg tcggttgtct 4020 ggaattgttc cacgggccaa cgctggcatt taaagatttc ggcggtcgct ttatggcaca 4080 aatgctgacc catattgcgg gtgataagcc agtgaccatt ctgaccgcga cctccggtga 4140 taccggagcg gcagtggctc atgctttcta cggtttaccg aatgtgaaag tggttatcct 4200 ctatccacga ggcaaaatca gtccactgca agaaaaactg ttctgtacat tgggcggcaa 4260 tatcgaaact gttgccatcg acggcgattt cgatgcctgt caggcgctgg tgaagcaggc 4320 gtttgatgat gaagaactga aagtggcgct agggttaaac tcggctaact cgattaacat 4380 cagccgtttg ctggcgcaga tttgctacta ctttgaagct gttgcgcagc tgccgcagga 4440 gacgcgcaac cagctggttg tctcggtgcc aagcggaaac ttcggcgatt tgacggcggg 4500 tctgctggcg aagtcactcg gtctgccggt gaaacgtttt attgctgcga ccaacgtgaa 4560 cgataccgtg ccacgtttcc tgcacgacgg tcagtggtca cccaaagcga ctcaggcgac 4620 gttatccaac gcgatggacg tgagtcagcc gaacaactgg ccgcgtgtgg aagagttgtt 4680 ccgccgcaaa atctggcaac tgaaagagct gggttatgca gccgtggatg atgaaaccac 4740 gcaacagaca atgcgtgagt taaaagaact gggctacact tcggagccgc acgctgccgt 4800 agcttatcgt gcgctgcgtg atcagttgaa tccaggcgaa tatggcttgt tcctcggcac 4860 cgcgcatccg gcgaaattta aagagagcgt ggaagcgatt ctcggtgaaa cgttggatct 4920 gccaaaagag ctggcagaac gtgctgattt acccttgctt tcacataatc tgcccgccga 4980 ttttgctgcg ttgcgtaaat tgatgatgaa tcatcagtaa aatctattca ttatctcaat 5040 caggccgggt ttgcttttat gcagcccggc ttttttatga agaaattatg gagaaaaatg 5100 acagggaaaa aggagaaatt ctcaataaat gcggtaactt agagattagg attgcggaga 5160 ataacaaccg ccgttctcat cgagtaatct ccggatatcg acccataacg ggcaatgata 5220 aaaggagtaa cctgtgaaaa agatgcaatc tatcgtactc gcactttccc tggttctggt 5280 cgctcccatg gcagcacagg ctgcggaaat tacgttagtc ccgtcagtaa aattacagat 5340 aggcgatcgt gataatcgtg gctattactg ggatggaggt cactggcgcg accacggctg 5400 gtggaaacaa cattatgaat ggcgaggcaa tcgctggcac ctacacggac cgccgccacc 5460 gccgcgccac cataagaaag ctcctcatga tcatcacggc ggtcatggtc caggcaaaca 5520 tcaccgctaa atgacaaatg ccgggtaaca atccggcatt cagcgcctga 
tgcgacgctg 5580 gcgcgtctta tcaggcctac gttaattctg caatatattg aatctgcatg cttttgtagg 5640 caggataagg cgttcacgcc gcatccggca ttgactgcaa acttaacgct gctcgtagcg 5700 tttaaacacc agttcgccat tgctggagga atcttcatca aagaagtaac cttcgctatt 5760 aaaaccagtc agttgctctg gtttggtcag ccgattttca ataatgaaac gactcatcag 5820 accgcgtgct ttcttagcgt agaagctgat gatcttaaat ttgccgttct tctcatcgag 5880 gaacaccggc ttgataatct cggcattcaa tttcttcggc ttcaccgatt taaaatactc 5940 atctgacgcc agattaatca ccacattatc gccttgtgct gcgagcgcct cgttcagctt 6000 gttggtgatg atatctcccc agaattgata cagatctttc cctcgggcat tctcaagacg 6060 gatccccatt tccagacgat aaggctgcat taaatcgagc gggcggagta cgccatacaa 6120 gccggaaagc attcgcaaat gctgttgggc aaaatcgaaa tcgtcttcgc tgaaggtttc 6180 ggcctgcaag ccggtgtaga catcaccttt aaacgccaga atcgcctggc gggcattcgc 6240 cggcgtgaaa tctggctgcc agtcatgaaa gcgagcggcg ttgatacccg ccagtttgtc 6300 gctgatgcgc atcagcgtgc taatctgcgg aggcgtcagt ttccgcgcct catggatcaa 6360 ctgctgggaa ttgtctaaca gctccggcag cgtatagcgc gtggtggtca acgggctttg 6420 gtaatcaagc gttttcgcag gtgaaataag aatcagcata tccagtcctt gcaggaaatt 6480 tatgccgact ttagcaaaaa atgagaatga gttgatcgat agttgtgatt actcctgcga 6540 aacatcatcc cacgcgtccg gagaaagctg gcgaccgata tccggataac gcaatggatc 6600 aaacaccggg cgcacgccga gtttacgctg gcgtagataa tcactggcaa tggtatgaac 6660 cacaggcgag agcagtaaaa tggcggtcaa attggtaata gccatgcagg ccattatgat 6720 atctgccagt tgccacatca gcggaaggct tagcaaggtg ccgccgatga ccgttgcgaa 6780 ggtgcagatc cgcaaacacc agatcgcttt agggttgttc aggcgtaaaa agaagagatt 6840 gttttcggca taaatgtagt tggcaacgat ggagctgaag gcaaacagaa taaccacaag 6900 ggtaacaaac tcagcacccc aggaacccat tagcacccgc atcgccttct ggataagctg 6960 aataccttcc agcggcatgt aggttgtgcc gttacccgcc agtaatatca gcatggcgct 7020 tgccgtacag atgaccaggg tgtcgataaa aatgccaatc atctggacaa tcccttgcgc 7080 tgccggatgc ggaggccagg acgccgctgc cgctgccgcg tttggcgtcg aacccattcc 7140 cgcctcattg gaaaacatac tgcgctgaaa accgttagta atcgcctggc ttaaggtata 7200 tcccgccgcg ccgcctgccg cttcctgcca gccaaaagca ctctcaaaaa tagaccaaat 
7260 gacgtgggga agttgcccga tattcattac gcaaattacc aggctggtca gtacccagat 7320 tatcgccatc aacgggacaa agccctgcat gagccgggcg acgccatgaa gaccgcgagt 7380 gattgccagc agagtaaaga cagcgagaat aatgcctgtc accagcgggg gaaaatcaaa 7440 agaaaaactc agggcgcggg caacggcgtt cgcttgaact ccgctgaaaa ttatgccata 7500 ggcgatgagc aaaaagacgg cgaacagaac gcccatccag cgcatcccca gcccgcgcgc 7560 catataccat gccggtccgc cacgaaactg cccattgacg tcacgttctt tataaagttg 7620 tgccagagaa cattcggcaa acgaggtcgc catgccgata aacgcggcaa cccacatcca 7680 aaagacggct ccaggtccac cggcggtaat agccagcgca acgccggcca ggttgccgct 7740 acccacgcgc gccgcaagac tggtacacaa tgactgaaat gaggttaaac cgcctggctg 7800 tggatgaatg ctatttttaa gacttttgcc aaactggcgg atgtagcgaa actgcacaaa 7860 tccggtgcga aaagtgaacc aacaacctgc gccgaagagc aggtaaatca ttaccgatcc 7920 ccaaaggacg ctgttaatga aggagaaaaa atctggcatg catatccctc ttattgccgg 7980 tcgcgatgac tttcctgtgt aaacgttacc aattgtttaa gaagtatata cgctacgagg 8040 tacttgataa cttctgcgta gcatacatga ggttttgtat aaaaatggcg ggcgatatca 8100 acgcagtgtc agaaatccga aacagtctcg cctggcgata accgtcttgt cggcggttgc 8160 gctgacgttg cgtcgtgata tcatcagggc agaccggtta catcccccta acaagctgtt 8220 taaagagaaa tactatcatg acggacaaat tgacctccct tcgtcagtac accaccgtag 8280 tggccgacac tggggacatc gcggcaatga agctgtatca accgcaggat gccacaacca 8340 acccttctct cattcttaac gcagcgcaga ttccggaata ccgtaagttg attgatgatg 8400 ctgtcgcctg ggcgaaacag cagagcaacg atcgcgcgca gcagatcgtg gacgcgaccg 8460 acaaactggc agtaaatatt ggtctggaaa tcctgaaact ggttccgggc cgtatctcaa 8520 ctgaagttga tgcgcgtctt tcctatgaca ccgaagcgtc aattgcgaaa gcaaaacgcc 8580 tgatcaaact ctacaacgat gctggtatta gcaacgatcg tattctgatc aaactggctt 8640 ctacctggca gggtatccgt gctgcagaac agctggaaaa agaaggcatc aactgtaacc 8700 tgaccctgct gttctccttc gctcaggctc gtgcttgtgc ggaagcgggc gtgttcctga 8760 tctcgccgtt tgttggccgt attcttgact ggtacaaagc gaataccgat aagaaagagt 8820 acgctccggc agaagatccg ggcgtggttt ctgtatctga aatctaccag tactacaaag 8880 agcacggtta tgaaaccgtg gttatgggcg caagcttccg taacatcggc gaaattctgg 8940 
aactggcagg ctgcgaccgt ctgaccatcg caccggcact gctgaaagag ctggcggaga 9000 //

Jolyon Holdstock Ph.D.
Senior Computational Biologist,
Oxford Gene Technology (Ops) Ltd.
Begbroke Business and Science Park
Sandy Lane, Yarnton
Oxford, OX5 1PF
Tel: 01865 309699
Fax: 01865 842116

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l