Another problem: some restriction enzymes have more than one recognition site. Usually this can be written with ambiguous symbols, but for some restriction enzymes that is not possible, because the ambiguous symbols depend on each other.
Usually an ambiguous symbol works like this:

ANNC

The first "N" is independent of the second "N". For example, it can match:

AAAC AACC AAGC AATC ... ATTC

That is 16 possibilities, because the ambiguous symbols are independent of each other.

But in some restriction enzymes the ambiguous symbols depend on each other. A sequence like

ANNC

would then only match:

AAAC ACCC AGGC ATTC

Only 4 possibilities, because the ambiguous symbols depend on each other.

This happens with these enzymes:

TaqII
M.PhiBssHII (unknown cut location)
M.Phi3TI (unknown cut location)
M.Rho11sI (unknown cut location)
M.SPBetaI (unknown cut location)
M.SPRI (unknown cut location)

<1>TaqII
<2>
<3>GACCGA(11/9),CACCCA(11/9)
<4>
<5>Thermus aquaticus YTI
<6>J.I. Harris
<7>X
<8>Barker, D., Hoff, M., Oliphant, A., White, R., (1984) Nucleic Acids Res., vol. 12, pp. 5567-5581.
Myers, P.A., Roberts, R.J., Unpublished observations.
Rutkowska, S.M., Jaworowska, I., Skowron, P.M., Unpublished observations.

RestrictionEnzymeManager takes only the last recognition site in this example; it skips GACCGA:

Name: TaqII
RecognitionSite: caccca
ForwardRegex: cac{3}a
ReverseRegex: tg{3}tg
CutType: 0
DownStreamEndType: 0
IsPalindromic: false
DownstreamCut: 17, 15,

- Jesse

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jesse
Sent: Wednesday, 22 June 2005 12:09
To: biojava-l@biojava.org
Subject: RE: [Biojava-l] RestrictionEnzymeManager can't correctly handle incomplete enzymes

(I'm not an expert on restriction enzymes.)

I was talking about AacI, of which BamHI is an isoschizomer. The cut location of AacI is unknown, but the one of BamHI is known. Maybe RestrictionEnzymeManager uses the cut location of BamHI when asked for the unknown cut location of AacI.

http://rebase.neb.com/rebase/enz/AacI.html

That might also be the reason why RestrictionEnzymeManager requires links between restriction enzymes.
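To make the dependent-ambiguity point concrete, here is a minimal sketch using plain `java.util.regex`, not BioJava's RestrictionEnzyme API; the class `DependentAmbiguity` and its methods are hypothetical names for illustration. Independent N's become separate character classes, while dependent N's can be expressed with a backreference (`\1`), which forces the second position to repeat whatever the first matched. TaqII's two sites, GACCGA and CACCCA, turn out to be the same kind of case: they differ only at positions 1 and 5, and both positions are G or both are C.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DependentAmbiguity {

    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** All 16 sequences of the form A??C over the four bases. */
    static List<String> aNNcCandidates() {
        List<String> out = new ArrayList<>();
        for (char first : BASES) {
            for (char second : BASES) {
                out.add("A" + first + second + "C");
            }
        }
        return out;
    }

    /** Keeps the candidates that the regex matches over their full length. */
    static List<String> matching(String regex, List<String> candidates) {
        Pattern pattern = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String s : candidates) {
            if (pattern.matcher(s).matches()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> annc = aNNcCandidates();
        // Independent N's: each N is its own character class.
        System.out.println(matching("A[ACGT][ACGT]C", annc).size()); // 16
        // Dependent N's: the second N must repeat the first (backreference \1).
        System.out.println(matching("A([ACGT])\\1C", annc)); // [AAAC, ACCC, AGGC, ATTC]

        // Every G/C combination at the two positions where TaqII's sites differ.
        List<String> taqII = List.of("GACCGA", "GACCCA", "CACCGA", "CACCCA");
        // Independent [GC] classes over-match: all four combinations pass.
        System.out.println(matching("[GC]ACC[GC]A", taqII).size()); // 4
        // Linked positions keep only the two real TaqII sites.
        System.out.println(matching("([GC])ACC\\1A", taqII)); // [GACCGA, CACCCA]
    }
}
```

A plain alternation such as `GACCGA|CACCCA` would work just as well for TaqII; the backreference form is shown because it mirrors the "dependent N" idea directly.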
If a restriction enzyme entry is removed from the REBASE file, RestrictionEnzymeManager fails to read the file in some cases.

But I think using the cut location of isoschizomers is wrong, because REBASE says: "An isoschizomer is a restriction enzyme that recognizes the same DNA sequence. The cut sites may or may not be identical." So the cut site might differ between isoschizomers. I searched the REBASE file for examples and found these:

<1>BspKT6I
<2>MboI,AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscFI,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI,BspJ64I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstMBI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,CcoP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,FnuAII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I,Lsp1109II,Mel3JI,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,MkrAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI,NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,PfaI,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu2479I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I,R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Uba1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I
<3>GAT^C
<4>2(6)
<5>Bacillus species KT6
<6>N.I. Matvienko
<7>
<8>Shapovalova, N.I., Zheleznaja, L.A., Matvienko, N.I., (1993) Nucleic Acids Res., vol. 21, pp. 5794.
Shapovalova, N.I., Zheleznaya, L.A., Matvienko, N.I., (1994) Biokhimiia, vol. 59, pp. 1730-1738.

<1>MboI
<2>AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscFI,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI,BspJ64I,BspKT6I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstMBI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,CcoP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,FnuAII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I,Lsp1109II,Mel3JI,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,MkrAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI,NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,PfaI,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu2479I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I,R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Uba1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I
<3>^GATC
<4>2(6)
<5>Moraxella bovis
<6>ATCC 10900
<7>ACFGKNQRUVX
<8>Anton, B.P., Brooks, J.E., Unpublished observations.
Gelinas, R.E., Myers, P.A., Roberts, R.J., (1977) J. Mol. Biol., vol. 114, pp. 169-179.
Huang, L.-H., Farnet, C.M., Ehrlich, K.C., Ehrlich, M., (1982) Nucleic Acids Res., vol. 10, pp. 1579-1591.
Ueno, T., Ito, H., Kimizuka, F., Kotani, H., Nakajima, K., (1993) Nucleic Acids Res., vol. 21, pp. 2309-2313.
Ueno, T., Ito, H., Kotani, H., Nakajima, K., Japanese Patent Office, 1993.

<1>Mel3JI
<2>MboI,AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscFI,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI,BspJ64I,BspKT6I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstMBI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,CcoP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,FnuAII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I,Lsp1109II,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,MkrAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI,NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,PfaI,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu2479I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I,R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Uba1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I
<3>GATC
<4>
<5>Megasphaera elsedenii 3J
<6>P. Pristas
<7>
<8>Piknova, M., Filova, M., Javorsky, P., Pristas, P., (2004) FEMS Microbiol. Lett., vol. 236, pp. 91-95.
Piknova, M., Pristas, P., Javorsky, P., (2004) Folia Microbiol. (Praha), vol. 49, pp. 191-193.
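The three entries above share the site GATC but encode the cut differently: GAT^C (BspKT6I), ^GATC (MboI), and no cut at all (Mel3JI). A reader that keeps this per-entry information, instead of borrowing it from an isoschizomer, could look roughly like the sketch below. It is a hypothetical helper, not BioJava code: `RebaseSiteParser`, `Site` and `cutKnown()` are made-up names, only the notations seen in this thread are handled (`?`, `^`, trailing `(n/m)`, comma-separated sites), and upstream-cut forms such as `(8/13)...` are not handled. Cut positions are counted from the first base of the site on the top strand; for caret notation this sketch records only the top-strand cut.

```java
import java.util.ArrayList;
import java.util.List;

public class RebaseSiteParser {

    /**
     * One recognition site from a REBASE {@code <3>} field (hypothetical type).
     * Cut positions count bases from the first base of the site on the top
     * strand; null means the field gives no cut position.
     */
    record Site(String sequence, Integer topCut, Integer bottomCut) {
        boolean cutKnown() {
            return topCut != null;
        }
    }

    /** Parses every comma-separated site; "?" (unknown site) yields no sites. */
    static List<Site> parse(String field) {
        List<Site> sites = new ArrayList<>();
        for (String raw : field.split(",")) {
            String s = raw.trim();
            if (s.isEmpty() || s.equals("?")) {
                continue; // unknown recognition site: nothing to match on
            }
            int caret = s.indexOf('^');
            int paren = s.indexOf('(');
            if (caret >= 0) {
                // e.g. GAT^C: the top strand is cut after 'caret' bases
                String seq = s.substring(0, caret) + s.substring(caret + 1);
                sites.add(new Site(seq, caret, null));
            } else if (paren > 0 && s.endsWith(")")) {
                // e.g. CACCCA(11/9): cuts 11 and 9 bases past the end of the site
                String seq = s.substring(0, paren);
                String[] nm = s.substring(paren + 1, s.length() - 1).split("/");
                int n = Integer.parseInt(nm[0].trim());
                int m = Integer.parseInt(nm[1].trim());
                sites.add(new Site(seq, seq.length() + n, seq.length() + m));
            } else {
                // e.g. GATC or GGATCC: site known, cut location unknown
                sites.add(new Site(s, null, null));
            }
        }
        return sites;
    }

    public static void main(String[] args) {
        // Isoschizomers of GATC with different or missing cut positions.
        System.out.println("BspKT6I " + parse("GAT^C"));
        System.out.println("MboI    " + parse("^GATC"));
        System.out.println("Mel3JI  " + parse("GATC"));
        // TaqII: both sites are kept, not only the last one.
        System.out.println("TaqII   " + parse("GACCGA(11/9),CACCCA(11/9)"));
        // AacI: site known, cut unknown, so a filter on cutKnown() would drop it.
        System.out.println("AacI cut known? " + parse("GGATCC").get(0).cutKnown());
    }
}
```

For CACCCA(11/9) this gives cut positions 17 and 15, which line up with the "DownstreamCut: 17, 15" that RestrictionEnzymeManager printed for TaqII; the difference is that GACCGA is kept as well, and entries like Mel3JI or AacI are flagged as having no cut location rather than inheriting one.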
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 22 June 2005 11:25
To: Jesse
CC: biojava-l@biojava.org; [EMAIL PROTECTED]
Subject: Re: [Biojava-l] RestrictionEnzymeManager can't correctly handle incomplete enzymes

I take your point, but I notice that BamHI is an isoschizomer. Is the cleavage site of BamHI really unknown??

- Mark

"Jesse" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
06/22/2005 04:15 PM
To: <biojava-l@biojava.org>
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] RestrictionEnzymeManager can't correctly handle incomplete enzymes

RestrictionEnzymeManager can't correctly handle incomplete enzymes and gives wrong data. (Correct me if I'm wrong.)

I'm not sure whether this has already been discussed. I think RestrictionEnzymeManager cannot handle incomplete restriction enzymes.

BioJava 1.4Pre2 knows two types of RestrictionEnzymes:
-RestrictionEnzyme.CUT_SIMPLE
-RestrictionEnzyme.CUT_COMPOUND

But in REBASE there are also other kinds of restriction enzyme entries:
-Unknown recognition sites, for example "<3>?". RestrictionEnzymeManager skips these (which is OK).
-Unknown cut locations, for example AacI "<3>GGATCC".

The problem is with the REBASE entries that have an unknown cut location: RestrictionEnzymeManager reports a cut location for them even though the REBASE file does not give one.
For example:
http://rebase.neb.com/rebase/link_withrefm

--------- REBASE ENTRY -----------
<1>AacI
<2>BamHI,AaeI,AcaII,AccEBI,AinII,AliI,Ali12257I,Ali12258I,ApaCI,AsiI,AspTII,Atu1II,BamFI,BamKI,BamNI,Bca1259I,Bce751I,Bco10278I,BnaI,BsaDI,Bsp30I,Bsp46I,Bsp90II,Bsp98I,Bsp130I,Bsp131I,Bsp144I,Bsp4009I,BspAAIII,BstI,Bst1126I,Bst2464I,Bst2902I,BstQI,Bsu90I,Bsu8565I,Bsu8646I,BsuB519I,BsuB763I,CelI,DdsI,GdoI,GinI,GoxI,GseIII,GstI,MleI,Mlu23I,NasBI,Nsp29132II,NspSAIV,OkrAI,Pac1110I,Pae177I,Pfl8I,Psp56I,RhsI,Rlu4I,RspLKII,SolI,SpvI,SurI,Uba19I,Uba31I,Uba38I,Uba51I,Uba88I,Uba1098I,Uba1163I,Uba1167I,Uba1172I,Uba1173I,Uba1205I,Uba1224I,Uba1242I,Uba1250I,Uba1258I,Uba1297I,Uba1302I,Uba1324I,Uba1325I,Uba1334I,Uba1339I,Uba1346I,Uba1383I,Uba1398I,Uba1402I,Uba1414I,Uba4009I
<3>GGATCC
<4>
<5>Acetobacter aceti sub. liquefaciens
<6>IFO 12388
<7>
<8>Seurinck, J., van Montagu, M., Unpublished observations.
----------------------------------

--------- RestrictionEnzyme values --------
Name: AacI
RecognitionSite: ggatcc
ForwardRegex: g{2}atc{2}
ReverseRegex: g{2}atc{2}
CutType: 0 (RestrictionEnzyme.CUT_SIMPLE)
DownStreamEndType: 2
IsPalindromic: true
DownstreamCut: 1, 1,
-------------------------------------------

As you can see, AacI is treated as RestrictionEnzyme.CUT_SIMPLE and has a cut location, while the REBASE entry says that the cut location is unknown; only the recognition site is known. So RestrictionEnzymeManager should also filter out the entries with an unknown cut location; otherwise it gives wrong data.

- Jesse

[Biojava-l] RestrictionEnzymeManager REBASE reader bug?
mark.schreiber at novartis.com
Tue Jun 21 22:22:52 EDT 2005

Hello -

This is now checked in. All tests pass (no surprise, as checking for null never hurt anyone). This will make it into BioJava 1.4. If you want to add a test to the JUnit suite to ensure this stays fixed, it would be most appreciated.
I also remember some discussion a while back about the behaviour of certain enzymes with respect to their cleavage points, which may or may not have been a bug. Was this ever resolved? If so, does anything need fixing?

Thanks.

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l