Excellent, thanks Richard and Deepak!

Andreas

On Thu, Mar 25, 2010 at 9:27 AM, Richard Holland <[email protected]> wrote:
> Patched and in subversion on the head in the new Biojava 3 code. I modified
> the code slightly to simplify it. There were also parallel changes required
> over in SimpleDocRef itself to enable it to continue working without being
> connected to BioSQL.
>
> On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
>
> > I am writing this email again because I didn't get any response on whether
> > these bugs have been patched or whether the patches got lost somewhere on
> > the mailing list. I don't know how to apply the patches myself, so I am
> > counting on you guys to apply them and reply to me so I know it's fixed.
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> > Hi
> >
> > In response to the bug fix suggested by Richard I have created some
> > patches. We need to apply these to stop biojava from processing references
> > from a genbank record in the wrong manner, which causes more hibernate
> > exceptions. After applying the patches, the reference resolution code will
> > test the pubmed or medline id, then if there is no match test
> > author/title/location, then if there is still no match create a new
> > reference. I even tested it with GenBank release 175 and I gained almost
> > 3159 more records in my database.
> >
> > Can somebody please have a look at the second issue and fix it:
> > "
> > 2. I think that's a bug (compound locations with null features) but not
> > sure why. Could be that the process of constructing a CompoundRichLocation
> > is somehow losing the feature reference from the original
> > SimpleRichLocation. Again I can't investigate until March - can someone else
> > take a look at the code? (A good starting point would be to look at how a
> > CompoundRichLocation decides to select the feature from the
> > SimpleRichLocations it is made up from).
> > "
> >
> > Also I am planning on making a bridge between biosql databases loaded
> > using bioperl and biojava. Here is some of my investigation; can you guys
> > suggest some direction on it?
> > Have a look at the attached files:
> > 1) Biojava_BioPerl_Diff.xls ==> a view of the tables where a genbank
> > record is stored in a biosql instance by bioperl and biojava
> > 2) GenbankRecord.doc ==> a Word document with a genbank record showing
> > where its information goes in biosql using bioperl and biojava
> > 3) BioSqlRichobjectBuilder.patch ==> patch needed for the
> > BioSqlRichObjectBuilder.java class
> > 4) GenBankFormat.patch ==> patch needed for the GenbankFormat.java class
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> > -------- Original Message --------
> > Subject: Re: Hibernate Exception and suggestion for change in BioSqlSchema
> > Date: Tue, 9 Feb 2010 20:34:32 +1300
> > From: Richard Holland <[email protected]>
> > To: Deepak Sheoran <[email protected]>
> > CC: [email protected]
> >
> > Hi. It's possible that your original email didn't make it to the list
> > because it is HTML format, and the list only accepts plain text.
> >
> > However, in answer to your two questions:
> >
> > 1. The code that does the resolution of references might be better if
> > it looks up existing IDs rather than using author, title, location to
> > identify existing records. I would suggest modifying it to a three-step
> > process - test ID, then if no match then test author/title/location, then if
> > still no match create a new reference. Could someone do that? (I'm unable to
> > do anything until late March).
> >
> > 2. I think that's a bug (compound locations with null features) but not
> > sure why. Could be that the process of constructing a CompoundRichLocation
> > is somehow losing the feature reference from the original
> > SimpleRichLocation. Again I can't investigate until March - can someone else
> > take a look at the code? (A good starting point would be to look at how a
> > CompoundRichLocation decides to select the feature from the
> > SimpleRichLocations it is made up from).
> >
> > cheers,
> > Richard
> >
> > On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> >
> > > Hi Richard
> > >
> > > Below is the email which I sent to the Biojava-l mailing list, but it
> > > never got posted on the mailing list server, nor did I get any response.
> > > Please have a look at this email and tell me what the solution to the
> > > problem described in the message could be.
> > >
> > > Thanks
> > > Deepak Sheoran
> > >
> > > -------- Original Message --------
> > > Subject: Hibernate Exception and suggestion for change in BioSqlSchema
> > > Date: Wed, 03 Feb 2010 08:07:35 -0600
> > > From: Deepak Sheoran <[email protected]>
> > > To: [email protected]
> > >
> > > Hi guys,
> > >
> > > A couple of days back I was having a problem with a hibernate exception,
> > > but that exception got resolved; the reference to that email is:
> > > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> > >
> > > On Richard's suggestion in the above link I was able to resolve some of
> > > the issues, but then I got stuck on another hibernate error and decided
> > > to investigate. Below are some facts and information which I found, and
> > > I guess it is going to affect all of us.
> > >
> > > • The "Reference" table in the bioSql schema has a unique constraint on
> > > the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE
> > > (dbxref_id)), which means only one entry in the reference table can use
> > > a given dbxref_id.
> > > This works well, except in cases where there is a little variation in
> > > the values of the "location", "title" and "authors" columns and all
> > > these variations refer to the same PUBMED_ID. Then we can't persist or
> > > create a richsequence object.
> > > Now when you tie RichObjectFactory to an active hibernate session, the
> > > class "BioSqlRichObjectBuilder" has a method called "buildObject(Class
> > > clazz, List paramsList)" which is responsible for looking up the details
> > > of the object in the database. If it finds one it will return that
> > > object, else it will try to persist the new object into the database.
> > > But the problem is with the part of that method below:
> > >
> > > ... LineNumber: 114
> > >     else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> > >         queryType = "DocRef";
> > >         // convert List constructor to String representation for query
> > >         ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> > >         if (ourParamsList.size()<3) {
> > >             queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> > >         } else {
> > >             queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> > >         }
> > >     }
> > > ... LineNumber: 123
> > >
> > > Now when hibernate searches the database, it won't find the existing
> > > record in the "reference" table because the two records differ in
> > > string comparison, so it will return a new object back to
> > > "GenbankFormat", to the following piece of code:
> > >
> > > ... LineNumber: 447
> > >     else {
> > >         try {
> > >             CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> > >             RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> > >             rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> > >         } catch (ChangeVetoException e) {
> > >             throw new ParseException(e+", accession:"+accession);
> > >         }
> > >     }
> > > ... LineNumber: 455
> > >
> > > Then we add that object to rlistener and move to the next part of the
> > > genbank record. When biojava then searches for a new crossref in the
> > > database, it tries to persist the old one and gets a hibernate exception
> > > regarding violation of the unique constraint on the "dbxref_id" column.
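
A minimal sketch of the lookup order discussed in this thread (test the PubMed/Medline ID first, then author/title/location, then create a new reference), assuming a simple in-memory store instead of Hibernate. The `Ref` and `DocRefLookupSketch` classes are hypothetical illustrations, not BioJava classes, and this is not the patch that was committed. The two location strings are the variants reported in this thread for PubMed ID 18554304.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for one row of the "reference" table; not a BioJava class. */
final class Ref {
    final String pubmedId; // may be null when the record carries no ID
    final String authors;
    final String location;
    final String title;    // may be null

    Ref(String pubmedId, String authors, String location, String title) {
        this.pubmedId = pubmedId;
        this.authors = authors;
        this.location = location;
        this.title = title;
    }
}

/** Hypothetical in-memory store that resolves references in three steps. */
final class DocRefLookupSketch {
    private final Map<String, Ref> byPubmedId = new HashMap<>();
    private final List<Ref> all = new ArrayList<>();

    Ref resolve(String pubmedId, String authors, String location, String title) {
        // Step 1: an existing reference with the same ID wins, whatever the strings say.
        if (pubmedId != null) {
            Ref hit = byPubmedId.get(pubmedId);
            if (hit != null) {
                return hit;
            }
        }
        // Step 2: fall back to an exact author/location/title comparison.
        for (Ref r : all) {
            if (Objects.equals(r.authors, authors)
                    && Objects.equals(r.location, location)
                    && Objects.equals(r.title, title)) {
                return r;
            }
        }
        // Step 3: nothing matched, so create and remember a new reference.
        Ref created = new Ref(pubmedId, authors, location, title);
        all.add(created);
        if (pubmedId != null) {
            byPubmedId.put(pubmedId, created);
        }
        return created;
    }

    int size() {
        return all.size();
    }

    public static void main(String[] args) {
        DocRefLookupSketch store = new DocRefLookupSketch();
        String authors = "Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., "
                + "Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.";
        String title = "Isolation of lactate-utilizing butyrate-producing bacteria";
        Ref first = store.resolve("18554304", authors,
                "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)", title);
        Ref second = store.resolve("18554304", authors,
                "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)", title);
        System.out.println(first == second); // true: the ID match short-circuits the string test
        System.out.println(store.size());    // 1: only one row would use this dbxref_id
    }
}
```

With the string-only comparison, the second call would have produced a second reference for the same ID, which is the situation that runs into the unique constraint on dbxref_id.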
> > >
> > > The only ways to get these records into the database are:
> > > • The very easy solution, and the way I did it to test my theory, is to
> > > change the bioSql schema so that it allows a many-to-one relation
> > > between the "reference" and "dbxref" tables. This even makes sense,
> > > because one paper can have many different naming variations, and this
> > > change allows us to store that info too. But this is something the
> > > BioSql people have to decide and I don't know how to approach them.
> > > • The second solution, slightly more difficult to implement, is to
> > > change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List
> > > paramsList)" decides whether a particular DocRef already exists in the
> > > database. I mean testing all possible string variations of the authors,
> > > location and title of the docRef we are searching for. This has many
> > > complications and may slow down the process of creating a richsequence
> > > object when RichObjectFactory is linked with an active hibernate
> > > session.
> > >
> > > Example: Below is a sample of what I have in my local biosql schema,
> > > which has the modification suggested by me. (The dbxref_id column shows
> > > the Pubmed_id: I replaced the local dbxref_id present in this table in
> > > my database with the pubmed_id stored in the "dbxref" table, for easy
> > > reference with the outside world in this email.)
> > >
> > > Reference_id: 216
> > > Dbxref_id: 18554304
> > > Location: FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> > > Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > > Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > > crc: 9E940E01F4BE3CD0
> > >
> > > Reference_id: 230
> > > Dbxref_id: 18554304
> > > Location: FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> > > Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > > Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > > crc: D3BC0C17F3F786C9
> > >
> > > Reference_id: 415
> > > Dbxref_id: 16790744
> > > Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title: Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> > > Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > > crc: 60AEDFA0CEEACC38
> > >
> > > Reference_id: 969
> > > Dbxref_id: 16790744
> > > Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title: Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> > > Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > > crc: 4B1232999F6E8130
> > >
> > > Reference_id: 929
> > > Dbxref_id: 8688087
> > > Location: Science 273 (5278), 1058-1073 (1996)
> > > Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > > Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > > crc: 3E79B40DD2AAA2B7
> > >
> > > Reference_id: 932
> > > Dbxref_id: 8688087
> > > Location: Science 273 (5278), 1058-1073 (1996)
> > > Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > > Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > > crc: 094EB3384F8D6DE8
> > >
> > > Reference_id: 1426
> > > Dbxref_id: 10684935
> > > Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > > Authors: Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> > > crc: 357648D8FD8C6C8A
> > >
> > > Reference_id: 1481
> > > Dbxref_id: 10684935
> > > Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > > Authors: Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> > > crc: 115411EB2DEE5654
> > >
> > > Reference_id: 1497
> > > Dbxref_id: 14689165
> > > Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > > crc: 4D5D376EECCD186B
> > >
> > > Reference_id: 1501
> > > Dbxref_id: 14689165
> > > Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > > crc: 4D57954EECDED66B
> > >
> > > Reference_id: 1556
> > > Dbxref_id: 18060065
> > > Location: PLoS ONE 2 (12), E1271 (2007)
> > > Title: Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> > > Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc: 698688FB6DB95247
> > >
> > > Reference_id: 1559
> > > Dbxref_id: 18060065
> > > Location: PLoS ONE 2 (12), E1271 (2007)
> > > Title: Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> > > Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc: E25E1BA99DB18F3D
> > >
> > > • The second kind of error which I got was:
> > > org.hibernate.PropertyValueException: not-null property references a
> > > null or transient value: Location.feature
> > > • This means that in the richsequence object some feature has a location
> > > object whose feature is set to null.
> > > • My observations:
> > >   • It usually occurs when you try to persist a richsequence object to
> > >   the database, and it occurs for those features which have a
> > >   CompoundRichLocation, usually "join" and "complement" in the CDS
> > >   region of a genbank record.
> > >   • After catching the hibernate exception I went through all the
> > >   features, and either biojava or hibernate had changed the object type
> > >   of a CompoundRichLocation to SimpleRichLocation and set the feature
> > >   variable to null.
> > >   • Below are screenshots of one of my tests:
> > >     • Settings before trying to persist the richsequence object to the
> > >     database:
> > >     <Mail Attachment.png>
> > >     • After trying to persist the richsequence object to the database,
> > >     inside the hibernate exception catch block:
> > >     <Mail Attachment.png>
> > > • So my question is: why is this happening, and how can I stop it or get
> > > these records into the database? I have no clue why this is happening.
> > > • Some extra information to make things clearer to you guys:
> > >   • Below are some LOCUS lines from genbank records for which I know
> > >   where the location error is, I mean the CDS region causing the error,
> > >   and its array index in the richsequence.feature ArrayList object.
> > >     • LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
> > >       richSequence.feature index: 2540, line number in the genbank record: 22115
> > >     • LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
> > >       richSequence.feature index: 127, line number in the genbank record: 2137
> > >     • LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
> > >       richSequence.feature index: 389, line number in the genbank record: 3632
> > >     • LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
> > >       richSequence.feature index: 47, line number in the genbank record: 4841
> > >     • LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
> > >       richSequence.feature index: 45, line number in the genbank record: 442
> > >   • The complete exception message:
> > > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > >     at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >     at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >     at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >     at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >     at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >     at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >     at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >     at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >     at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >     at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >     at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >     at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >     at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >     at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >     at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >     at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >     at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> > >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >     at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> > >     at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> > >     at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> > >
> >
> > --
> > Richard Holland, BSc MBCS
> > Operations and Delivery Director, Eagle Genomics Ltd
> > T: +44 (0)1223 654481 ext 3 | E: [email protected]
> > http://www.eaglegenomics.com/
> >
> > <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: [email protected]
> http://www.eaglegenomics.com/
>

_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
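
For the second problem in the thread (Location.feature being null when the sequence is persisted), a pre-save check can help pin down which features are affected before Hibernate raises the exception. The sketch below is only an illustration under stated assumptions: `Loc` and `NullFeatureCheck` are hypothetical stand-ins, not the BioJava RichLocation API, and the check simply walks a location and its member locations looking for a missing owning feature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a location; NOT the BioJava RichLocation API. */
final class Loc {
    final Object feature;    // owning feature, null when the back-reference was lost
    final List<Loc> members; // empty for a simple location, sub-locations for a compound one

    Loc(Object feature, Loc... members) {
        this.feature = feature;
        this.members = Arrays.asList(members);
    }
}

final class NullFeatureCheck {
    /** Returns true if this location or any nested member has no owning feature. */
    static boolean hasNullFeature(Loc loc) {
        if (loc.feature == null) {
            return true;
        }
        for (Loc member : loc.members) {
            if (hasNullFeature(member)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the list indices whose location tree contains a null feature. */
    static List<Integer> failingIndices(List<Loc> featureLocations) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < featureLocations.size(); i++) {
            if (hasNullFeature(featureLocations.get(i))) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Object cds = "CDS";
        Loc intact = new Loc(cds, new Loc(cds), new Loc(cds));
        Loc broken = new Loc(cds, new Loc(cds), new Loc(null)); // one member lost its feature
        System.out.println(failingIndices(Arrays.asList(intact, broken))); // prints [1]
    }
}
```

Run against real data, a check like this would report indices comparable to the richSequence.feature indices listed above, which narrows the search to the code that builds those compound locations.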
