On 13 December 2010 21:35, Thomas Strunz <beginn...@hotmail.de> wrote:
>
> Hi all,
>
>
>
> *Fortunately, the CDK code that reads MOL files adds atoms and bonds in
> the same order as in the MOL file; otherwise, it would be trickier.*
>
> Yeah, I looked at the MDLV2000Reader source code, and as long as that does
> not change, it should be fairly easy to achieve.
>
>
> Of course my next thought was: why not store all atoms and bonds and the
> relevant properties? Then you could just create the AtomContainer via
> setBonds and setAtoms.
>
>
> Because that would take up a lot of space? Hard to tell; I'm not so familiar
> (yet?) with the CDK source code and which properties of atoms and bonds are
> actually relevant for fingerprinting and subgraph matching.
>
Atom types and aromaticity flags, in the first place.
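
To illustrate the position-based mapping discussed in this thread: because atoms come out in MOL file order, per-atom data kept in a side field can be reapplied by index. What follows is only a minimal plain-Java sketch that deliberately avoids the CDK API. `SketchAtom`, `AtomOrderSketch`, and the one-character-per-atom flag string are assumptions made up for this example, not what CDK or any existing database schema uses, and the toy MOL file has zeroed coordinates and no bond block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a CDK atom; only what this sketch needs.
class SketchAtom {
    final String symbol;
    boolean aromatic;

    SketchAtom(String symbol) {
        this.symbol = symbol;
    }
}

class AtomOrderSketch {

    // Returns the atoms of a V2000 MOL file in the order they appear in the file.
    static List<SketchAtom> readAtoms(String molfile) {
        String[] lines = molfile.split("\n");
        // The fourth line is the counts line; columns 1-3 hold the atom count.
        int atomCount = Integer.parseInt(lines[3].substring(0, 3).trim());
        List<SketchAtom> atoms = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            String line = lines[4 + i];
            // Columns 1-30 hold x, y, z; the element symbol sits in columns 32-34.
            atoms.add(new SketchAtom(line.substring(31, 34).trim()));
        }
        return atoms;
    }

    // Applies stored flags ('1' = aromatic, assumed encoding) to atoms by position.
    static void applyAromaticity(List<SketchAtom> atoms, String flags) {
        if (flags.length() != atoms.size()) {
            throw new IllegalArgumentException("flag count does not match atom count");
        }
        for (int i = 0; i < atoms.size(); i++) {
            atoms.get(i).aromatic = flags.charAt(i) == '1';
        }
    }

    // Builds a toy, atom-only MOL file for chlorobenzene (coordinates zeroed).
    static String sampleMol() {
        String[] symbols = {"C", "C", "C", "C", "C", "C", "Cl"};
        StringBuilder sb = new StringBuilder("chlorobenzene\n  sketch\n\n");
        sb.append(String.format(Locale.ROOT,
                "%3d%3d  0  0  0  0  0  0  0  0999 V2000\n", symbols.length, 0));
        for (String s : symbols) {
            sb.append(String.format(Locale.ROOT,
                    "%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0\n",
                    0.0, 0.0, 0.0, s));
        }
        return sb.append("M  END\n").toString();
    }

    public static void main(String[] args) {
        List<SketchAtom> atoms = readAtoms(sampleMol());
        // "1111110" plays the role of the value read from the extra database field.
        applyAromaticity(atoms, "1111110");
        for (int i = 0; i < atoms.size(); i++) {
            System.out.println(i + " " + atoms.get(i).symbol
                    + " aromatic=" + atoms.get(i).aromatic);
        }
    }
}
```

The length check is the only safeguard here: if the reader ever changed the order in which it adds atoms, the flags would silently land on the wrong atoms, which is why the mapping depends on that ordering staying stable.
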
> Also, it's kind of hard to actually see which properties/flags are available
> (set and get flags, CDKConstants).
> But anyway, what I'm trying to suggest or ask, or what popped into my mind, is:
> why not use Hibernate (or something similar; an idea which of course
> contradicts my previous comment that storing all aromatic atoms is
> stupid)? OK, I'm not very familiar with either (CDK or Hibernate; e.g.
> how do you add an ID for Hibernate to an existing class?), and the CDK object
> hierarchy is, to my inexperienced eyes, rather complex and maybe not ideal
> for Hibernate, so this might be a ridiculous idea. Of course, creating the
> mapping files would be a rather tedious and annoying task, but you could clearly
> specify which information you actually want to store. OK, there would be a
> lot more rows and columns in the database, but each field would contain a lot
> less data compared to having a varchar/CLOB field for molfiles. Maybe it would
> not take that much more storage space than having molfiles, and it would
> probably perform better, especially compared to CLOB columns.
>
> end of brainstorming,
>
>
Storing atoms and bonds separately is a valid option, as is using
Hibernate. What matters is not really the amount of storage, but how
much faster or slower it is to read atoms and bonds from different columns
and rows rather than from a single BLOB field holding the MOL file. I could
imagine that reading many columns and rows and combining them is slower, but I
haven't seen a benchmark; it would be interesting if there is one.
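
For illustration only, the "read many rows and combine them" step could look roughly like the plain-Java sketch below. The `AtomRow` and `BondRow` classes stand for rows of hypothetical atom and bond tables, and `RowMolecule` and `RowAssemblySketch` are likewise invented names; none of this is CDK or Hibernate API. It says nothing about actual timing, which would need a benchmark against a real database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One row of a hypothetical "atoms" table.
class AtomRow {
    final int molId;
    final int index;
    final String symbol;
    final boolean aromatic;

    AtomRow(int molId, int index, String symbol, boolean aromatic) {
        this.molId = molId;
        this.index = index;
        this.symbol = symbol;
        this.aromatic = aromatic;
    }
}

// One row of a hypothetical "bonds" table; begin and end are atom indices.
class BondRow {
    final int molId;
    final int begin;
    final int end;
    final int order;

    BondRow(int molId, int begin, int end, int order) {
        this.molId = molId;
        this.begin = begin;
        this.end = end;
        this.order = order;
    }
}

// A molecule reassembled from its rows.
class RowMolecule {
    final List<AtomRow> atoms = new ArrayList<>();
    final List<BondRow> bonds = new ArrayList<>();
}

class RowAssemblySketch {

    // Groups atom and bond rows by molecule ID and restores the atom order.
    static Map<Integer, RowMolecule> assemble(List<AtomRow> atomRows, List<BondRow> bondRows) {
        Map<Integer, RowMolecule> byId = new LinkedHashMap<>();
        for (AtomRow a : atomRows) {
            byId.computeIfAbsent(a.molId, id -> new RowMolecule()).atoms.add(a);
        }
        for (BondRow b : bondRows) {
            byId.computeIfAbsent(b.molId, id -> new RowMolecule()).bonds.add(b);
        }
        // Rows may arrive in any order, so sort atoms by their stored index.
        for (RowMolecule m : byId.values()) {
            m.atoms.sort(Comparator.comparingInt((AtomRow a) -> a.index));
        }
        return byId;
    }

    public static void main(String[] args) {
        List<AtomRow> atomRows = Arrays.asList(
                new AtomRow(1, 1, "O", false),
                new AtomRow(1, 0, "C", false),
                new AtomRow(2, 0, "N", false));
        List<BondRow> bondRows = Arrays.asList(new BondRow(1, 0, 1, 2));
        Map<Integer, RowMolecule> mols = assemble(atomRows, bondRows);
        System.out.println(mols.get(1).atoms.size() + " atoms, "
                + mols.get(1).bonds.size() + " bond(s)");
    }
}
```

With a single field per molecule, all of this grouping and sorting is replaced by one parse of the MOL file, so the comparison comes down to whether that parse costs more or less than fetching and combining the extra rows.
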
Nina
> Thomas
>
>
>
> > From: j.kerssemak...@cmbi.ru.nl
> > Date: Mon, 13 Dec 2010 15:45:19 +0100
>
> > Subject: Re: [Cdk-user] Substructure Searching, Fingerprints and cdk-1.3.7 Isomorphism Class
> > To: jeliazkova.n...@gmail.com
> > CC: beginn...@hotmail.de; cdk-user@lists.sourceforge.net
>
> >
> > Just a short note to mention that I'm closely following this topic. A
> > major rewrite of our own database system is somewhere in the near
> > future, so this is good reading! Thanks for sharing!
> >
> > ~Jules Kerssemakers
> >
> > On 13 December 2010 08:51, Nina Jeliazkova <jeliazkova.n...@gmail.com> wrote:
> > > Hi Thomas,
> > >
> > > On 10 December 2010 20:04, Thomas Strunz <beginn...@hotmail.de> wrote:
> > >>
> > >> Sorry for calling you stupid. ;)
> > >>
> > >
> > > ;)
> > >
> > >>
> > >> I just meant: if you have, say, 100'000 molecules and assume 25 % are
> > >> aromatic, probably mostly benzene rings (= 6 atoms + 6 bonds), that leads
> > >> to 12 * 25'000 = 300'000 records. OK, that's manageable since it's only an
> > >> ID and a bit. But it depends mostly on the dataset. My focus is on smaller
> > >> molecules, which is probably also the reason why graph matching does not
> > >> seem to be that big of a problem.
> > >
> > > Just a single field with all the additional info for atoms and bonds.
> > > Not pretending this is the best way, just a simple one.
> > >>
> > >> How do you map a certain atom or bond from the database to the right one
> > >> in the AtomContainer created from the molfile?
> > >> Does the Atom class also have an ID like the Molecule class? Then it would
> > >> not be that difficult.
> > >>
> > >
> > > Fortunately, the CDK code that reads MOL files adds atoms and bonds in the
> > > same order as in the MOL file; otherwise, it would be trickier.
> > > Regards,
> > > Nina
> > >>
> > >> have a nice weekend
> > >>
> > >> Regards,
> > >>
> > >> Thomas