Hi Nina, May I ask which algorithm you used in SMSD for the substructure search?
Hi Thomas,

Regarding your observations: yes, there is an overhead involved in this process, and it has to do with the code where SMSD is used. You can skip this overhead by setting the flags to false. I will submit a clear patch.

Thanks,
Asad

On 1 Mar 2011, at 08:58, cdk-user-requ...@lists.sourceforge.net wrote:

> Today's Topics:
>
>    1. Isomorhism and MolHandler: removeHydrogen flag issue/bug (Thomas Strunz)
>    2. Re: Isomorhism and MolHandler: removeHydrogen flag issue/bug (Nina Jeliazkova)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 1 Mar 2011 09:49:07 +0100
> From: Thomas Strunz <beginn...@hotmail.de>
> Subject: [Cdk-user] Isomorhism and MolHandler: removeHydrogen flag issue/bug
> To: <cdk-user@lists.sourceforge.net>
>
> Hi all,
>
> the Isomorphism class has an init method:
>
>     public void init(IMolecule reactant, IMolecule product,
>             boolean removeHydrogen, boolean cleanAndConfigureMolecule) throws CDKException {
>         this.removeHydrogen = removeHydrogen;
>         init(new MolHandler(reactant, removeHydrogen, cleanAndConfigureMolecule),
>              new MolHandler(product, removeHydrogen, cleanAndConfigureMolecule));
>     }
>
> The molecules I pass into this method have no explicit hydrogens and are already configured, so I should be able to set both flags, removeHydrogen and cleanAndConfigureMolecule, to false.
>
> The issue is the removeHydrogen flag. If I set it to "false", it cripples performance compared to "true". However, even with the flag set to "true", UIT is faster!
>
> MolHandler constructor:
>
>     public MolHandler(IAtomContainer container, boolean removeHydrogen, boolean cleanMolecule) {
>         String molID = container.getID();
>         this.removeHydrogen = removeHydrogen;
>         this.atomContainer = container;
>         if (removeHydrogen) {
>             try {
>                 // <- removeHydrogen set to true
>                 this.atomContainer = ExtAtomContainerManipulator.removeHydrogensExceptSingleAndPreserveAtomID(atomContainer);
>             } catch (Exception ex) {
>                 logger.error(ex);
>             }
>         } else {
>             // <- removeHydrogen set to false. This is pointless IMHO. Do nothing.
>             this.atomContainer = container.getBuilder().newInstance(IAtomContainer.class, atomContainer);
>         }
>
>         if (cleanMolecule) {
>             try {
>                 if (!isPseudoAtoms()) {
>                     atomContainer = canonLabeler.getCanonicalMolecule(atomContainer);
>                 }
>                 // perceive atoms, set valency etc
>                 ExtAtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(atomContainer);
>                 // Add implicit hydrogens
>                 CDKHydrogenAdder adder = CDKHydrogenAdder.getInstance(atomContainer.getBuilder());
>                 adder.addImplicitHydrogens(atomContainer);
>                 // figure out which atoms are in aromatic rings:
>                 CDKHueckelAromaticityDetector.detectAromaticity(atomContainer);
>             } catch (CDKException ex) {
>                 logger.error(ex);
>             }
>         }
>         atomContainer.setID(molID);
>     }
>
> I tried to determine what is actually done in both code paths.
>
> Setting removeHydrogen to "true":
> Basically clones all atoms (except H) and all bonds (except those that were connected to H) into a new Molecule and sets implicit hydrogens. In my case this is just a waste of CPU time.
>
> Setting removeHydrogen to "false":
> Executes the following line of code:
>
>     this.atomContainer = container.getBuilder().newInstance(IAtomContainer.class, atomContainer);
>
> What is the point of this? IMHO it's 100% pointless and a waste of CPU time.
> I'm not sure why this cripples performance, because what it ends up doing is this:
>
>     public AtomContainer(IAtomContainer container)
>     {
>         this.atomCount = container.getAtomCount();
>         this.bondCount = container.getBondCount();
>         this.lonePairCount = container.getLonePairCount();
>         this.singleElectronCount = container.getSingleElectronCount();
>         this.atoms = new IAtom[this.atomCount];
>         this.bonds = new IBond[this.bondCount];
>         this.lonePairs = new ILonePair[this.lonePairCount];
>         this.singleElectrons = new ISingleElectron[this.singleElectronCount];
>
>         stereoElements = new ArrayList<IStereoElement>(atomCount/2);
>
>         for (int f = 0; f < container.getAtomCount(); f++) {
>             atoms[f] = container.getAtom(f);
>             container.getAtom(f).addListener(this);
>         }
>         for (int f = 0; f < this.bondCount; f++) {
>             bonds[f] = container.getBond(f);
>             container.getBond(f).addListener(this);
>         }
>         for (int f = 0; f < this.lonePairCount; f++) {
>             lonePairs[f] = container.getLonePair(f);
>             container.getLonePair(f).addListener(this);
>         }
>         for (int f = 0; f < this.singleElectronCount; f++) {
>             singleElectrons[f] = container.getSingleElectron(f);
>             container.getSingleElectron(f).addListener(this);
>         }
>     }
>
> So it also copies the whole Molecule into a new AtomContainer. Not sure why this is so much slower, but it is, besides being pointless. The number of hits found is identical to setting removeHydrogen to true or using UIT.
> I'm not sure why everyone says UIT is much slower. Theoretically it is, but in my case it is not, probably because of the useless work done as indicated above.
>
> Any comments? Am I missing something?
>
> Regards,
>
> Thomas
>
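One thing the quoted copy constructor does beyond copying references is call addListener(this) on every atom, bond, lone pair and single electron of the source container. The sketch below is a self-contained illustration of that pattern only; SketchAtom, SketchContainer and ListenerGrowthDemo are hypothetical stand-ins, not CDK classes. It shows that when the same query molecule is wrapped again and again, the listener list on each shared atom keeps growing. Whether this accounts for the slowdown Thomas measured is an assumption that has not been verified against CDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an atom that keeps a list of change listeners.
class SketchAtom {
    private final List<Object> listeners = new ArrayList<>();

    void addListener(Object listener) {
        listeners.add(listener);
    }

    int getListenerCount() {
        return listeners.size();
    }
}

// Hypothetical stand-in for a container with a shallow copy constructor.
class SketchContainer {
    final SketchAtom[] atoms;

    // Creates a fresh container with its own atoms.
    SketchContainer(int atomCount) {
        atoms = new SketchAtom[atomCount];
        for (int i = 0; i < atomCount; i++) {
            atoms[i] = new SketchAtom();
        }
    }

    // Mirrors the quoted constructor: atom references are shared with the
    // source, and the new container registers itself on every shared atom.
    SketchContainer(SketchContainer source) {
        atoms = new SketchAtom[source.atoms.length];
        for (int i = 0; i < atoms.length; i++) {
            atoms[i] = source.atoms[i];
            source.atoms[i].addListener(this);
        }
    }
}

class ListenerGrowthDemo {
    // Wraps one query container 'copies' times and reports how many
    // listeners its first atom has accumulated.
    static int listenersAfterCopies(int copies) {
        SketchContainer query = new SketchContainer(6);
        for (int i = 0; i < copies; i++) {
            new SketchContainer(query);
        }
        return query.atoms[0].getListenerCount();
    }

    public static void main(String[] args) {
        System.out.println(listenersAfterCopies(1000));
    }
}
```

In this sketch every wrapper also stays reachable from the shared atoms, so none of the copies can be garbage collected while the query molecule is alive. Any per-add or per-notification work that scales with the listener count would then grow with the number of searches run against the same query.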
> ------------------------------
>
> Message: 2
> Date: Tue, 1 Mar 2011 10:58:34 +0200
> From: Nina Jeliazkova <jeliazkova.n...@gmail.com>
> Subject: Re: [Cdk-user] Isomorhism and MolHandler: removeHydrogen flag issue/bug
> To: Thomas Strunz <beginn...@hotmail.de>
> Cc: cdk-user@lists.sourceforge.net
>
> On 1 March 2011 10:49, Thomas Strunz <beginn...@hotmail.de> wrote:
>
>> I'm not sure why everyone says UIT is much slower. Theoretically it is, but in my case it is not, probably because of the useless work done as indicated above.
>
> To share a bit of our recent benchmarking experience, we actually found CDK UIT is faster than SMSD for substructure searching (haven't tested MCSS).
>
> Nina
>
>> Any comments? Am I missing something?
>>
>> Regards,
>>
>> Thomas
>
> End of Cdk-user Digest, Vol 58, Issue 1
> ***************************************

_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
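For reference, the "do nothing" behaviour Thomas asks for when both flags are false could be sketched as below. This is a minimal illustration under stated assumptions: SketchMolecule, SketchHandler and SkipCopyDemo are hypothetical names, not SMSD or CDK classes, and this is not the patch mentioned above.

```java
// Hypothetical stand-in for a molecule; only identity and copying matter here.
class SketchMolecule {
    final String id;

    SketchMolecule(String id) {
        this.id = id;
    }

    SketchMolecule copy() {
        return new SketchMolecule(id);
    }
}

// Hypothetical handler that copies its input only when a later step
// (hydrogen removal or cleaning) would modify the molecule.
class SketchHandler {
    private final SketchMolecule molecule;

    SketchHandler(SketchMolecule input, boolean removeHydrogen, boolean cleanMolecule) {
        // With both flags false nothing is modified, so the caller's
        // instance is kept as-is instead of being copied.
        this.molecule = (removeHydrogen || cleanMolecule) ? input.copy() : input;
    }

    SketchMolecule getMolecule() {
        return molecule;
    }
}

class SkipCopyDemo {
    public static void main(String[] args) {
        SketchMolecule query = new SketchMolecule("query");
        SketchHandler untouched = new SketchHandler(query, false, false);
        SketchHandler cleaned = new SketchHandler(query, false, true);
        System.out.println(untouched.getMolecule() == query); // same instance
        System.out.println(cleaned.getMolecule() == query);   // separate copy
    }
}
```

The trade-off is that a handler sharing the caller's instance will see any later changes the caller makes to the molecule, so this only fits callers that treat their input as read-only during the search.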