SMILES ------>
ClC(Cl)(Cl)C(O)OCC(COC(O)C(Cl)(Cl)Cl)(COC(O)C(Cl)(Cl)Cl)COC(O)C(Cl)(Cl)Cl

SMARTS ------>
ClC(Cl)(Cl)[CH]([O-,OH,OC])O[CH2]C([CH2]O[CH]([O-,OH,OC])C(Cl)(Cl)Cl)([CH2]O[CH]([O-,OH,OC])C(Cl)(Cl)Cl)[CH2]O[CH]([O-,OH,OC])C(Cl)(Cl)Cl

I tried this example (the SMILES) against itself using SMSD and got the
answer in under about 2 seconds.

It is very difficult to pick a single best MCS software. Each MCS
algorithm comes with its own pros and cons: some are good at finding all
possible cliques (which increases the runtime), while others are good at
finding subgraphs (where you usually need only one solution). For example,
as far as I can see, UIT and SMSD (which also uses a modified UIT in a few
cases) belong to the former class, while VF2 and Ullman belong to the
latter.

I have tried to use an adaptive MCS in SMSD, but I guess we can join hands
to make the MCS-based solution more effective.

Thanks,
Asad

>
> ------------------------------
>
> Message: 3
> Date: Thu, 30 Apr 2009 15:22:55 -0500
> From: Loren Lenzen <loren.len...@sial.com>
> Subject: [Cdk-user] Possible bug in SQT
> To: cdk-user@lists.sourceforge.net
> Message-ID:
>     <ofebcdec80.18caf729-on862575a8.006d4377-862575a8.006fd...@sial.com>
> Content-Type: text/plain; charset="us-ascii"
>
> I have a list of parent molecules in SMILES form, and I was running the
> SQT against a list of SMARTS queries, to make sure that all my queries
> were valid. It works great until it hits Petrichloral. The SMILES and
> SMARTS strings parse fine, but there is no more output when the SQT runs
> into this query (currently 264/326), so I believe there might be a
> possible recursion problem. There was no CDKException thrown even after
> an hour, even though the first 263 queries ran in 15 seconds. Petrichloral
> is very symmetric, and Daylight's depict.cgi runs the query fine with
> 31104 matches: apparently (4*3*2)(3*2)^4. I was purposely running a dot
> product iteration here, so that's why there is no inner loop. I tried
> formatting the SMARTS with and without brackets.
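
On the 31104 figure in the paragraph above, here is a minimal sketch of where that number could come from. It assumes (this is my reading of the factorization, nothing in the thread confirms it) that the 4*3*2 term counts the ways to permute the four identical arms around the central carbon, and that each 3*2 term counts the ways to permute the three chlorines of one CCl3 group. `SymmetryCount` and `equivalentMappings` are hypothetical names for illustration, not CDK or Daylight API.

```java
// Hypothetical helper, not part of CDK: counts equivalent atom mappings
// under the assumed symmetry (identical arms, identical chlorines per arm).
final class SymmetryCount {

    // n! for small non-negative n
    static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    // (arms!) * (chlorinesPerArm!)^arms
    static long equivalentMappings(int arms, int chlorinesPerArm) {
        long perArm = factorial(chlorinesPerArm);
        long total = factorial(arms);
        for (int i = 0; i < arms; i++) {
            total *= perArm;
        }
        return total;
    }

    public static void main(String[] args) {
        // Petrichloral under the assumption above: 4 arms, 3 Cl per arm
        System.out.println(equivalentMappings(4, 3)); // 24 * 6^4 = 31104
    }
}
```

A matcher that enumerates every mapping has to visit all of these, which may be part of why an exhaustive search is so slow on this molecule.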
>
> SMILES ------>
> ClC(Cl)(Cl)C(O)OCC(COC(O)C(Cl)(Cl)Cl)(COC(O)C(Cl)(Cl)Cl)COC(O)C(Cl)(Cl)Cl
> SMARTS ------>
> ClC(Cl)(Cl)[CH]([O-,OH,OC])O[CH2]C([CH2]O[CH]([O-,OH,OC])C(Cl)(Cl)Cl)([CH2]O[CH]([O-,OH,OC])C(Cl)(Cl)Cl)[CH2]O[CH]([O-,OH,OC])C(Cl)(Cl)Cl
>
>
>
> public static void main(String[] args) throws CDKException,
>         FileNotFoundException, IOException {
>
>     SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());
>     ArrayList<String> smarts = new ArrayList<String>();
>     ArrayList<String> mols = new ArrayList<String>();
>     String smart = new String();
>     String mol = new String();
>     AtomContainerSet acs = new AtomContainerSet();
>
>     // one SMARTS query per line
>     BufferedReader br1 = new BufferedReader(new FileReader("smarts.txt"));
>     while ((smart = br1.readLine()) != null) {
>         smarts.add(smart);
>     }
>     br1.close();
>
>     // one parent molecule (SMILES) per line, in the same order as the queries
>     BufferedReader br2 = new BufferedReader(new FileReader("subStructures.txt"));
>     while ((mol = br2.readLine()) != null) {
>         mols.add(mol);
>         acs.addAtomContainer(sp.parseSmiles(mol));
>     }
>     br2.close();
>
>     BufferedWriter stream = new BufferedWriter(new FileWriter("deaOut.txt", true));
>     SMARTSQueryTool sqt = new SMARTSQueryTool("c1ccccc1"); // dummy string for initialization
>     for (int ac = 0; ac != acs.getAtomContainerCount(); ac++) {
>         sqt.setSmarts(smarts.get(ac));
>         System.out.println("" + ac); // for debugging purposes
>         try {
>             if (sqt.matches(acs.getAtomContainer(ac))) {
>                 stream.write(mols.get(ac) + " | " + smarts.get(ac));
>                 stream.newLine();
>             }
>         } catch (CDKException ex) {
>             throw new CDKException(ex.toString());
>         }
>     }
>     stream.close();
> }
> }
>
> ------------------------------
>
> Message: 4
> Date: Thu, 30 Apr 2009 16:51:58 -0400
> From: Rajarshi Guha <rg...@indiana.edu>
> Subject: Re: [Cdk-user] Possible bug in SQT
> To: Loren Lenzen <loren.len...@sial.com>
> Cc: cdk-user@lists.sourceforge.net
> Message-ID: <05778aef-b22e-4f3b-99ea-20906e88a...@indiana.edu>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
>
> On Apr 30, 2009, at 4:22 PM, Loren Lenzen wrote:
>
>> I have a list of parent molecules in SMILES form, and I was running
>> the SQT against a list of SMARTS queries, to make sure that all my
>> queries were valid. It works great until it hits Petrichloral. The
>> SMILES and SMARTS strings parse fine, but there is no more output
>> when the SQT runs into this query (currently 264/326), so I believe
>> there might be a possible recursivity problem. There was no
>> CDKException thrown even after an hour, even though the first 263
>> queries ran in 15 seconds.
>
> The problem is in the isomorphism code, it seems. If one ignores the
> SMARTS and just tries to match the SMILES against itself, UIT, VF2
> and Ullman all run forever (or at least 30 seconds, after which I
> stopped the run). Some symmetry-based optimization seems to be called
> for here.
>
> -------------------------------------------------------------------
> Rajarshi Guha <rg...@indiana.edu>
> GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
> -------------------------------------------------------------------
> Q: What's polite and works for the phone company?
> A: A deferential operator.
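
Since the runs above had to be stopped by hand, one way to keep a batch of queries moving is to put a time limit around each match. The following is only a rough sketch using the standard java.util.concurrent classes. `TimedMatch` and `runWithTimeout` are hypothetical names, and the sleeping task is a stand-in for the real match call, which is an assumption on my side. Note that cancelling only sends an interrupt request: a CPU-bound search that never checks the interrupt flag will keep running in the background, which is why the worker thread is marked as a daemon here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of CDK or SMSD.
final class TimedMatch {

    // Runs the task on a daemon worker thread.
    // Returns its result, or null if it did not finish within the limit.
    static Boolean runWithTimeout(Callable<Boolean> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-match");
            t.setDaemon(true); // a stuck search must not keep the JVM alive
            return t;
        });
        Future<Boolean> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt request only, not a hard stop
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("match task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a match that does not return in reasonable time
        Callable<Boolean> neverFinishes = () -> {
            Thread.sleep(60_000);
            return Boolean.TRUE;
        };
        Boolean result = runWithTimeout(neverFinishes, 200);
        System.out.println(result == null ? "gave up after timeout" : "matched: " + result);
    }
}
```

In a loop like the one quoted above, the stand-in task would be replaced by the actual match call, and a null result logged as "timed out" so that the remaining queries still run.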
>
> ------------------------------
>
> ------------------------------------------------------------------------------
> Register Now & Save for Velocity, the Web Performance & Operations
> Conference from O'Reilly Media. Velocity features a full day of
> expert-led, hands-on workshops and two days of sessions from industry
> leaders in dedicated Performance & Operations tracks. Use code vel09scf
> and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf
>
> ------------------------------
>
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
>
> End of Cdk-user Digest, Vol 36, Issue 1
> ***************************************
>

--
****************************************************************
Dr. Syed Asad Rahman (B.Engg, PhD)
Research Scientist
EMBL-EBI                        Phone: +44-(0) 1223-49-2537
Wellcome Trust Genome Campus    Fax: +44-(0) 1223-49-4486
Hinxton CB10 1SD                E-mail: a...@ebi.ac.uk
Cambridge, UK                   Home Page: www.ebi.ac.uk/~asad
*****************************************************************