If I understood Greg correctly, it will be in 2016.09, which isn't in conda just yet; they are currently working on putting it there.
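To make the version mismatch concrete, here is a minimal sketch that checks whether a release string is new enough. It assumes, as relayed above, that 2016.09 is the first release with mol_adjust_query_properties; the names MIN_RELEASE, parse_release and has_query_adjustment are made up for this illustration and are not part of RDKit.

```python
import re

# Assumption taken from this thread: the function first ships in 2016.09.
MIN_RELEASE = (2016, 9)


def parse_release(version: str) -> tuple:
    """Return (year, month) from a release string such as '2016.03.4'."""
    match = re.match(r"^(\d{4})\.(\d{1,2})", version.strip())
    if match is None:
        raise ValueError(f"unrecognised release string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_query_adjustment(version: str) -> bool:
    """True if the release is at or after the assumed minimum release."""
    return parse_release(version) >= MIN_RELEASE


if __name__ == "__main__":
    # 2016.03.4 is the version conda reported in the quoted message.
    for candidate in ("2016.03.4", "2016.09.1"):
        print(candidate, has_query_adjustment(candidate))
```

The string to test could come from the Python side, for instance the value of rdkit.rdBase.rdkitVersion. The cartridge reports its own extension version separately, so both would need to be checked.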
Markus

-------------------------------------
|  Markus Sitzmann
|  markus.sitzm...@gmail.com

> On 23 Nov 2016, at 15:29, Alexander Klenner-Bajaja <aklen...@epo.org> wrote:
>
> Dear Greg,
>
> Thank you very much, looking at the results that function was exactly what I
> was looking for – only I can’t find it in my updated anaconda installation.
>
> “conda update rdkit” tells me I have the latest version 2016.03.4 and
> postgres tells me I have the 3.4 version of the RDKit extension
>
> If I understand your blog post correctly it should be in 2016.03 version?
> What am I missing?
>
> Best,
>
> Alex
>
> From: Greg Landrum [mailto:greg.land...@gmail.com]
> Sent: Wednesday, November 23, 2016 11:42 AM
> To: Alexander Klenner-Bajaja
> Cc: rdkit-discuss@lists.sourceforge.net
> Subject: Re: [Rdkit-discuss] smarts vs smiles database queries and explicit
> hydrogens
>
> Hi Alex,
>
> The new version of the cartridge has some capabilities that, I think, address
> this.
>
> There's a blog post about this:
> http://rdkit.blogspot.com/2016/07/tuning-substructure-queries-ii.html
> but the short version is that you can do the kind of queries it seems like
> you want to do quite simply:
>
> chembl_21=# select * from rdk.mols where m@>mol_adjust_query_properties('*c1ncccn1') limit 3;
>  molregno | m
> ----------+---------------------------------------------------------------------------------------
>    601707 | CCCc1nc(-c2ccc(F)cc2)oc1C(=O)NC(CC)CN1CCN(c2ncccn2)CC1
>    289103 | CC1C(=N)/C(=N/Nc2ccc(S(=O)(=O)Nc3ncccn3)cc2)C(=O)C(C)C1=O
>    607646 | CCNC(=O)[C@@H]1OC(n2cnc3c(NC(=O)Nc4ccc(S(=O)(=O)Nc5ncccn5)cc4)ncnc32)[C@@H](O)[C@H]1O
> (3 rows)
>
> chembl_21=# select * from rdk.mols where m@>mol_adjust_query_properties('*c1nc(*)ccn1') limit 3;
>  molregno | m
> ----------+-------------------------------------------------------
>    158659 | CCNc1nccc(-c2c(-c3ccc(F)cc3)ncn2C2CCN(C)CC2)n1
>    158743 | Nc1nccc(-c2c(-c3ccc(F)cc3)ncn2C2CCN(Cc3ccccc3)CC2)n1
>    158843 | CC1(C)CC(n2cnc(-c3ccc(F)cc3)c2-c2ccnc(N)n2)CC(C)(C)N1
> (3 rows)
>
> chembl_21=# select * from rdk.mols where m@>mol_adjust_query_properties('*c1nc(*)cc(*)n1') limit 3;
>  molregno | m
> ----------+--------------------------------------------------------------------------
>    726443 | CN=C(S)NNc1nc(C)cc(C)n1
>    561136 | C[C@H](Nc1cc(NC2CCCCCC2)nc(C(F)(F)F)n1)[C@@H](Cc1ccc(Cl)cc1)c1cccc(Br)c1
>    205784 | CCN(CC)C(=O)CSc1nc(N)cc(Cl)n1
> (3 rows)
>
> There's more detail in the blog post, but the default behavior is to convert
> dummies into generic query atoms and to constrain the substitution at any
> other *ring* position.
>
> Best Regards,
> -greg
>
> On Wed, Nov 23, 2016 at 9:20 AM, Alexander Klenner-Bajaja <aklen...@epo.org>
> wrote:
> Hi all,
>
> I am currently exploring the possibilities of the RDKit database cartridge
> for substructure search- I installed everything following the tutorial from
> http://www.rdkit.org/docs/Install.html
>
> Very nice tutorial - worked perfectly fine.
>
> Since we are exploring solutions for browser based gui searches I created a
> test page using Ketcher (http://lifescience.opensource.epam.com/ketcher/)
> which communicates with the database through PHP.
>
> Ketcher returns a SMILES representation from the drawn molecule. The raw data
> of the molecules in the database are canonical SMILES created from RDKIT
> canonical SMILES from the rdkit KNIME node (they are text-mined from patents).
>
> When doing substructure searches, as long as we query for well-defined
> compounds the results make sense – however looking at R1,…-groups things get
> a little odd.
>
> I found a very old discussion on the mailing list from 2009 where this has
> been discussed and I understood from that dialog that when looking at SMILES
> with a “*” representation this is interpreted as a dummy atom and the same
> dummy atom is expected in the search space to produce a hit. While a SMARTS
> representation of the same string actually leads to the behaviour that “any
> atom” is matched at that position.
>
> I ended up with the very cumbersome query, I am sure there are more elegant
> ways of doing this using ::qmol notation, but as I said I am currently
> exploring :-)
>
> That’s the query (in PHP) in question for PostgreSQL:
>
> $search_result = pg_query($dbconn, "select m from pat.mols where
> m@>mol_from_smarts(mol_to_smiles(mol_from_smiles('".$_POST['smiles']. "')))
> LIMIT 20;");
>
> Extracting rdkit functionality leaves me with:
>
> m@>mol_from_smarts(mol_to_smiles(mol_from_smiles('".$_POST['smiles']. "')))
>
> and adding a smiles string to make it more readable:
>
> m@>mol_from_smarts(mol_to_smiles(mol_from_smiles(' C([*])1=CC=CC=C1')))
>
> (This is how Ketcher creates the smiles string, using explicit double bonds)
>
> This query does actually work and returns structures that are correct
> (visually inspected a few examples)
>
> The same query without all the molecule conversion methods does not return
> anything
>
> m@>' C([*])1=CC=CC=C1'
>
> I guess the reason for this is that the default interpretation is smiles and
> it is looking for actual dummy atoms in the database (there are none).
>
> That’s my first question: Is this assumption correct?
>
> My next issue is a query with explicit hydrogens:
>
> Using
>
> “C([*])1=C([H])C([H])=C([H])C([H])=C1[H]”
>
> as a query with the all the molecule conversion as shown above to make SMARTS
> happen, returns among others:
>
> “C(C)1=CC=C(C)C=C1”
>
> Which is correct for implicit hydrogens but not for explicit – so my guess is
> they are lost.
>
> Can I enforce at query time against the cartridge to work with explicit
> hydrogens so that only molecules are found that have different substitutes at
> the “*” position?
>
> I could not find a pre-defined function for that.
>
> Thank you very much for any hints or solutions,
>
> Best regards,
>
> Alex
>
> Best regards / Mit freundlichen Grüßen / Sincères salutations
>
> Dr. Alexander Garvin Klenner-Bajaja
> Administrator Requirements Engineering-Solution Design | Dir. 2.8.3.3
> European Patent Office
> Patentlaan 3-9 | 2288 EE Rijswijk | The Netherlands
> Tel. +31(0)70340-1991
> aklen...@epo.org
> www.epo.org
>
> Please consider the environment before printing this email.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss