[Rdkit-discuss] SA_scores and QED scores questions?
Hello,

I created two C++ scoring functions based on your QED and SA_score Python code, keeping the implementation as close to the original as possible so that the scores would match. They do, which is exciting! In this first test I converted the SMILES strings of various molecules into ROMol objects and passed them to both the Python and the C++ code, so the molecules received the same treatment before reaching the scoring functions (test 1).

Unexpectedly, when I integrated the C++ QED/SA calculators into a larger program, the ROMol objects I pass in no longer give the same results as the Python code or my standalone C++ version (test 2). This makes sense to me: in the first test both programs start from SMILES strings, while in the second test I am feeding differently prepared inputs to the two programs.

So my questions are: How are SMILES strings treated or edited when they are turned into RWMol/ROMol objects? Is there a default treatment applied when the user converts a SMILES string into an ROMol? And which treatment functions would you recommend applying to an RWMol before converting it into an ROMol?

Thanks,
Steven Pak
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Compile redkit_2020_09_1 on macOS Catalina
Hi Zoltan,

try adding -DBoost_NO_BOOST_CMAKE=ON to your cmake command.

Cheers,
p.

On Wed, Oct 28, 2020 at 9:32 PM Zoltan Takacs wrote:
> I am trying to compile rdkit version 2020-09-1 from source on MacOS
> Catalina 10.15.7. Has anyone managed to do this without the boost errors?
>
> The errors related to boost typically look like this:
>
> CMake Error at Code/cmake/Modules/RDKitUtils.cmake:55 (add_library):
>   Target "MolEnumerator_static" links to target "Boost::iostreams" but the
>   target was not found. Perhaps a find_package() call is missing for an
>   IMPORTED target, or an ALIAS target is missing?
> Call Stack (most recent call first):
>   Code/GraphMol/MolEnumerator/CMakeLists.txt:1 (rdkit_library)
[Rdkit-discuss] Compile redkit_2020_09_1 on macOS Catalina
Hi,

I am trying to compile rdkit version 2020-09-1 from source on macOS Catalina 10.15.7. Has anyone managed to do this without the boost errors? I have boost 1.74 installed and I also compiled boost 1.72 from source. If I try to switch to the slightly older boost libraries by passing

    -DBOOST_ROOT=/Users/me/boost/ -DBoost_NO_SYSTEM_PATHS=ON ..

then at the end, after the errors, I get this message:

    CMake Warning:
      Manually-specified variables were not used by the project:

        BOOST_ROOT
        Boost_NO_SYSTEM_PATHS

And of course it tried to use boost 1.74. The errors related to boost typically look like this:

    CMake Error at Code/cmake/Modules/RDKitUtils.cmake:55 (add_library):
      Target "MolEnumerator_static" links to target "Boost::iostreams" but the
      target was not found. Perhaps a find_package() call is missing for an
      IMPORTED target, or an ALIAS target is missing?
    Call Stack (most recent call first):
      Code/GraphMol/MolEnumerator/CMakeLists.txt:1 (rdkit_library)

I of course tried to use brew to install rdkit, but that fails because of a Python 3.9 numpy bug. Naturally I would like to use Python 3.8.5, which seems to be picked up by cmake correctly.

Does anyone have tips or tricks regarding the problems with boost? Can I use Python 3.8.5 for the Python wrappers, or only Python 3.9 (where the brew installation goes wrong, I guess)?

Many thanks,
Zoltan
Re: [Rdkit-discuss] [EXTERNAL] Re: Morgan FP atom numbering
I found that on the NY Public Library web site, the book is available, chapter by chapter, as a digital download if you have a library card. The host site is at Johns Hopkins, so check your local library system, which might also supply access.

-P.

On Wed, Oct 28, 2020 at 12:08 PM Cyrus Maher wrote:
> Hi Andrew,
>
> Thank you! This is so thorough, and so helpful. We truly appreciate it.
>
> All the best,
>
> -Cyrus
Re: [Rdkit-discuss] [EXTERNAL] Re: Morgan FP atom numbering
Hi Andrew,

Thank you! This is so thorough, and so helpful. We truly appreciate it.

All the best,

-Cyrus

On 10/27/20, 4:28 AM, "Andrew Dalke" wrote:

On Oct 26, 2020, at 17:41, Cyrus Maher wrote:
> I’m wondering if there is an easy way to retrieve the atom numbers that the morgan fingerprints algorithm assigns as its first step.

Many of the fingerprint functions support an optional "bitInfo" parameter. If it is a dictionary, then its keys are the bits that were set, and each value is a tuple of the (atom index, radius) pairs which set that bit.

Here's an example with theobromine using r=0, which lets you see the initial invariants:

    >>> from rdkit import Chem
    >>> from rdkit.Chem import rdMolDescriptors
    >>> mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C")
    >>> bitInfo = {}
    >>> fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, radius=0, useFeatures=1, bitInfo=bitInfo)
    >>> for bitno, pairs in sorted(bitInfo.items()):
    ...     print(f"Bitno: {bitno}")
    ...     for atom_idx, r in pairs:
    ...         print(f"  atom {atom_idx} ({mol.GetAtomWithIdx(atom_idx).GetSymbol()}) with radius {r}")
    ...
    Bitno: 0
      atom 0 (C) with radius 0
      atom 12 (C) with radius 0
    Bitno: 2
      atom 7 (O) with radius 0
      atom 10 (O) with radius 0
    Bitno: 4
      atom 2 (C) with radius 0
      atom 4 (C) with radius 0
      atom 5 (C) with radius 0
      atom 6 (C) with radius 0
      atom 9 (C) with radius 0
    Bitno: 5
      atom 8 (N) with radius 0
    Bitno: 6
      atom 1 (N) with radius 0
      atom 3 (N) with radius 0
      atom 11 (N) with radius 0

If I follow the code correctly, when useFeatures == 1 the initial invariants are set by getFeatureInvariants() in ./Code/GraphMol/Fingerprints/FingerprintUtil.cpp, available at:

https://github.com/rdkit/rdkit/blob/d594998dda2803a15aa0066e06f2477b71ed3be6/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L221

A few lines up, at

https://github.com/rdkit/rdkit/blob/d594998dda2803a15aa0066e06f2477b71ed3be6/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L182

you'll see the feature patterns defined in smartsPatterns. They appear to be identical to the list you gave.

I reimplemented the initialization function (copied at the end of this email). Running the program shows that it produces the same invariants, which are used as the bit numbers in the Morgan feature fingerprint:

    Invariant: 0
      atom 0 (C)
      atom 12 (C)
    Invariant: 2
      atom 7 (O)
      atom 10 (O)
    Invariant: 4
      atom 2 (C)
      atom 4 (C)
      atom 5 (C)
      atom 6 (C)
      atom 9 (C)
    Invariant: 5
      atom 8 (N)
    Invariant: 6
      atom 1 (N)
      atom 3 (N)
      atom 11 (N)

I believe that gives you two ways to get the information you want!
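A minimal sketch for reading the invariants listed above. It assumes (this is not verified here against the C++ source) that pattern number i in the SMARTS list sets bit 1 << i of an atom's invariant, in the order Donor, Acceptor, Aromatic, Halogen, Basic, Acidic. The names FEATURES, decode_invariant, bit_info and atom_features are illustrative only and are not part of RDKit; bit_info simply copies the radius-0 output printed above, so the snippet runs without RDKit installed.

```python
# Feature names, in the same order as the SMARTS patterns in this email
# (assumption: pattern i corresponds to bit 1 << i of the invariant).
FEATURES = ["Donor", "Acceptor", "Aromatic", "Halogen", "Basic", "Acidic"]

# Copied from the session output above: bit number -> ((atom index, radius), ...)
bit_info = {
    0: ((0, 0), (12, 0)),
    2: ((7, 0), (10, 0)),
    4: ((2, 0), (4, 0), (5, 0), (6, 0), (9, 0)),
    5: ((8, 0),),
    6: ((1, 0), (3, 0), (11, 0)),
}


def decode_invariant(invariant):
    """Return the names of the features whose bit is set in `invariant`."""
    return [name for bit, name in enumerate(FEATURES) if invariant & (1 << bit)]


# At radius 0 the bit number is reported to equal the atom's initial invariant,
# so each radius-0 entry can be translated into a per-atom feature list.
atom_features = {}
for bitno, pairs in bit_info.items():
    for atom_idx, radius in pairs:
        if radius == 0:
            atom_features[atom_idx] = decode_invariant(bitno)

for atom_idx in sorted(atom_features):
    print(atom_idx, ", ".join(atom_features[atom_idx]) or "none")
```

Under that assumption, an invariant of 5 reads as Donor plus Aromatic and 6 as Acceptor plus Aromatic, while 0 means that no feature pattern matched the atom.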
Best regards,

Andrew
da...@dalkescientific.com

    # Python re-implementation of RDKit's getFeatureInvariants() from
    # ./Code/GraphMol/Fingerprints/FingerprintUtil.cpp

    from rdkit import Chem

    # Adjacent string literals are concatenated, so each pattern is one SMARTS string.
    smartsPatterns = [
        "[$([N;!H0;v3,v4&+1]),"
        "$([O,S;H1;+0]),"
        "n&H1&+0]",  # Donor
        "[$([O,S;H1;v2;!$(*-*=[O,N,P,S])]),"
        "$([O,S;H0;v2]),"
        "$([O,S;-]),"
        "$([N;v3;!$(N-*=[O,N,P,S])]),"
        "n&H0&+0,"
        "$([o,s;+0;!$([o,s]:n);!$([o,s]:c:n)])]",  # Acceptor
        "[a]",  # Aromatic
        "[F,Cl,Br,I]",  # Halogen
        "[#7;+,"
        "$([N;H2&+0][$([C,a]);!$([C,a](=O))]),"
        "$([N;H1&+0]([$([C,a]);!$([C,a](=O))])[$([C,a]);!$([C,a](=O))]),"
        "$([N;H0&+0]([C;!$(C(=O))])([C;!$(C(=O))])[C;!$(C(=O))])]",  # Basic
        "[$([C,S](=[O,S,P])-[O;H1,-1])]",  # Acidic
    ]

    mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C")
    invariants = [0] * mol.GetNumAtoms()
    for pattern_idx, smartsPattern in enumerate(smartsPatterns):
        pat = Chem.MolFromSmarts(smartsPattern)
        for (atom_idx,) in mol.GetSubstructMatches(pat):
            invariants[atom_idx] |= (1