[Rdkit-discuss] SA_scores and QED scores questions?

2020-10-28 Thread Steven Pak
Hello,

I wrote two C++ scoring functions based on your QED and SA_score Python
code, keeping the implementation as close to the original as possible so
that the scores would match. That worked, which is exciting! In the first
test I converted the SMILES strings of various molecules into ROMol objects
and passed them to both the Python and the C++ code, so both cases received
the same treatment before reaching these functions (test 1).

Unexpectedly, when I integrated the C++ QED/SA calculators into a larger
program, the ROMol objects I pass in do not give the same results as the
Python code or the standalone C++ version I created (test 2). This makes
sense to me: in the first test both programs start from SMILES strings,
while in the second test the two programs receive different inputs in
different settings. So my questions are: How are SMILES strings treated or
edited when they are turned into RWMol/ROMol objects? Is there a default
treatment applied to SMILES when the user turns them into ROMol objects?
And which treatment functions would you recommend applying to an RWMol
before converting it into an ROMol?


Thanks,
Steven Pak
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Compile redkit_2020_09_1 on macOS Catalina

2020-10-28 Thread Paolo Tosco
Hi Zoltan,

Try adding -DBoost_NO_BOOST_CMAKE=ON to your cmake command.

Cheers,
p.



[Rdkit-discuss] Compile redkit_2020_09_1 on macOS Catalina

2020-10-28 Thread Zoltan Takacs
Hi,

I am trying to compile RDKit version 2020-09-1 from source on macOS Catalina
10.15.7. Has anyone managed to do this without the Boost errors? I have Boost
1.74 installed, and I also compiled Boost 1.72 from source. If I try to switch
to the slightly older Boost libraries by passing

  -DBOOST_ROOT=/Users/me/boost/ -DBoost_NO_SYSTEM_PATHS=ON ..

then at the end, after the errors, I get this message:

CMake Warning:
  Manually-specified variables were not used by the project:

    BOOST_ROOT
    Boost_NO_SYSTEM_PATHS

And of course it tried to use Boost 1.74. The Boost-related errors typically
look like this:
 
CMake Error at Code/cmake/Modules/RDKitUtils.cmake:55 (add_library):
  Target "MolEnumerator_static" links to target "Boost::iostreams" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  Code/GraphMol/MolEnumerator/CMakeLists.txt:1 (rdkit_library)

I did of course try to install RDKit with brew, but that fails because of the
Python 3.9 numpy bug. Naturally I would like to use Python 3.8.5, which seems
to be picked up by cmake correctly.

Does anyone have some tips or tricks regarding the problems with Boost?
Can I use Python 3.8.5 for the Python wrappers, or only Python 3.9 (where the
brew installation goes wrong, I guess)?

Many thanks,
Zoltan



Re: [Rdkit-discuss] [EXTERNAL] Re: Morgan FP atom numbering

2020-10-28 Thread Peter S. Shenkin
I found that the book is available on the NY Public Library web site,
chapter by chapter, as a digital download if you have a library card. The
host site is at Johns Hopkins, so check your local library system, which
might also provide access.

-P.


Re: [Rdkit-discuss] [EXTERNAL] Re: Morgan FP atom numbering

2020-10-28 Thread Cyrus Maher
Hi Andrew,

Thank you! This is so thorough, and so helpful. We truly appreciate it.

All the best,

-Cyrus

On 10/27/20, 4:28 AM, "Andrew Dalke"  wrote:


On Oct 26, 2020, at 17:41, Cyrus Maher wrote:
> I’m wondering if there is an easy way to retrieve the atom numbers that
> the Morgan fingerprint algorithm assigns as its first step.

Many of the fingerprint functions support an optional "bitInfo" parameter.
If it's a dictionary, then each key is a bit that was set and each value is
a tuple of the (atom index, radius) pairs which set it.

Here's an example with theobromine using r=0, which lets you see the
initial invariants:

>>> from rdkit import Chem
>>> from rdkit.Chem import rdMolDescriptors
>>> mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C")
>>> bitInfo = {}
>>> fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(
...     mol, radius=0, useFeatures=1, bitInfo=bitInfo)
>>> for bitno, pairs in sorted(bitInfo.items()):
...   print(f"Bitno: {bitno}")
...   for atom_idx, r in pairs:
...     print(f"  atom {atom_idx} ({mol.GetAtomWithIdx(atom_idx).GetSymbol()}) with radius {r}")
...
Bitno: 0
  atom 0 (C) with radius 0
  atom 12 (C) with radius 0
Bitno: 2
  atom 7 (O) with radius 0
  atom 10 (O) with radius 0
Bitno: 4
  atom 2 (C) with radius 0
  atom 4 (C) with radius 0
  atom 5 (C) with radius 0
  atom 6 (C) with radius 0
  atom 9 (C) with radius 0
Bitno: 5
  atom 8 (N) with radius 0
Bitno: 6
  atom 1 (N) with radius 0
  atom 3 (N) with radius 0
  atom 11 (N) with radius 0
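
Since the bit numbers at radius 0 are the initial invariants, the dictionary can be inverted to get a per-atom lookup. The following is only a minimal sketch: it uses the output above as literal data so that it runs without RDKit, and the names bit_info and atom_to_invariant are made up for illustration.

```python
# bitInfo as printed above for theobromine at radius 0:
# {bit number: ((atom index, radius), ...)}
bit_info = {
    0: ((0, 0), (12, 0)),
    2: ((7, 0), (10, 0)),
    4: ((2, 0), (4, 0), (5, 0), (6, 0), (9, 0)),
    5: ((8, 0),),
    6: ((1, 0), (3, 0), (11, 0)),
}

# Invert to {atom index: bit number}, keeping only radius-0 entries,
# since those are the ones that reflect the initial per-atom invariants.
atom_to_invariant = {
    atom_idx: bitno
    for bitno, pairs in bit_info.items()
    for atom_idx, radius in pairs
    if radius == 0
}

for atom_idx in sorted(atom_to_invariant):
    print(f"atom {atom_idx}: invariant {atom_to_invariant[atom_idx]}")
```

With a real molecule, the literal dictionary would be replaced by the bitInfo filled in by the fingerprint call.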

If I follow the code correctly, when useFeatures == 1 the initial
invariants are set by getFeatureInvariants() in
./Code/GraphMol/Fingerprints/FingerprintUtil.cpp, available at:

https://github.com/rdkit/rdkit/blob/d594998dda2803a15aa0066e06f2477b71ed3be6/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L221

A few lines up, at
https://github.com/rdkit/rdkit/blob/d594998dda2803a15aa0066e06f2477b71ed3be6/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L182
you'll see the feature patterns defined in smartsPatterns.

They appear to be identical to the list you gave.

I reimplemented the initialization function (copied at the end of this 
email). Running the program shows that it produces the same invariants which 
are used as the bit numbers in the Morgan feature fingerprint:

Invariant: 0
  atom 0 (C)
  atom 12 (C)
Invariant: 2
  atom 7 (O)
  atom 10 (O)
Invariant: 4
  atom 2 (C)
  atom 4 (C)
  atom 5 (C)
  atom 6 (C)
  atom 9 (C)
Invariant: 5
  atom 8 (N)
Invariant: 6
  atom 1 (N)
  atom 3 (N)
  atom 11 (N)
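
Each invariant can be read as a set of feature flags. The sketch below assumes that bit i of the invariant is set when the i-th pattern matched, with the patterns in the order given by the comments in the re-implementation (Donor, Acceptor, Aromatic, Halogen, Basic, Acidic). The names FEATURE_NAMES and decode_invariant are hypothetical helpers, not part of RDKit.

```python
# Feature names in the order of the smartsPatterns list
# (assumption: bit i corresponds to pattern i).
FEATURE_NAMES = ["Donor", "Acceptor", "Aromatic", "Halogen", "Basic", "Acidic"]


def decode_invariant(invariant):
    """Return the names of the features whose bit is set in `invariant`."""
    return [name for i, name in enumerate(FEATURE_NAMES) if invariant & (1 << i)]


# The five distinct invariants listed above for theobromine.
for value in (0, 2, 4, 5, 6):
    print(value, decode_invariant(value))
```

Under that assumption, 5 reads as donor plus aromatic (the [nH] atom 8) and 6 as acceptor plus aromatic (the other ring nitrogens), while the methyl carbons with invariant 0 match no feature.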


I believe that gives you two ways to get the information you want!

Best regards,

Andrew
da...@dalkescientific.com




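# Python re-implementation of RDKit's getFeatureInvariants() from
# ./Code/GraphMol/Fingerprints/FingerprintUtil.cpp

from rdkit import Chem

# One SMARTS per feature; adjacent string literals are concatenated.
# The ring-nitrogen terms are written as n&H1&+0 (donor) and n&H0&+0
# (acceptor), as in RDKit's source. The archived text shows both as n&+0,
# which would not give the 5 vs. 6 split in the invariants listed above.
smartsPatterns = [
    "[$([N;!H0;v3,v4&+1]),"
    "$([O,S;H1;+0]),"
    "n&H1&+0]",  # Donor
    "[$([O,S;H1;v2;!$(*-*=[O,N,P,S])]),"
    "$([O,S;H0;v2]),"
    "$([O,S;-]),"
    "$([N;v3;!$(N-*=[O,N,P,S])]),"
    "n&H0&+0,"
    "$([o,s;+0;!$([o,s]:n);!$([o,s]:c:n)])]",  # Acceptor
    "[a]",  # Aromatic
    "[F,Cl,Br,I]",  # Halogen
    "[#7;+,"
    "$([N;H2&+0][$([C,a]);!$([C,a](=O))]),"
    "$([N;H1&+0]([$([C,a]);!$([C,a](=O))])[$([C,a]);!$([C,a](=O))]),"
    "$([N;H0&+0]([C;!$(C(=O))])([C;!$(C(=O))])[C;!$(C(=O))])]",  # Basic
    "[$([C,S](=[O,S,P])-[O;H1,-1])]",  # Acidic
]

mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C")
invariants = [0] * mol.GetNumAtoms()
for pattern_idx, smartsPattern in enumerate(smartsPatterns):
    pat = Chem.MolFromSmarts(smartsPattern)
    # Each pattern is a single-atom SMARTS, so every match is a 1-tuple.
    for (atom_idx,) in mol.GetSubstructMatches(pat):
        # Set the bit for this feature. The archived message breaks off
        # after "(1"; the shift by pattern_idx is inferred from the
        # invariants listed above.
        invariants[atom_idx] |= (1 << pattern_idx)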