Re: [Rdkit-discuss] Nuget C# Prerelease

2017-08-02 Thread Brian Kelley
alpha1 is out

https://www.nuget.org/packages/RDKit2DotNet/
<https://www.nuget.org/packages/RDKit2DotNet/2017.9.1-alpha>

This adds support for AnyCPU (it ships both x86 and x64 binaries).
However, to run the library you need to initialize the library to find the
DLLs before running any code.  This seems relatively standard, but is a tad
annoying.

 using GraphMolWrap;
 ...
 RDKit.Initialize();

Cheers,
 Brian

On Mon, Jul 31, 2017 at 9:24 PM, Brian Kelley <fustiga...@gmail.com> wrote:

> For the small percentage of you who use C# we finally have a NuGet release!
>
> https://www.nuget.org/packages/RDKit2DotNet/2017.9.1-alpha
>
> Notes:
>
>  1. This is a prerelease, the number of C# tests is vanishingly small
>  2. x64 only for now, you'll need to change AnyCPU targets to specifically
> target x64
>  3. It is built against the NuGet boost vc140 1.62 boost builds if that
> means anything to anyone.
>  4. not linked against cairo, so depiction support is minimal (i.e. SVG)
>
> I think it's a good start though.
>
> Cheers,
>  Brian
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Nuget C# Prerelease

2017-07-31 Thread Brian Kelley
For the small percentage of you who use C# we finally have a NuGet release!

https://www.nuget.org/packages/RDKit2DotNet/2017.9.1-alpha

Notes:

 1. This is a prerelease, the number of C# tests is vanishingly small
 2. x64 only for now, you'll need to change AnyCPU targets to specifically
target x64
 3. It is built against the NuGet boost vc140 1.62 boost builds if that
means anything to anyone.
 4. not linked against cairo, so depiction support is minimal (i.e. SVG)

I think it's a good start though.

Cheers,
 Brian


Re: [Rdkit-discuss] Delete several Atoms

2017-06-25 Thread Brian Kelley
Yes, go backwards through the index list.

for index in sorted(indices, reverse=True):
  mol.RemoveAtom(index)

Indices are only changed if they are higher than the removed index.
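
Why descending order matters can be seen without RDKit. The sketch below is a plain-Python stand-in: a list plays the role of the atom list, and remove_indices is a hypothetical helper name. It only illustrates the index bookkeeping, not RWMol itself.

```python
def remove_indices(items, indices):
    """Return a copy of items with the given positions removed.

    Deleting from the highest index down means every index still to be
    processed is lower than the one just removed, so it has not shifted.
    """
    result = list(items)
    # set() drops accidental duplicates so no position is removed twice
    for index in sorted(set(indices), reverse=True):
        del result[index]
    return result


atoms = ["C0", "C1", "O2", "N3", "C4"]  # made-up labels, index == position
print(remove_indices(atoms, [1, 3]))    # drops "C1" and "N3"
```

Going through the same indices in ascending order would remove "C1" first, after which "N3" would sit at position 2 rather than 3.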

Brian Kelley

> On Jun 25, 2017, at 10:16 AM, Changge Ji <chicago...@gmail.com> wrote:
> 
> Dear all,
> 
> Is there an easy way to delete several atoms in a molecule according to an 
> index list ?
> 
> RWMol RemoveAtom() can only delete one atom each time.
> And after that, the index changed. 
> 
> Many Thanks.
> 
> Best,
> Changge
> Changge Ji


Re: [Rdkit-discuss] Canonical order in SMILES

2017-06-17 Thread Brian Kelley
After canonicalization, do the following

d = mol.GetPropsAsDict(True,True)

In the dictionary there will be a key something like _smilesAtomOutputOrder 
which contains a vector of atom indices in output order.
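
To turn that vector into the lookup Jean-Marc asks for, it can be inverted. A minimal sketch, assuming the property value has already been read as a string of the form shown; the sample value is made up, and smiles_positions is a hypothetical helper, not an RDKit function.

```python
import ast


def smiles_positions(output_order_text):
    """Map each atom index to its position in the SMILES string.

    output_order_text is assumed to look like "[2,0,1]": entry k is the
    index of the atom written k-th in the SMILES.
    """
    output_order = ast.literal_eval(output_order_text)
    return {atom_idx: pos for pos, atom_idx in enumerate(output_order)}


order_text = "[2,0,1]"  # hypothetical value, for illustration only
print(smiles_positions(order_text))
```

ast.literal_eval is used here instead of eval so that only a literal list is accepted.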


Brian Kelley

> On Jun 17, 2017, at 1:42 PM, Jean-Marc Nuzillard <jm.nuzill...@univ-reims.fr> 
> wrote:
> 
> Dear all,
> 
> sorry for asking for something that has certainly been already answered.
> 
> Chem.MolToSmiles(m) produces a SMILES string for the given molecule m.
> How is it possible to associate the order in which atoms appear in the SMILES 
> string
> with a list of atom indexes in m?
> 
> All the best,
> 
> Jean-Marc
> 
> -- 
> Jean-Marc Nuzillard
> Directeur de Recherches au CNRS
> 
> Institut de Chimie Moléculaire de Reims
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
> 
> Tel : 03 26 91 82 10
> Fax : 03 26 91 31 66
> http://www.univ-reims.fr/ICMR
> http://eos.univ-reims.fr/LSD/ISgroup.html
> 
> http://www.univ-reims.fr/LSD/
> http://www.univ-reims.fr/LSD/JmnSoft/
> 
> 


Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Brian Kelley
Sorry to hear about the flooding.

As an aside, if you want to get the SMILES atom output order, it is saved as a 
property on the molecule after a call to MolToSmiles.

To get to the property, use mol.GetPropsAsDict(True,True) and it will be there 
with the key named something like "_smilesAtomOutputOrder".

We should probably make a helper function for this.

----
Brian Kelley

> On Jun 15, 2017, at 6:27 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu> wrote:
> 
>> On 06/15/2017 10:13 AM, Maciek Wójcikowski wrote:
>> Hi,
>> 
>> If you really want to rely on the order of atom you can renumber them
>> anyhow you like with Chem.RenumberAtoms()
>> http://rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#RenumberAtoms
>> There is also a function which returns the canonical order of atoms for
>> you: Chem.CanonicalRankAtoms(). If I remember correctly the order may differ
>> from the canonical smiles, although that might have changed.
> 
> https://www.nature.com/articles/sdata201773
> 
> Unfortunately we got flooded day before yesterday and the servers doing
> the crunching are currently down.
> 
> -- 
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
> 


Re: [Rdkit-discuss] AllChem.GetConformerRMSD: this is not RMSD between two conformers but an upper bound of it

2017-06-15 Thread Brian Kelley
Thanks for the documentation fix, I had read the same as Francois.




Brian Kelley

> On Jun 15, 2017, at 8:02 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> 
> 
> 
>> On Thu, Jun 15, 2017 at 6:30 AM, Francois BERENGER 
>> <beren...@bioreg.kyushu-u.ac.jp> wrote:
>> 
>> I am afraid that in AllChem.GetConformerRMSD: one doesn't get the RMSD
>> between the two conformers but an upper bound of it.
> 
> The documentation to this function is misleading:
> 
> In [21]: AllChem.GetConformerRMS?
> Signature: AllChem.GetConformerRMS(mol, confId1, confId2, atomIds=None, 
> prealigned=False)
> Docstring:
> Returns the RMS between two conformations.
> By default, the conformers will be aligned to the first conformer
> of the molecule (i.e. the reference) before RMS calculation and,
> as a side-effect, will be left in the aligned state.
> 
> Arguments:
>   - mol:the molecule
>   - confId1:the id of the first conformer
>   - confId2:the id of the second conformer
>   - atomIds:(optional) list of atom ids to use a points for
> alingment - defaults to all atoms
>   - prealigned: (optional) by default the conformers are assumed
> be unaligned and will therefore be aligned to the
> first conformer
> 
> 
> The alignment is done to the first conformer (i.e confId1).[1]
> Here's a demonstration of that:
> In [31]: AllChem.GetConformerRMS(m,0,1,prealigned=True)
> Out[31]: 9.1593890932638349
> 
> In [32]: AllChem.GetConformerRMS(m,0,2,prealigned=True)
> Out[32]: 3.8219771356556071
> 
> In [33]: AllChem.GetConformerRMS(m,1,2,prealigned=True)
> Out[33]: 8.597878324406647
> 
> In [34]: AllChem.GetConformerRMS(m,1,2)
> Out[34]: 1.1067869816465845   # conformer 2 is now aligned to conformer 1
> 
> In [35]: AllChem.GetConformerRMS(m,0,1,prealigned=True)
> Out[35]: 9.1593890932638349   # the RMS between confs 0 and 1 hasn't changed
> 
> In [36]: AllChem.GetConformerRMS(m,0,2,prealigned=True)
> Out[36]: 9.4691776880629508   # the RMS between confs 0 and 2 has changed
> 
> I will clean that documentation up.
> 
> 
> -greg
> [1] since that's a "conformer of the molecule" the documentation isn't 
> actually wrong, but it's misleading enough to be effectively wrong.
>  
>> 
>> I understand from the doc that if they are aligned, they are aligned
>> to the first conformer of the molecule.
>> 
>> To get the real RMSD between two conformers, they must
>> be superimposed together, not to a third conformer.
>> 
>> Please tell me if I'm wrong.
>> 
>> Regards,
>> F.
>> 
> 


Re: [Rdkit-discuss] AllChem.GetConformerRMSD: this is not RMSD between two conformers but an upper bound of it

2017-06-15 Thread Brian Kelley
The function you want is GetBestRMS. Note that you can set the conformer id 
for both the probe and the reference.

http://www.rdkit.org/Python_Docs/rdkit.Chem.AllChem-module.html#GetBestRMS
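
The difference between an RMS computed on coordinates as they stand and one computed after superimposing the pair directly can be illustrated with a toy 2D calculation. This is a plain-Python sketch with made-up coordinates, not RDKit's implementation; rms, centered and best_rms_2d are hypothetical names, and reflections and atom symmetry are ignored.

```python
import math


def rms(a, b):
    """Root-mean-square distance between paired 2D points, no alignment."""
    total = sum((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(a, b))
    return math.sqrt(total / len(a))


def centered(points):
    """Translate points so their centroid sits at the origin."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]


def best_rms_2d(probe, ref):
    """RMS after optimally translating and rotating probe onto ref (2D)."""
    p, r = centered(probe), centered(ref)
    # Closed-form optimal rotation angle in two dimensions
    dot = sum(px * rx + py * ry for (px, py), (rx, ry) in zip(p, r))
    cross = sum(px * ry - py * rx for (px, py), (rx, ry) in zip(p, r))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * px - s * py, s * px + c * py) for px, py in p]
    return rms(rotated, r)


ref = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
# Same shape as ref, rotated by 90 degrees and shifted (made-up numbers)
probe = [(5.0, 5.0), (5.0, 6.0), (4.0, 6.0)]

print(rms(probe, ref))          # large: the shapes are not superimposed
print(best_rms_2d(probe, ref))  # close to zero: the shapes are identical
```

The pairwise-superimposed value can never exceed the value computed on coordinates aligned some other way, which is why the latter is only an upper bound.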


Brian Kelley

> On Jun 15, 2017, at 5:30 AM, Francois BERENGER 
> <beren...@bioreg.kyushu-u.ac.jp> wrote:
> 
> Hello,
> 
> I am afraid that in AllChem.GetConformerRMSD: one doesn't get the RMSD
> between the two conformers but an upper bound of it.
> 
> I understand from the doc that if they are aligned, they are aligned
> to the first conformer of the molecule.
> 
> To get the real RMSD between two conformers, they must
> be superimposed together, not to a third conformer.
> 
> Please tell me if I'm wrong.
> 
> Regards,
> F.
> 


Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Brian Kelley
Yes, atoms are always added in file order.  It would take a major change in 
rdkit to change/violate this.


Brian Kelley

> On Jun 15, 2017, at 7:52 AM, Francois BERENGER 
> <beren...@bioreg.kyushu-u.ac.jp> wrote:
> 
> Hello,
> 
> If I read a molecule from a .sdf file, will the atom indexes be 
> conserved/preserved?
> 
> 1st atom in the file will have index 0,
> 2nd index 1, etc.
> 
> And, will this always hold in the future?
> Is this an invariant of rdkit?
> 
> Thanks,
> F.
> 


Re: [Rdkit-discuss] Conformational search not "converging" to low energy conformation

2017-06-12 Thread Brian Kelley
There shouldn't be any expectation that the MMFF energy should converge to
within 1 kcal/mol.

There *may* be a case to be made for MMFF_energy - Global_minimum being <
1 kcal/mol; however, in general, we don't know what the global minimum is.
I suggest looking at this paper:

https://www.ncbi.nlm.nih.gov/pubmed/15115393

This paper describes a basic technique for analyzing conformational
strain.  It is a bit out of date; however, their rule of thumb for MMFF is
that most molecules in their bioactive form will be < 5 kcal/mol but many
may be > 9 kcal/mol when compared against the sampled global minimum.

Two issues: (1) RDKit doesn't make a regular sampling of conformations; it
relies on random embeddings.  I have no idea how this may affect finding
the global minimum.  (2) The RDKit molecule may have
slight deviations from MMFF angles and bond lengths that may artificially
make the results look worse.  We could solve this by optimizing in a
harmonic well of, say, 0.5 angstroms to shrug off these high energies prior
to computing the actual MMFF energy.
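
To put numbers on the scatter in question, the lowest energies from the quoted table below can be compared across the four conformer counts. A small plain-Python sketch; the values are copied from Jan's table, and lowest and spread are just illustrative names.

```python
# Lowest MMFF energy (kcal/mol) found with 200, 800, 1000 and 2500
# conformers, copied from the table in the quoted message.
lowest = {
    "comp109_0=0": [0.16, 0.16, 0.16, 0.16],
    "comp109_1=1": [-25.18, -24.43, -25.08, -25.18],
    "comp109_1=2": [-16.42, -16.24, -16.21, -15.15],
    "comp109_0=3": [-24.05, -24.16, -24.09, -23.96],
    "comp109_1=5": [-38.4, -37.38, -38.28, -38.32],
    "comp109_2=8": [0.18, 1.38, -0.24, 0.08],
}

# Spread between the best and worst "lowest energy" across the runs
spread = {name: round(max(e) - min(e), 2) for name, e in lowest.items()}

for name, value in spread.items():
    flag = "  <-- more than 1 kcal/mol" if value > 1.0 else ""
    print("%s %5.2f%s" % (name, value, flag))
```

On these numbers three of the six protomers scatter by more than 1 kcal/mol between runs, which fits the point that random embedding gives no guarantee of finding the same minimum each time.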

I hope this helps,

-Brian

On Mon, Jun 12, 2017 at 2:23 PM, Jan Halborg Jensen 
wrote:

> The code below shows the lowest energy found for 6 different protomers
> defined by the smiles strings below as a function of number of conformers.
> Even with 2000 conformers I am not getting convergence to within 1 kcal/mol
> for comp109_1=2.
>
> Is this expected? Any advice or tips appreciated
>
> conformers      200     800    1000    2500
> comp109_0=0    0.16    0.16    0.16    0.16
> comp109_1=1  -25.18  -24.43  -25.08  -25.18
> comp109_1=2  -16.42  -16.24  -16.21  -15.15
> comp109_0=3  -24.05  -24.16  -24.09  -23.96
> comp109_1=5   -38.4  -37.38  -38.28  -38.32
> comp109_2=8    0.18    1.38   -0.24    0.08
>
> Code
>
> import sys
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> confs = 2500
> e_cut = 20.0
> decimals_in_energies = 2
>
> filename = sys.argv[1]
> file = open(filename, "r")
>
> for line in file:
>     words = line.split()
>     name = words[0]
>     smiles = words[1]
>
>     m = Chem.AddHs(Chem.MolFromSmiles(smiles))
>
>     AllChem.EmbedMultipleConfs(m,numConfs=confs)
>     AllChem.MMFFOptimizeMoleculeConfs(m,numThreads=8,maxIters=1000,
>                                       mmffVariant="MMFF94")
>
>     energies = []
>     for conf in m.GetConformers():
>         tm = Chem.Mol(m,False,conf.GetId())
>         prop = AllChem.MMFFGetMoleculeProperties(tm, mmffVariant="MMFF94")
>         ff = AllChem.MMFFGetMoleculeForceField(tm,prop)
>         energies.append(round(ff.CalcEnergy(),decimals_in_energies))
>
>     e_min = min(energies)
>     print(e_min)
>
>
> Smiles file
> comp109_0=0 C[C@@]1(c2cc(NC(=O)c3ccc(C#N)cn3)ccc2F)C[C@H](C(F)(F)F)OC(N)=N1
> comp109_1=1 C[C@@]1(c2cc(NC(=O)c3ccc(C#N)cn3)ccc2F)C[C@H](C(F)(F)F)OC(=[NH2+])N1
> comp109_1=2 C[C@@]1(c2cc(NC(=O)c3ccc(C#N)c[nH+]3)ccc2F)C[C@H](C(F)(F)F)OC(N)=N1
> comp109_0=3 C[C@@]1(c2cc(NC(=O)c3ccc(C#N)cn3)ccc2F)C[C@H](C(F)(F)F)OC(=N)N1
> comp109_1=5 C[C@@]1(c2cc(NC(=O)c3ccc(C#N)c[nH+]3)ccc2F)C[C@H](C(F)(F)F)OC(=N)N1
> comp109_2=8 C[C@@]1(c2cc(NC(=O)c3ccc(C#N)c[nH+]3)ccc2F)C[C@H](C(F)(F)F)OC(=[NH2+])N1
>
> 


Re: [Rdkit-discuss] Memory issue when storing more than 300K mol in a list

2017-06-09 Thread Brian Kelley
 What exactly are you doing?

Is this 1000x500k substructure queries or something different?


Brian Kelley

> On Jun 9, 2017, at 9:12 AM, Alexis Parenty <alexis.parenty.h...@gmail.com> 
> wrote:
> 
> Dear Greg and Brian, 
> Many thanks for your response. I was also thinking of your streaming 
> approach! I think the RAM of most machines would deal with lists of 100K mol, 
> so we could put the threshold higher than 1000. Actually, I was thinking of 
> monitoring the available RAM and only starting to process the matrix and clear 
> the list when less than 20% of RAM is left. This way, the best machines could 
> skip the clearing process and gain time. What do you think?
> 
> 
> Best,
> 
> Alexis
> 
> 
> 
> 
> 
>> On 9 June 2017 at 14:40, Brian Kelley <fustiga...@gmail.com> wrote:
>> While not multithreaded (yet) this is the use case of the filter catalog:
>> 
>> http://rdkit.blogspot.com/2016/04/changes-in-201603-release-filtercatalog.html?m=1
>> 
>> Look for the SmartsMatcher class in the blog.
>> 
>> It is a good idea to make this multithreaded as well, I'll add this as a 
>> possible enhancement.
>> 
>> 
>> Brian Kelley
>> 
>>> On Jun 9, 2017, at 7:04 AM, Greg Landrum <greg.land...@gmail.com> wrote:
>>> 
>>> Hi Alexis,
>>> 
>>> I would approach this by loading the 1000 queries into a list of molecules 
>>> and then "stream" the others past that (so that you never attempt to load 
>>> the full 500K set at once).
>>> 
>>> Here's a quick sketch of one way to do this:
>>> In [4]: queries = [x for x in Chem.ForwardSDMolSupplier('mols.1000.sdf') if x is not None]
>>> 
>>> In [5]: matches = []
>>> 
>>> In [6]: for m in Chem.ForwardSDMolSupplier('./znp.50k.sdf'):
>>>    ...:     if m is None:
>>>    ...:         continue
>>>    ...:     matches.append([m.HasSubstructMatch(q) for q in queries])
>>>    ...: 
>>> 
>>> 
>>> Brian has some thoughts on making this particular use case easier/faster 
>>> (in particular by adding multi-threading support), so maybe there will be 
>>> something in the next release there.
>>> 
>>> I hope this helps,
>>> -greg
>>> 
>>> 
>>>> On Sun, Jun 4, 2017 at 10:25 PM, Alexis Parenty 
>>>> <alexis.parenty.h...@gmail.com> wrote:
>>>> Dear RDKit community,
>>>> 
>>>> I need to screen for substructure relationships between two sets of 
>>>> structures (1 000 X 500 000): I thought I should build two lists of mol 
>>>> objects from SMILES, but I keep having a memory error when the second list 
>>>> reaches 300 000 mol. All my RAM (12G) gets consumed along with all my 
>>>> virtual memory.
>>>> 
>>>> Do I really have to compromise on speed and make mol objects on the fly 
>>>> from two lists of SMILES? Is there another memory-efficient way to store 
>>>> mol objects?
>>>> 
>>>> Best,
>>>> 
>>>> Alexis
>>>> 
>>>> 
>>>> 
>>> 


Re: [Rdkit-discuss] Memory issue when storing more than 300K mol in a list

2017-06-09 Thread Brian Kelley
While not multithreaded (yet) this is the use case of the filter catalog:

http://rdkit.blogspot.com/2016/04/changes-in-201603-release-filtercatalog.html?m=1

Look for the SmartsMatcher class in the blog.

It is a good idea to make this multithreaded as well, I'll add this as a 
possible enhancement.
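
The streaming idea in Greg's quoted message below can be sketched independently of RDKit. In this plain-Python sketch, batched, screen and the substring test are hypothetical stand-ins: strings play the role of molecules and "query in target" plays the role of a substructure match.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items without reading everything first."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def screen(queries, targets, is_match, batch_size=1000):
    """Yield one row of booleans per target, holding one batch in memory."""
    for chunk in batched(targets, batch_size):
        for target in chunk:
            yield [is_match(target, query) for query in queries]


queries = ["Cl", "N"]                     # small set, kept in memory
targets = iter(["CCCl", "CCN", "CCO"])    # stands in for a large file reader
for row in screen(queries, targets, lambda t, q: q in t, batch_size=2):
    print(row)
```

Only the small query list and the current batch are ever held at once, so memory use does not grow with the size of the target set.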


Brian Kelley

> On Jun 9, 2017, at 7:04 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> 
> Hi Alexis,
> 
> I would approach this by loading the 1000 queries into a list of molecules 
> and then "stream" the others past that (so that you never attempt to load the 
> full 500K set at once).
> 
> Here's a quick sketch of one way to do this:
> In [4]: queries = [x for x in Chem.ForwardSDMolSupplier('mols.1000.sdf') if x is not None]
> 
> In [5]: matches = []
> 
> In [6]: for m in Chem.ForwardSDMolSupplier('./znp.50k.sdf'):
>    ...:     if m is None:
>    ...:         continue
>    ...:     matches.append([m.HasSubstructMatch(q) for q in queries])
>    ...: 
> 
> 
> Brian has some thoughts on making this particular use case easier/faster (in 
> particular by adding multi-threading support), so maybe there will be 
> something in the next release there.
> 
> I hope this helps,
> -greg
> 
> 
>> On Sun, Jun 4, 2017 at 10:25 PM, Alexis Parenty 
>> <alexis.parenty.h...@gmail.com> wrote:
>> Dear RDKit community,
>> 
>> I need to screen for substructure relationships between two sets of 
>> structures (1 000 X 500 000): I thought I should build two lists of mol 
>> objects from SMILES, but I keep having a memory error when the second list 
>> reaches 300 000 mol. All my RAM (12G) gets consumed along with all my 
>> virtual memory.
>> 
>> Do I really have to compromise on speed and make mol objects on the fly 
>> from two lists of SMILES? Is there another memory-efficient way to store mol 
>> objects?
>> 
>> Best,
>> 
>> Alexis
>> 
>> 
>> 
> 


Re: [Rdkit-discuss] Does rdkit depend on pandas?

2017-06-06 Thread Brian Kelley
No.  The main reason that the conda recipe includes pandas is for testing
the pandas extension.  We could probably remove it from the run-time
dependency however and let the user install it in addition.

In any case, feel free to remove pandas from the conda installation.

Cheers,
 Brian

On Tue, Jun 6, 2017 at 9:54 AM, Michał Nowotka  wrote:

> Hi,
>
> I just upgraded rdkit from 2017.03.1 to 2017.03.2 using Conda. What I
> have noticed is that pandas are now installed during the installation
> of rdkit.
> Does rdkit depend on pandas now? Is it safe to remove it? If it works
> without pandas, maybe it makes sense to remove the dependency.
>
> Kind regards,
>
> Michał Nowotka
>
> 


Re: [Rdkit-discuss] RDKit on armv7h

2017-05-31 Thread Brian Kelley
Try

cmake -DRDK_OPTIMIZE_NATIVE=off 

This should turn off the -mpopcnt compiler flag, which isn't supported on armv7.




Brian Kelley

> On May 31, 2017, at 5:08 PM, Samo Turk <samo.t...@gmail.com> wrote:
> 
> Dear RDKit community,
> 
> I have trouble compiling RDKit on Arch Linux on armv7h cpu. It has to be 
> something with CPU architecture since the same build script is working on 
> Arch Linux on x86_64. Package versions are the same on both computers: gcc 
> 7.1.1, boost 1.64.0, python 3.6.1 and build script is here: 
> https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=rdkit
> 
> Error I get:
> c++: error: unrecognized command line option ‘-mpopcnt’
> make[2]: *** [Code/RDGeneral/CMakeFiles/RDGeneral.dir/build.make:63: 
> Code/RDGeneral/CMakeFiles/RDGeneral.dir/Invariant.cpp.o] Error 1
> make[1]: *** [CMakeFiles/Makefile2:528: 
> Code/RDGeneral/CMakeFiles/RDGeneral.dir/all] Error 2
> make: *** [Makefile:163: all] Error 2
> 
> Cheers,
> Samo


Re: [Rdkit-discuss] canonical smiles for fragments with map numbers

2017-05-27 Thread Brian Kelley
Pavel, this isn't exactly trivial so I went ahead and made an example.  The
basics are that atomMaps are canonicalized, i.e. their value is used in the
generation of smiles.

To solve this problem:
1) backup the atom maps and remove them
2) canonicalize *without* atom maps but figure out the order in which the
atoms in the molecule are output
3) using the atom output order, relabel the atom maps based on output order.

That's a mouthful, but here's some code that should do the trick:

from rdkit import Chem

smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
   "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
   "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
   "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
   "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
   "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]


def CanonicalizeMaps(m, *a, **kw):
    # atom maps are canonicalized, so rename them
    #  figure out where they would have gone
    #  and relabel from 1...N based on output order
    atomMap = "molAtomMapNumber"
    backupAtomMap = "oldMolAtomMapNumber"

    for atom in m.GetAtoms():
        if atom.HasProp(atomMap):
            atomNum = atom.GetProp(atomMap)
            atom.SetProp(backupAtomMap, atomNum)
            atom.ClearProp(atomMap)

    # canonicalize
    smi = Chem.MolToSmiles(m, *a, **kw)
    # where did the atoms end up in the output string?
    atoms = [(pos, atom_idx) for atom_idx, pos in enumerate(
        eval(m.GetProp("_smilesAtomOutputOrder")))]
    atommap = 1
    atoms.sort()

    # set the new atommap based on output position
    for pos, atom_idx in atoms:
        atom = m.GetAtomWithIdx(atom_idx)
        if atom.HasProp(backupAtomMap):
            atom.SetProp(atomMap, str(atommap))
            atommap +=1

    return Chem.MolToSmiles(m)

for s in smi:
    m = Chem.MolFromSmiles(s)
    print(CanonicalizeMaps(m,True))



Output:

S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]

Now, if you want the atomMaps in 1...2...3 output order, we could do that
as well, but it is even trickier.

Enjoy,
 Brian

On Sat, May 27, 2017 at 8:36 AM, Pavel Polishchuk 
wrote:

> Hi,
>
>   I cannot solve an issue and would like to ask for an advice.
>   If there are different map numbers for attachment points for the same
> fragment different canonical smiles are generated.
>   I observed such behavior only for fragments with 3 attachment points.
> Below is an example.
>   I'm looking for a solution/workaround how to produce the "same" smiles
> strings irrespectively of mapping that after removal of map numbers smiles
> will become identical.
>   Any advice would be appreciated.
>
> smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
>"ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
>"ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
>"ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
>"ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
>"ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]
>
> for s in smi:
> print(Chem.MolToSmiles(Chem.MolFromSmiles(s)))
>
> output:
> S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
> S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
> S=c1c([*:1])c([*:3])[nH]c(Cl)c1[*:2]
> S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
> S=c1c([*:1])c([*:2])[nH]c(Cl)c1[*:3]
> S=c1c([*:2])c([*:1])[nH]c(Cl)c1[*:3]
>
> Kind regards,
> Pavel.
>
> 


Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?

2017-05-17 Thread Brian Kelley
Dear All,
  In case it helps, there is a wealth of functional groups already in RDKit
available here:

https://github.com/rdkit/rdkit/blob/master/Data/Functional_Group_Hierarchy.txt

For instance, the functional group halogen pattern we use is a bit more
complicated:

[$([F,Cl,Br,I]-!@[#6]);!$([F,Cl,Br,I]-!@C-!@[F,Cl,Br,I]);!$([F,Cl,Br,I]-[C,S](=[O,S,N]))]

That can (1) help you write your own patterns and (2) be used (from python)
as follows:


from __future__ import print_function
from rdkit import Chem
from rdkit.Chem import FilterCatalog

queryDefs = FilterCatalog.GetFlattenedFunctionalGroupHierarchy()
smiles = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
mol = Chem.MolFromSmiles(smiles)
items = sorted(queryDefs.items())
for name, pat in items:
   print("%s\t%s"%(name, mol.HasSubstructMatch(pat)))


AcidChloride False
AcidChloride.Aliphatic False
AcidChloride.Aromatic False
Alcohol False
Alcohol.Aliphatic False
Alcohol.Aromatic False
Aldehyde False
Aldehyde.Aliphatic False
Aldehyde.Aromatic False
Amine True
Amine.Aliphatic True
Amine.Aromatic False
Amine.Cyclic True
Amine.Primary False
Amine.Primary.Aliphatic False
Amine.Primary.Aromatic False
Amine.Secondary True
Amine.Secondary.Aliphatic True
Amine.Secondary.Aromatic False
Amine.Tertiary False
Amine.Tertiary.Aliphatic False
Amine.Tertiary.Aromatic False
Azide False
Azide.Aliphatic False
Azide.Aromatic False
BoronicAcid False
BoronicAcid.Aliphatic False
BoronicAcid.Aromatic False
CarboxylicAcid False
CarboxylicAcid.Aliphatic False
CarboxylicAcid.AlphaAmino False
CarboxylicAcid.Aromatic False
Halogen True
Halogen.Aliphatic False
Halogen.Aromatic True
Halogen.Bromine False
Halogen.Bromine.Aliphatic False
Halogen.Bromine.Aromatic False
Halogen.Bromine.BromoKetone False
Halogen.NotFluorine True
Halogen.NotFluorine.Aliphatic False
Halogen.NotFluorine.Aromatic True
Isocyanate False
Isocyanate.Aliphatic False
Isocyanate.Aromatic False
Nitro False
Nitro.Aliphatic False
Nitro.Aromatic False
SulfonylChloride False
SulfonylChloride.Aliphatic False
SulfonylChloride.Aromatic False
TerminalAlkyne False


Cheers,
 Brian

On Wed, May 17, 2017 at 9:20 AM, Alexis Parenty <
alexis.parenty.h...@gmail.com> wrote:

> Hi Michal, thanks for your response.
> I think I made a typo somewhere in my previous code since it now works
> fine, even without the kekule notation... Sorry about the confusion...
> Best,
>
> Alexis
>
> On 17 May 2017 at 13:59, Michal Krompiec 
> wrote:
>
>> Hi Alexis,
>> Try aromatic form instead of Kekule notation.
>> Best,
>> Michal
>>
>> On 17 May 2017 at 12:55, Alexis Parenty 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I am looking for substructure match between a smarts and a smiles, but I
>>> want any heteroatom from the smarts to match any heteroatom from a smiles:
>>>
>>>
>>> [image: Inline images 1]
>>>
>>>
>>>
>>>
>>>
>>> The following does not return what I would expect:
>>>
>>> smarts1 = " [F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1"
>>> smiles2 = " ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
>>>
>>> mol1 = Chem.MolFromSmarts(smarts1)
>>> mol2 = Chem.MolFromSmiles(smiles2)
>>> print("mol1 is a substructure of mol2: {}".format(mol2.HasSubstructMatch(mol1)))
>>> print("mol2 is a substructure of mol1: {}".format(mol1.HasSubstructMatch(mol2)))
>>>
>>>
>>>
>>> => mol1 is a substructure of mol2: False
>>>
>>> => mol2 is a substructure of mol1: False
>>>
>>> How could I do that?
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Alexis
>>>
>>>



Re: [Rdkit-discuss] Check If Atom Is in Two Small Rings

2017-04-11 Thread Brian Kelley
Peter, quite correct.

To do that, you'll need to do operations on the rings themselves:

>>> m = Chem.MolFromSmiles("C1CC12CCC2")
>>> list(m.GetRingInfo().AtomRings())
[(0, 1, 2), (3, 4, 5, 2)]

And set operations are probably your friend:

>>> m = Chem.MolFromSmiles("C1CCC12CCC2")
>>> list(m.GetRingInfo().AtomRings())
[(0, 3, 2, 1), (4, 5, 6, 3)]
>>> # get all 4 membered rings
>>> rings = [set(r) for r in m.GetRingInfo().AtomRings() if len(r) == 4]
>>> # from the first ring, see if there are any intersections
>>> s = rings[0]
>>> for r in rings[1:]:
...   p = s.intersection(r)
...   if p: print(p)
...
{3}
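
For what it's worth, the same idea can be wrapped in a small helper. This is only a sketch: the function name and the `sizes` argument are made up for illustration, and it works on plain ring tuples like the ones GetRingInfo().AtomRings() returns above, so it needs nothing beyond the standard library.

```python
from itertools import combinations


def atoms_in_two_small_rings(atom_rings, sizes=(3, 4)):
    """Return sorted atom indices shared by at least two rings whose size is in `sizes`.

    `atom_rings` is a sequence of tuples of atom indices, one tuple per ring.
    """
    # keep only the small rings, as sets so they can be intersected
    small = [set(ring) for ring in atom_rings if len(ring) in sizes]
    shared = set()
    # any atom in the intersection of two small rings belongs to both of them
    for ring_a, ring_b in combinations(small, 2):
        shared |= ring_a & ring_b
    return sorted(shared)


# ring tuples copied from the two sessions above
print(atoms_in_two_small_rings([(0, 1, 2), (3, 4, 5, 2)]))     # [2]
print(atoms_in_two_small_rings([(0, 3, 2, 1), (4, 5, 6, 3)]))  # [3]
```

Note that for fused (rather than spiro) small rings this returns both atoms of the shared bond, which may or may not be what you want.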

Cheers,
 Brian

On Tue, Apr 11, 2017 at 4:19 PM, Peter S. Shenkin <shen...@gmail.com> wrote:

> But Brian's solution won't help Jonathan find atoms that are in two
> three-membered or two four-membered rings, which I thought Jonathan also
> wanted, based on the wording of the original query.
>
> -P.
>
> On Tue, Apr 11, 2017 at 4:12 PM, Curt Fischer <curt.r.fisc...@gmail.com>
> wrote:
>
>> Brian's solution is obviously better (shorter, uses fewer functions) than
>> mine.  (Although mine assumes that you want atoms that are part of
>> _exactly_ two rings, not atoms that are part of _at least_ two rings as
>> Brian's does.  Probably Brian's solution is what you want but worth noting.)
>>
>> CF
>>
>> On Tue, Apr 11, 2017 at 1:03 PM, Brian Kelley <fustiga...@gmail.com>
>> wrote:
>>
>>> You are so close!
>>>
>>> >>> from rdkit import Chem
>>>
>>> >>> m = Chem.MolFromSmiles("C1CC12CCC2")
>>>
>>> >>> for atom in m.GetAtoms():
>>>
>>> ...   if atom.IsInRingSize(3) and atom.IsInRingSize(4): print(atom.GetIdx())
>>>
>>> ...
>>>
>>> 2
>>>
>>> >>>
>>>
>>> Cheers,
>>>  Brian
>>>
>>> On Tue, Apr 11, 2017 at 1:38 PM, Jonathan Saboury <jsab...@gmail.com>
>>> wrote:
>>>
>>>> Hello All,
>>>>
>>>> I'm trying to make a function to check if a mol has an atom that is
>>>> part of two small rings (3 or 4 atoms). Using GetRingInfo()/NumAtomRings()
>>>> I can find out how many ring systems each atom is in, but not the details
>>>> of the rings. atom.IsInRingSize(size) returns a bool so I couldn't use
>>>> that. I'm using the python api.
>>>>
>>>> Any suggestions? Thanks!
>>>>
>>>> - Jonathan
>>>>


Re: [Rdkit-discuss] Check If Atom Is in Two Small Rings

2017-04-11 Thread Brian Kelley
You are so close!

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles("C1CC12CCC2")
>>> for atom in m.GetAtoms():
...   if atom.IsInRingSize(3) and atom.IsInRingSize(4): print(atom.GetIdx())
...
2

Cheers,
 Brian

On Tue, Apr 11, 2017 at 1:38 PM, Jonathan Saboury  wrote:

> Hello All,
>
> I'm trying to make a function to check if a mol has an atom that is part
> of two small rings (3 or 4 atoms). Using GetRingInfo()/NumAtomRings() I can
> find out how many ring systems each atom is in, but not the details of the
> rings. atom.IsInRingSize(size) returns a bool so I couldn't use that. I'm
> using the python api.
>
> Any suggestions? Thanks!
>
> - Jonathan
>


Re: [Rdkit-discuss] RDKit Reaction gives disconnected components

2017-03-30 Thread Brian Kelley
Correction: you are not getting two products because you are grouping
the results with parentheses, as in:

>>> rxn = AllChem.ReactionFromSmarts("([C:1][*][N:2])>>([C:1].[N:2])")
>>> prods = rxn.RunReactants([Chem.MolFromSmiles("FC1ON1I")])
>>> Chem.MolToSmiles(prods[0][0])
'CF.NI'

However, it appears that you aren't mapping anything explicitly between
[C:1] and [N:2] in some cases so the left hand side doesn't know what
really to do.

I'll have to dig into this a little more.

Cheers,
 Brian


On Thu, Mar 30, 2017 at 12:56 PM, Brian Kelley <fustiga...@gmail.com> wrote:

> I have a feeling you may need to make two reactions.  Let's consider a
> dirt simple case:
>
> >>> rxn = AllChem.ReactionFromSmarts("[C:1][N:2]>>[C:1].[N:2]")
>
> >>> prods = rxn.RunReactants([Chem.MolFromSmiles("CN")])
>
> >>> Chem.MolToSmiles(prods[0][0])
>
> 'C'
>
> >>> Chem.MolToSmiles(prods[0][1])
>
> 'N'
>
> >>>
>
> Note that this reaction is explicitly breaking a bond.  I think this is
> what you are seeing with your example.
>
> Note that similar to the "." on the reagent side meaning multiple
> reagents, the "." on the right hand side means there will be multiple
> products.
>
> Does this help at all?
>
> Cheers,
>  Brian
>
> On Thu, Mar 30, 2017 at 12:07 PM, Stephen Roughley <
> s.rough...@vernalis.com> wrote:
>
>> Dear Greg/RDKitters,
>>
>>
>>
>> This may be user error, or misunderstanding of rSMARTS, so can anyone
>> throw some light on the following behaviour?
>>
>>
>>
>> First example works as expected – there are 2× Ph in m4, so we end up
>> with 2×2×2 copies of the expected product:
>>
>>
>>
>> rSMARTS4='([*:1]-&!@c1:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:1.[*:2]-&!@c1:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:1)>>([*:1]-!@c:1:c:c(-F):c:c:c1.[*:2]-!@c:1:c:c(-F):c:c:c1)' #Replace 2× Ph-* with 2× 3-F-C6H4-*
>>
>> rxn4=AllChem.ReactionFromSmarts(rSMARTS4)
>>
>> rxn4
>>
>> m4=Chem.MolFromSmiles('c1c1CCOCc1c1')
>>
>> m4
>>
>> prodsbi=rxn4.RunReactants((m4,))
>>
>> for prod in prodsbi:
>>     Chem.SanitizeMol(prod[0])
>>
>> Draw.MolsToGridImage([prod[0] for prod in prodsbi],molsPerRow=4,subImgSize=(200,200))
>>
>>
>>
>> Now consider the following – the only difference I can think of is that
>> the [*:1] and [*:2] atoms map to adjacent, directly bonded atoms – I can't
>> see why that should matter…
>>
>> m3=Chem.MolFromSmiles('c1c1COc1c1')
>>
>> m3
>>
>> prodsbi=rxn4.RunReactants((m3,))
>>
>> for prod in prodsbi:
>>     Chem.SanitizeMol(prod[0])
>>
>> Draw.MolsToGridImage([prod[0] for prod in prodsbi],molsPerRow=8,subImgSize=(200,200))
>>
>>
>>
>> Just to be sure this is as I think it looks..
>>
>> prodsbi[0][0]
>>
>>
>>
>> Any suggestions as to why this happens, and whether it is the expected
>> behaviour? (And how to avoid it?!)
>>
>> Thanks,
>>
>> Steve
>>

Re: [Rdkit-discuss] RDKit Reaction gives disconnected components

2017-03-30 Thread Brian Kelley
I have a feeling you may need to make two reactions.  Let's consider a dirt
simple case:

>>> rxn = AllChem.ReactionFromSmarts("[C:1][N:2]>>[C:1].[N:2]")
>>> prods = rxn.RunReactants([Chem.MolFromSmiles("CN")])
>>> Chem.MolToSmiles(prods[0][0])
'C'
>>> Chem.MolToSmiles(prods[0][1])
'N'

Note that this reaction is explicitly breaking a bond.  I think this is
what you are seeing with your example.

Note that similar to the "." on the reagent side meaning multiple reagents,
the "." on the right hand side means there will be multiple products.

Does this help at all?

Cheers,
 Brian

On Thu, Mar 30, 2017 at 12:07 PM, Stephen Roughley 
wrote:

> Dear Greg/RDKitters,
>
>
>
> This may be user error, or misunderstanding of rSMARTS, so can anyone
> throw some light on the following behaviour?
>
>
>
> First example works as expected – there are 2× Ph in m4, so we end up with
> 2×2×2 copies of the expected product:
>
>
>
> rSMARTS4='([*:1]-&!@c1:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:1.[*:2]-&!@c1:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:[c&!H0]:1)>>([*:1]-!@c:1:c:c(-F):c:c:c1.[*:2]-!@c:1:c:c(-F):c:c:c1)' #Replace 2× Ph-* with 2× 3-F-C6H4-*
>
> rxn4=AllChem.ReactionFromSmarts(rSMARTS4)
>
> rxn4
>
> m4=Chem.MolFromSmiles('c1c1CCOCc1c1')
>
> m4
>
> prodsbi=rxn4.RunReactants((m4,))
>
> for prod in prodsbi:
>     Chem.SanitizeMol(prod[0])
>
> Draw.MolsToGridImage([prod[0] for prod in prodsbi],molsPerRow=4,subImgSize=(200,200))
>
>
>
> Now consider the following – the only difference I can think of is that
> the [*:1] and [*:2] atoms map to adjacent, directly bonded atoms – I can't
> see why that should matter…
>
> m3=Chem.MolFromSmiles('c1c1COc1c1')
>
> m3
>
> prodsbi=rxn4.RunReactants((m3,))
>
> for prod in prodsbi:
>     Chem.SanitizeMol(prod[0])
>
> Draw.MolsToGridImage([prod[0] for prod in prodsbi],molsPerRow=8,subImgSize=(200,200))
>
>
>
> Just to be sure this is as I think it looks..
>
> prodsbi[0][0]
>
>
>
> Any suggestions as to why this happens, and whether it is the expected
> behaviour? (And how to avoid it?!)
>
> Thanks,
>
> Steve
>


Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Brian Kelley
I don't know the exact Glide procedure, but I did write such a system for
OpenEye (POSIT).  The issue you are facing is that the RMSD portion is just
a constraint used for docking; it isn't used as the "score".  In fact, it
can't tell whether the conformation interpenetrates the active site or which
orientation is better.

I believe RDKit can generate conformations with a template (see
AllChem.ConstrainedEmbed).  This would solve half of your problem by creating
conformations that match your template.  You still have the problem of
scoring against your active site.  POSIT scored against the shape Tanimoto
of the active ligands (if any) to try to fill the same space as the known
ligands.  See rdkit.Chem.rdShapeHelpers.ShapeTanimotoDist.
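
As a very rough sketch of how those two pieces might fit together (this is not the POSIT procedure, just an illustration): the molecules below are hypothetical stand-ins, the reference "pose" is simply an embedded conformer where you would use your crystal ligand, and the core is assumed to be given as a SMILES that matches both molecules.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdShapeHelpers

# Hypothetical reference ligand; in practice, load the crystal ligand with its coordinates.
ref = Chem.AddHs(Chem.MolFromSmiles("OCCc1ccccc1"))
AllChem.EmbedMolecule(ref, randomSeed=42)

# Build the shared core and copy its coordinates over from the reference.
core = Chem.MolFromSmiles("CCc1ccccc1")
match = ref.GetSubstructMatch(core)
conf = Chem.Conformer(core.GetNumAtoms())
for core_idx, ref_idx in enumerate(match):
    conf.SetAtomPosition(core_idx, ref.GetConformer().GetAtomPosition(ref_idx))
core.AddConformer(conf)

# Embed the query so that its core atoms sit on the reference core.
probe = Chem.AddHs(Chem.MolFromSmiles("NCCc1ccccc1"))
AllChem.ConstrainedEmbed(probe, core, randomseed=42)
print("core RMS:", probe.GetProp("EmbedRMS"))

# Shape Tanimoto *distance* to the reference (0 means identical shapes).
dist = rdShapeHelpers.ShapeTanimotoDist(probe, ref)
print("shape Tanimoto distance:", dist)
```

You would repeat the ConstrainedEmbed step with different seeds to get several constrained conformers and keep the one with the smallest shape distance to the known ligand(s).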

This might not be what you want, but we had good success with similar
methods in virtual screening, especially when using multiple co-crystal
active sites.  I can send you a reference link if this interests you.

Cheers,
 Brian

On Mon, Feb 20, 2017 at 12:17 PM, Thomas Evangelidis 
wrote:

> ​
> Greg and Brian,
>
> Thank you for your useful hints. All the compounds that I want to align
> are supposed to belong to the same analogue series so they should share a
> common substructure with substantial size.
>
> What I want to emulate is the "core restrained docking" with glide, where
> you specify the common core of the query and the reference ligand using a
> SMARTS pattern and then glide docks the query compound to the binding
> pocket but takes care to overlay the core atoms of the query to the core
> atoms of the reference compound. Since RDKit does not do docking, I just
> generate 30 conformers of each query compound and select the best one by
> measuring the RMSD between the core of the query and the core of the
> reference after the alignment. Of course the conformations of the core
> atoms between the query and the reference are never identical hence the bad
> alignment. Is there any smarter way to emulate the "core restrained
> docking" with RDKit?
>
> I will provide you with more info soon (example sdf, results, etc.).


Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Brian Kelley
I believe (Greg can correct me) that to align the Bemis-Murcko scaffold, you
could

(1) extract the original atom pairs and send them to the RMSD algorithm
(2) take a Bemis scaffold and convert it to a substructure query for use in
the RMSD algorithm.  In either case the RMSD is the RMSD of the scaffold
atoms, not the rest of the molecule.  Below is a little snippet that I
believe does this.

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdqueries, rdMolAlign

# mol_block: a string holding your molecule's mol block (with 3D coordinates)
mol = Chem.MolFromMolBlock(mol_block)
print(Chem.MolToSmiles(mol))

murcko = AllChem.MurckoDecompose(mol)
print("murcko", Chem.MolToSmiles(murcko))

# bemis scaffolds match everything
q = rdqueries.AtomNumGreaterQueryAtom(0)
bemis = Chem.RWMol(murcko)
for atom in bemis.GetAtoms():
   bemis.ReplaceAtom(atom.GetIdx(), q)

# align the scaffold (probe) onto the molecule (reference)
rmsd = rdMolAlign.AlignMol(bemis, mol)
print(rmsd)





On Mon, Feb 20, 2017 at 11:21 AM, Greg Landrum 
wrote:

> Hi Thomas,
>
> To be sure we're talking about the same thing: rdMolAlign.GetO3A() is an
> implementation of the Open3DAlign algorithm. This is an unsupervised
> approach that uses a clever algorithm to come up with an atom-atom mapping
> between the two molecules you give it. It's not always going to pick the
> same atoms to align that you would.
>
> To answer the original question: if the two molecules you want to align do
> not share the same scaffold (or at least have a lot in common in the core
> of the molecule), it's unlikely that an MCS-based alignment is going to
> help.
>
> To answer your direct question here, the scaffold finding code in
> rdkit.Chem should be preserving coordinates. Here's a simple demonstration
> of that:
>
> In [3]: m =Chem.AddHs(Chem.MolFromSmiles('CC1CO1'))
>
> In [4]: AllChem.EmbedMolecule(m,AllChem.ETKDG())
> Out[4]: 0
>
> In [5]: nh = Chem.RemoveHs(m)
>
> In [6]: print(Chem.MolToMolBlock(nh))
>
>  RDKit  3D
>
>   4  4  0  0  0  0  0  0  0  0999 V2000
>    -1.1850    0.2738   -0.1814 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0992   -0.4987   -0.2391 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.3229    0.1241    0.2290 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.6874   -0.7558    1.1165 O   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   2  3  1  0
>   3  4  1  0
>   4  2  1  0
> M  END
>
>
> In [7]: from rdkit.Chem import MurckoDecompose
>
> In [8]: scaff = Chem.MurckoDecompose(nh)
>
> In [10]: print(Chem.MolToMolBlock(scaff))
>
>  RDKit  3D
>
>   3  3  0  0  0  0  0  0  0  0999 V2000
>     0.0992   -0.4987   -0.2391 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.3229    0.1241    0.2290 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.6874   -0.7558    1.1165 O   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   2  3  1  0
>   3  1  1  0
> M  END
>
>
> You could help me provide a better answer here by providing a couple of
> example SDFs that you'd like to align, ideally together with a bit of
> sample code showing what you have tried that produces alignments you are
> unhappy with.
>
>
> -greg
>
>
>
>
> On Mon, Feb 20, 2017 at 1:54 PM, Thomas Evangelidis 
> wrote:
>
>> As a follow up question on this topic, I would like to ask if
>> MurckoScaffold.GetScaffoldForMol(mol) returns the scaffold of mol with
>> different coordinates?
>> I am asking this because when I use the transformation matrix of the
>> alignment of the cores of the probe and the reference molecules, in order
>> to align the whole probe to the reference molecule, the two molecules don't
>> seem to be aligned (they are in distance). Basically I do this:
>>
>> qcore = MurckoScaffold.GetScaffoldForMol(qmol)
>> refcore = MurckoScaffold.GetScaffoldForMol(refmol)
>> pyO3A = rdMolAlign.GetO3A(qcore, refcore, prbCid=qconfID, refCid=0,
>> reflect=True)
>> AllChem.TransformMol(qmol, bestRMSDTrans[1], confId=bestconfID,
>> keepConfs=False)
>>
>> and then I write the qmol in an sdf file. But when I visualize it the
>> qmol is far from the refmol!
>>
>>
>>
>>
>>
>> On 20 February 2017 at 02:33, Thomas Evangelidis 
>> wrote:
>>
>>> Dear all,
>>>
>>> I want to align 250 compounds that bind to the same pocket to one of
>>> the 9 available crystal ligands. I chose the reference ligand based on the
>>> Morgan2 similarity to the probe molecule. Then I align the 2 compounds
>>> using:
>>>
>>> pyO3A = rdMolAlign.GetO3A(qmol, refmol, prbCid=qconfID, refCid=0,
>>> reflect=True)
>>> RMSD = pyO3A.Align()
>>>
>>> and keep only the conformer of the probe with the lowest RMSD to the
>>> reference compound. However, the alignment looks terrible when I
>>> visualize it, so I would like to ask if there is any way to align the
>>> maximum common substructure only. I tried to align only the core of both
>>> molecules as defined by MurckoScaffold.GetScaffoldForMol(mol), but
>>> still the alignment looks bad. I have seen in the documentation how to find
>>> the maximum common substructure with rdFMCS.FindMCS but before I engage
>>> into 

Re: [Rdkit-discuss] RDKit "cannot create mol from SMILE" error

2017-01-18 Thread Brian Kelley
That doesn't look like a valid SMILES to me; I don't think a SMILES
string can start with a parenthesis (branch).
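
A quick way to check a suspect string from Python before it reaches the database is sketched below. The helper name is made up, and the second SMILES is just my guess at a parsable nitroalkene for comparison, not necessarily the structure you intended.

```python
from rdkit import Chem, RDLogger

# silence the parser's error message for this demo
RDLogger.DisableLog("rdApp.error")


def is_parsable(smiles):
    """MolFromSmiles returns None when the string cannot be turned into a molecule."""
    return Chem.MolFromSmiles(smiles) is not None


print(is_parsable("(C=C1)[N+]([O-])=O"))  # False: leading branch, unclosed ring bond
print(is_parsable("C=C[N+]([O-])=O"))     # True
```

I believe the cartridge also has an is_valid_smiles() function that could be used to filter inputs on the SQL side, but check the cartridge documentation for your version.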


Brian Kelley

> On Jan 18, 2017, at 6:18 PM, Larson Danes <lgda...@gmail.com> wrote:
> 
> Hi all,
> 
> I'm using the following query in postgresql (with the rdkit extension 
> installed):
> 
> "select casrn from mols where m @> CAST(? AS mol)"
> 
> This returns "ERROR: could not create molecule from SMILES '...' " on 
> occasion. One such SMILE that causes this error regularly is 
> '(C=C1)[N+]([O-])=O'. I'm curious if there's documentation on this specific 
> error message anywhere. I've looked and haven't had luck finding any.
> Any information about this error message is much appreciated.
> 
> Thanks,
> 
> Larson


Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Brian Kelley
In the inertial frame this is trivial; however, with the current RDKit,
can't you just use the plane of best fit here to tell planar from 3D?  For a
linear molecule, you can use the PMI descriptors.

See PBF in RDKit

http://pubs.acs.org/doi/abs/10.1021/ci300293f
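
Something along these lines might be a starting point. It is only a sketch: the helper name is mine, the conformers come from a plain embedding rather than your real geometries, and I am deliberately not suggesting any cutoff values for "planar" or "linear", since those would need to be tuned.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolDescriptors


def shape_summary(smiles, seed=42):
    """Embed one conformer and return PBF plus the three principal moments."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
        raise ValueError("embedding failed for %s" % smiles)
    return {
        # plane of best fit: small for (near) planar molecules
        "PBF": rdMolDescriptors.CalcPBF(mol),
        # principal moments of inertia, smallest first
        "PMI": (rdMolDescriptors.CalcPMI1(mol),
                rdMolDescriptors.CalcPMI2(mol),
                rdMolDescriptors.CalcPMI3(mol)),
    }


for name, smi in [("benzene", "c1ccccc1"), ("cyclohexane", "C1CCCCC1")]:
    print(name, shape_summary(smi))
```

The idea would be to look at PBF first (near zero suggests planar) and at PMI1 relative to PMI2 and PMI3 for the linear case.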

Cheers,
 Brian

On Tue, Jan 17, 2017 at 7:58 AM, Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> ​Great! I also notice confusing usage of moment of Inertia in those
> descriptors.
>
>
> For exemple in WHIM case, we need to know if the molecule is linear,
> planar or 3D in order to compute the descriptors.
>
>
> I did not find a easy way to determine this yet.
>
>
> BR,​
>
> *Dr. Guillaume GODIN*
> Principal Scientist
> Chemoinformatic & Datamining
> Innovation
> CORPORATE R DIVISION
> DIRECT LINE +41 (0)22 780 3645 <+41%2022%20780%2036%2045>
> MOBILE  +41 (0)79 536 1039 <+41%2079%20536%2010%2039>
> Firmenich SA
> RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
>
> --
> *From:* Brian Kelley <fustiga...@gmail.com>
> *Sent:* Tuesday, 17 January 2017 13:44
> *To:* Chris Earnshaw
> *Cc:* Rdkit-discuss@lists.sourceforge.net; Greg Landrum
> *Subject:* Re: [Rdkit-discuss] PMI API
>
> I think we agree here.  I was talking about the raw moment (M1z), not
> the moment of inertia (MI1); I should have made the distinction more
> explicit.  Moments are not necessarily moments of inertia.  The terminology
> gets confusing.
>
> After a brief discussion with Greg, the Moments.py does the correct
> calculation which indirectly verifies MOE and the newer RDKit
> implementation.
>
> Cheers,
>  Brian
>
> On Tue, Jan 17, 2017 at 7:39 AM, Chris Earnshaw <cgearns...@gmail.com>
> wrote:
>
>> The dimensions along one of the axes of a planar molecule in its inertial
>> frame will be zero, but the principal moments of inertia will all be
>> non-zero. The moment of inertia about an axis can only be zero if all the
>> atoms in the molecule are precisely aligned on that axis. That's only
>> possible for linear molecules. There's no way to draw a straight line axis
>> through all the atoms in a non-linear molecule, which would be a
>> requirement for the corresponding moment of inertia to be zero.
>>
>> Chris
>>
>> On 17 January 2017 at 12:29, Brian Kelley <fustiga...@gmail.com> wrote:
>>
>>> Looks like I'm late to the game.  I don't know about the PMI descriptors
>>> per se, but if a planar molecule is in its inertial frame, one of the axes
>>> should be zero (whether it is x, y or z) which means that one of the
>>> M1x, M1y or M1z should be zero.
>>>
>>> We had some good experimentation with multipole expansion of moments
>>> (essentially based on the description of electrostatic multipoles) that
>>> might be nice to add to the PMI framework.
>>>
>>> Greg, I'm assuming that the Moments.py we open-sourced a while back is
>>> similarly broken?  I'm attaching it here for posterity but it does appear
>>> to match the MOE PMIs.
>>>
>>>
>>>
>>> On Tue, Jan 17, 2017 at 4:55 AM, Chris Earnshaw <cgearns...@gmail.com>
>>> wrote:
>>>
>>>> The new version looks good to me as far as I can test it. PMI and NPR
>>>> are still fine, the radius of gyration is right (for an extremely
>>>> artificial test system) and the asphericity index also seems right (despite
>>>> my best efforts to confuse things further - sorry about that!). Also
>>>> highlights even more confusion in the Todeschini article - the approximate
>>>> asphericity values for prolate and oblate molecules are reversed.
>>>>
>>>> The only (very trivial) thing I've spotted is the comment in the
>>>> inertialShapeFactor function. 'planar or no coordinates' should be 'linear
>>>> or no coordinates' to avoid confusion.
>>>>
>>>> Chris
>>>>
>>>> On 16 January 2017 at 09:30, Greg Landrum <greg.land...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jan 16, 2017 at 10:22 AM, Chris Earnshaw <
>>>>> ch...@cge-compchem.co.uk> wrote:
>>>>>
>>>>>>
>>>>>> Either way, it makes it rather hard to trust their derivations
>>>>>> generally - especially as there appear to be other errors (e.g. the
>>>>>> denominator in eq. 16 should be the square root of the given sum of
>>>>>> squares, according to

Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Brian Kelley
I think we agree here.  I was talking about the raw moment (M1z), not
the moment of inertia (MI1); I should have made the distinction more
explicit.  Moments are not necessarily moments of inertia.  The terminology
gets confusing.
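
To make the distinction concrete, here is a small pure-Python sketch. The two point sets and the function names are invented for illustration (this is not the RDKit or Moments.py code), and reading the moments straight off the x, y and z axes only works because both toy systems are already aligned with their principal axes.

```python
def center_of_mass(points, masses):
    total = sum(masses)
    return [sum(m * p[k] for m, p in zip(masses, points)) / total for k in range(3)]


def coordinate_moments(points, masses):
    """Raw second moments of the coordinates: (sum m*x^2, sum m*y^2, sum m*z^2)."""
    c = center_of_mass(points, masses)
    return tuple(sum(m * (p[k] - c[k]) ** 2 for m, p in zip(masses, points))
                 for k in range(3))


def axis_moments_of_inertia(points, masses):
    """Moments of inertia about the x, y and z axes through the center of mass."""
    mx, my, mz = coordinate_moments(points, masses)
    return (my + mz, mx + mz, mx + my)


# unit masses; a planar square in the xy plane and a linear rod along x
square = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
rod = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ones4, ones3 = [1.0] * 4, [1.0] * 3

print(coordinate_moments(square, ones4))       # (4.0, 4.0, 0.0): raw z moment is zero
print(axis_moments_of_inertia(square, ones4))  # (4.0, 4.0, 8.0): no zero moment of inertia
print(axis_moments_of_inertia(rod, ones3))     # (0.0, 2.0, 2.0): zero only for the linear case
```

So the planar case has a zero raw moment along the out-of-plane axis, while only the linear case has a zero moment of inertia, which is Chris's point.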

After a brief discussion with Greg, the Moments.py does the correct
calculation which indirectly verifies MOE and the newer RDKit
implementation.

Cheers,
 Brian

On Tue, Jan 17, 2017 at 7:39 AM, Chris Earnshaw <cgearns...@gmail.com>
wrote:

> The dimensions along one of the axes of a planar molecule in its inertial
> frame will be zero, but the principal moments of inertia will all be
> non-zero. The moment of inertia about an axis can only be zero if all the
> atoms in the molecule are precisely aligned on that axis. That's only
> possible for linear molecules. There's no way to draw a straight line axis
> through all the atoms in a non-linear molecule, which would be a
> requirement for the corresponding moment of inertia to be zero.
>
> Chris
>
> On 17 January 2017 at 12:29, Brian Kelley <fustiga...@gmail.com> wrote:
>
>> Looks like I'm late to the game.  I don't know about the PMI descriptors
>> per se, but if a planar molecule is in its inertial frame, one of the axes
>> should be zero (whether it is x, y or z) which means that one of the
>> M1x, M1y or M1z should be zero.
>>
>> We had some good experimentation with multipole expansion of moments
>> (essentially based on the description of electrostatic multipoles) that
>> might be nice to add to the PMI framework.
>>
>> Greg, I'm assuming that the Moments.py we open-sourced a while back is
>> similarly broken?  I'm attaching it here for posterity but it does appear
>> to match the MOE PMIs.
>>
>>
>>
>> On Tue, Jan 17, 2017 at 4:55 AM, Chris Earnshaw <cgearns...@gmail.com>
>> wrote:
>>
>>> The new version looks good to me as far as I can test it. PMI and NPR
>>> are still fine, the radius of gyration is right (for an extremely
>>> artificial test system) and the asphericity index also seems right (despite
>>> my best efforts to confuse things further - sorry about that!). Also
>>> highlights even more confusion in the Todeschini article - the approximate
>>> asphericity values for prolate and oblate molecules are reversed.
>>>
>>> The only (very trivial) thing I've spotted is the comment in the
>>> inertialShapeFactor function. 'planar or no coordinates' should be 'linear
>>> or no coordinates' to avoid confusion.
>>>
>>> Chris
>>>
>>> On 16 January 2017 at 09:30, Greg Landrum <greg.land...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jan 16, 2017 at 10:22 AM, Chris Earnshaw <
>>>> ch...@cge-compchem.co.uk> wrote:
>>>>
>>>>>
>>>>> Either way, it makes it rather hard to trust their derivations
>>>>> generally - especially as there appear to be other errors (e.g. the
>>>>> denominator in eq. 16 should be the square root of the given sum of
>>>>> squares, according to their reference).
>>>>>
>>>>
>>>> Indeed. Given the problems encountered, I went back and checked some
>>>> additional references to find definitions of the descriptors. The results
>>>> are in this PR, which I'd love feedback on if you have time to take a look:
>>>> https://github.com/rdkit/rdkit/pull/1265
>>>>
>>>> I didn't manage to find any information about "inertial shape factor"
>>>> and don't have access to the references cited in the Todeschini paper, but
>>>> I think the others are now reasonably reliable.
>>>>
>>>> -greg
>>>>
>>>>
>>>>
>>>


Re: [Rdkit-discuss] PMI API

2017-01-17 Thread Brian Kelley
Looks like I'm late to the game.  I don't know about the PMI descriptors
per se, but if a planar molecule is in its inertial frame, the coordinates
along one of the axes should be zero (whether it is x, y or z), which means
that one of M1x, M1y or M1z should be zero.
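
To make the planar case concrete, here is a minimal pure-Python sketch (not the RDKit implementation) that builds the inertia tensor for an idealised benzene lying in the xy plane. The geometry (C at 1.39 A and H at 2.48 A from the ring centre) and the helper names are assumptions for illustration only. With every z coordinate equal to zero, the xz and yz terms vanish and the largest moment is the sum of the other two, so NPR1 and NPR2 both come out at 0.5, in line with the benzene values Chris expected in the quoted message.

```python
import math


def inertia_tensor(atoms):
    """atoms: list of (mass, x, y, z). Returns the 3x3 tensor about the centre of mass."""
    total = sum(m for m, *_ in atoms)
    com = [sum(m * c[i] for m, *c in atoms) / total for i in range(3)]
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, *c in atoms:
        r = [c[i] - com[i] for i in range(3)]
        r2 = sum(v * v for v in r)
        for i in range(3):
            for j in range(3):
                delta = r2 if i == j else 0.0
                tensor[i][j] += m * (delta - r[i] * r[j])
    return tensor


def ring(mass, radius, n=6):
    """n equal masses on a circle in the xy plane (z = 0 for every atom)."""
    return [(mass,
             radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n),
             0.0) for k in range(n)]


# Idealised benzene: the distances are illustrative, not a refined geometry.
benzene = ring(12.011, 1.39) + ring(1.008, 2.48)
tensor = inertia_tensor(benzene)

# For this symmetric ring the tensor is already diagonal, so the diagonal
# entries are the principal moments; a general molecule needs an eigensolver.
i1, i2, i3 = sorted(tensor[k][k] for k in range(3))
print("PMI:", i1, i2, i3)
print("NPR1:", i1 / i3, "NPR2:", i2 / i3)  # both ratios are 0.5 for a planar ring like this
```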

We had some good experimentation with multipole expansion of moments
(essentially based on the description of electrostatic multipoles) that
might be nice to add to the PMI framework.

Greg, I'm assuming that the Moments.py we open-sourced a while back is
similarly broken?  I'm attaching it here for posterity, but it does appear
to match the MOE PMIs.



On Tue, Jan 17, 2017 at 4:55 AM, Chris Earnshaw 
wrote:

> The new version looks good to me as far as I can test it. PMI and NPR are
> still fine, the radius of gyration is right (for an extremely artificial
> test system) and the asphericity index also seems right (despite my best
> efforts to confuse things further - sorry about that!). Also highlights
> even more confusion in the Todeschini article - the approximate asphericity
> values for prolate and oblate molecules are reversed.
>
> The only (very trivial) thing I've spotted is the comment in the
> inertialShapeFactor function. 'planar or no coordinates' should be 'linear
> or no coordinates' to avoid confusion.
>
> Chris
>
> On 16 January 2017 at 09:30, Greg Landrum  wrote:
>
>>
>>
>> On Mon, Jan 16, 2017 at 10:22 AM, Chris Earnshaw <
>> ch...@cge-compchem.co.uk> wrote:
>>
>>>
>>> Either way, it makes it rather hard to trust their derivations generally
>>> - especially as there appear to be other errors (e.g. the denominator in
>>> eq. 16 should be the square root of the given sum of squares, according to
>>> their reference).
>>>
>>
>> Indeed. Given the problems encountered, I went back and checked some
>> additional references to find definitions of the descriptors. The results
>> are in this PR, which I'd love feedback on if you have time to take a look:
>> https://github.com/rdkit/rdkit/pull/1265
>>
>> I didn't manage to find any information about "inertial shape factor" and
>> don't have access to the references cited in the Todeschini paper, but I
>> think the others are now reasonably reliable.
>>
>> -greg
>>
>>
>>
>
> 
>
>


Moments.py
Description: Binary data


Re: [Rdkit-discuss] Missing Properties for Mol in Multiprocessing Pool

2017-01-13 Thread Brian Kelley
By default, normal molecules don't pickle properties.  The pickling is used to 
transfer mols in Python multiprocessing.

Wrapping them in a PropertyMol should solve the issue:

http://www.rdkit.org/Python_Docs/rdkit.Chem.PropertyMol.PropertyMol-class.html
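
A minimal sketch of the PropertyMol suggestion, assuming RDKit is installed. The property name and value are copied from the quoted output below; the helper names roundtrip and id_after_roundtrip are made up for illustration. The pickle.dumps/pickle.loads pair stands in for what multiprocessing does when it ships an argument to a worker.

```python
import pickle


def roundtrip(obj):
    """Pickle and unpickle obj, as multiprocessing does with Pool arguments."""
    return pickle.loads(pickle.dumps(obj))


def id_after_roundtrip(smiles="CCO", key="ChemDiv_IDNUMBER", value="000L-0408"):
    # RDKit is imported here so that roundtrip() stays usable without it.
    from rdkit import Chem
    from rdkit.Chem.PropertyMol import PropertyMol

    mol = Chem.MolFromSmiles(smiles)
    mol.SetProp(key, value)  # stand-in for an SD property read from a file

    # PropertyMol pickles its properties along with the molecule itself.
    wrapped = roundtrip(PropertyMol(mol))
    return wrapped.GetProp(key)
```

In the quoted script this would amount to appending PropertyMol(compFile[i]) rather than compFile[i] to iterator before calling pool.map.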


Brian Kelley

> On Jan 13, 2017, at 6:39 PM, Paul Novick <paul.nov...@gmail.com> wrote:
> 
> Hello All
> 
> I am getting strange behaviour for mols passed into multiprocessing Pools.  I 
> am finding that all of the SD properties for the mol seem to disappear within 
> the worker process.  In the following, I am attempting to retrieve the 
> 'ChemDiv_IDNUMBER' property from a series of mols.  When doing this in a
> loop outside of a worker process, the value is retrieved as expected.  
> However, within the worker, the property does not exist.  
> 
> compFile = Chem.SDMolSupplier('mols.sdf')
> iterator = []
> 
> for i in range(5):
>     iterator.append(compFile[i])
>     print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
>     print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
> def lookupForI(mol):
>     thisresult = [0,0,0,0,0,0]
>     print(mol.GetNumHeavyAtoms(), 'atoms in worker')
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
>     return mol.GetNumHeavyAtoms()
> 
> pool = Pool(3)
> result=pool.map(lookupForI, iterator)
> pool.close()
> pool.join()
> for ares in result:
>     print(ares)
> 
> 
> gives the following
> 20 atoms in loop
> 000L-0408 is ID in loop
> 18 atoms in loop
> 000L-1176 is ID in loop
> 18 atoms in loop
> 000L-1268 is ID in loop
> 26 atoms in loop
> 000L-2413 is ID in loop
> 18 atoms in loop
> 000L-5632 is ID in loop
> 20 atoms in worker
> 18 atoms in worker
> 18 atoms in worker
> 26 atoms in worker
> 18 atoms in worker
> ---
> RemoteTraceback   Traceback (most recent call last)
> RemoteTraceback: 
> """
> Traceback (most recent call last):
>   File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", 
> line 119, in worker
> result = (True, func(*args, **kwds))
>   File "/opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py", 
> line 44, in mapstar
> return list(map(*args))
>   File "", line 16, in lookupForI
> print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> KeyError: 'ChemDiv_IDNUMBER'
> """
> 
> The above exception was the direct cause of the following exception:
> 
> KeyError  Traceback (most recent call last)
>  in ()
>  35 
>  36 pool = Pool(3)
> ---> 37 result=pool.map(lookupForI, iterator)
>  38 pool.close()
>  39 pool.join()
> 
> /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in 
> map(self, func, iterable, chunksize)
> 258 in a list that is returned.
> 259 '''
> --> 260 return self._map_async(func, iterable, mapstar, 
> chunksize).get()
> 261 
> 262 def starmap(self, func, iterable, chunksize=None):
> 
> /opt/apps/Anaconda2/envs/py35/lib/python3.5/multiprocessing/pool.py in 
> get(self, timeout)
> 606 return self._value
> 607 else:
> --> 608 raise self._value
> 609 
> 610 def _set(self, i, obj):
> 
> KeyError: 'ChemDiv_IDNUMBER'
> 
> 
> 
> And, when looking to see if any properties are associated with the mol using 
> GetPropNames, I find no properties in the worker process, but all of the 
> properties exist within the loop.
> 
> iterator = []
> 
> for i in range(5):
>     iterator.append(compFile[i])
>     print(len([x for x in compFile[i].GetPropNames()]), 'properties in loop')
>     print(compFile[i].GetNumHeavyAtoms(), 'atoms in loop')
>     print(compFile[i].GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
> def lookupForI(mol):
>     thisresult = [0,0,0,0,0,0]
>     print(len([x for x in mol.GetPropNames()]), 'properties in worker')
>     print(mol.GetNumHeavyAtoms(), 'atoms in worker')
>     print(mol.GetProp('ChemDiv_IDNUMBER'), 'is ID in loop')
> 
>     return mol.GetNumHeavyAtoms()
> ...
> 
> gives
> 
> 
> 76 properties in loop
> 20 atoms in loop
> 000L-0408 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-1176 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-1268 is ID in loop
> 76 properties in loop
> 26 atoms in loop
> 000L-2413 is ID in loop
> 76 properties in loop
> 18 atoms in loop
> 000L-5632 is ID in loop
> 0 pr

Re: [Rdkit-discuss] UpdatePropertyCache() after RunReactants

2017-01-12 Thread Brian Kelley
The output of a reaction is a bit confusing.

A reaction can match its reactants in more than one way and can have multiple
product templates, so the output of RunReactants is a list of lists of
molecules: one inner list per match, holding one molecule per product template.

for products in result:
    for molecule in products:
        molecule.UpdatePropertyCache()

However, it looks like your reaction is generating nonsensical molecules, so
you may want to draw with sanitization turned off so you can see the reaction
output.
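
On the duplicate-product question in the quoted message, one common approach is to flatten the nested result and keep only the first product for each identity key. The sketch below is pure Python and uses plain strings as stand-ins for product molecules; unique_products and the sample data are hypothetical. With RDKit one would presumably pass something like Chem.MolToSmiles as the key after updating the property cache, but that part is an assumption and is not shown here.

```python
from itertools import chain


def unique_products(product_sets, key):
    """Flatten RunReactants-style nested output and drop duplicates.

    product_sets: iterable of iterables, one inner sequence per reaction match.
    key: callable returning a hashable identity for a product.
    """
    seen = set()
    unique = []
    for product in chain.from_iterable(product_sets):
        ident = key(product)
        if ident not in seen:
            seen.add(ident)
            unique.append(product)
    return unique


# Stand-in data: each string plays the role of one product molecule.
nested = (("triazole_A",), ("triazole_A",), ("triazole_B",), ("triazole_B",))
print(unique_products(nested, key=str))
```

Order is preserved, so the first occurrence of each product is the one that survives.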


Brian Kelley

> On Jan 11, 2017, at 9:11 PM, Curt Fischer <curt.r.fisc...@gmail.com> wrote:
> 
> Hi all,
> 
> I recently wanted to use RDKit to model the famous copper-catalyzed 
> cycloaddition of alkynes and azides.
> 
> I eventually got things working, kind of, but had two questions.  First, I 
> was surprised to find that the products of RunReactants don't have update 
> property caches.  Is this something I should have expected, or is it a bug?  
> If the latter, is it any easy-to-fix bug or a hard-to-fix one?
> 
> Second, how can I modify my SMARTS reaction query to avoid duplication of 
> each product?
> 
> Here's some example code, also available at 
> https://github.com/tentrillion/ipython_notebooks/blob/master/rdkit_smarts_reactions_needs_updating.ipynb
> 
> # ---BEGIN CODE-- #
> # import rdkit components
> from rdkit import rdBase
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit.Chem import Draw
> 
> # use IPythonConsole for pretty drawings
> from rdkit.Chem.Draw import IPythonConsole
> # IPythonConsole.ipython_useSVG=True  # leave out for github
> 
> # for flattening
> from itertools import chain
> 
> # define reactants
> diyne_smiles = 'C#CCC(O)C#C'
> azide_smiles = 'CCCN=[N+]=[N-]'
> 
> diyne = Chem.MolFromSmiles(diyne_smiles)
> azide = Chem.MolFromSmiles(azide_smiles)
> 
> # define reaction
> copper_click_smarts = (
>     '[C:1]#[C:2].[N:3]=[N+:4]=[N-:5]>>[c:1]1[c:2][n-0:3][n-0:4][n-0:5]1')
> copper_click = AllChem.ReactionFromSmarts(copper_click_smarts)
> 
> # run reaction
> products_tuples = copper_click.RunReactants((diyne, azide))
> 
> # flatten product tuple of tuples into list
> products = list(chain(*products_tuples))
> 
> # FAILS: mol property caches are not updated
> try:
>     Draw.MolsToGridImage(products)
> except (RuntimeError, ValueError) as e:
>     print('FAILED!')
>     my_error = e
> 
> # this works: force updating 
> for product in products:
>     product.UpdatePropertyCache()
> 
> Draw.MolsToGridImage(products)
> 
> my_error
> 
> products_tuples = copper_click.RunReactants((diyne, azide))
> products = list(chain(*products_tuples))
> # FAILS: mol property caches are not updated
> Draw.MolsToGridImage(products)
> 
> # ---END CODE-- #
> 
> The stacktrace is:
> 
> ---
> ValueErrorTraceback (most recent call last)
>  in ()
>   2 products = list(chain(*products_tuples))
>   3 # FAILS: mol property caches are not updated
> > 4 Draw.MolsToGridImage(products)
> 
> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/IPythonConsole.pyc
>  in ShowMols(mols, **kwargs)
> 198   else:
> 199 fn = Draw.MolsToGridImage
> --> 200   res = fn(mols, **kwargs)
> 201   if kwargs['useSVG']:
> 202 return SVG(res)
> 
> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
>  in MolsToGridImage(mols, molsPerRow, subImgSize, legends, 
> highlightAtomLists, useSVG, **kwargs)
> 403   else:
> 404 return _MolsToGridImage(mols, molsPerRow=molsPerRow, 
> subImgSize=subImgSize, legends=legends,
> --> 405 highlightAtomLists=highlightAtomLists, 
> **kwargs)
> 406 
> 407 
> 
> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
>  in _MolsToGridImage(mols, molsPerRow, subImgSize, legends, 
> highlightAtomLists, **kwargs)
> 344   highlights = highlightAtomLists[i]
> 345 if mol is not None:
> --> 346   img = _moltoimg(mol, subImgSize, highlights, legends[i], 
> **kwargs)
> 347   res.paste(img, (col * subImgSize[0], row * subImgSize[1]))
> 348   return res
> 
> /Users/curt/anaconda2/lib/python2.7/site-packages/rdkit/Chem/Draw/__init__.pyc
>  in _moltoimg(mol, sz, highlights, legend, **kwargs)
> 309   from rdkit.Chem.Draw import rdMolDraw2D
> 310   if not hasattr(rdMolDraw2D, 'MolDraw2DCairo'):
> --> 311 img = MolToImage(mol, sz, legend=legend, 
> highlightAtoms=highlights, **kwargs)
> 312   else:
> 313 nmol = rdMolDraw2D.PrepareMolForDrawing(mol, 
> kekulize=kwargs.get('kekulize', True))
&

Re: [Rdkit-discuss] SubstructureMatch

2017-01-09 Thread Brian Kelley
I believe the RDKit uses the VF2 algorithm:
L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. 
An improved algorithm for matching large graphs. 
In: 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern 
Recognition, pp. 149-159, Cuen, 2001.


If you search Google for "VF2 substructure search" you will find plenty of
benchmarks.
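
To illustrate that substructure matching is a graph search rather than a fingerprint comparison, here is a deliberately naive backtracking matcher. It is not VF2 and not RDKit code; VF2 explores a similar search space but adds feasibility rules that prune dead ends early. Atoms are plain element symbols, bonds are index pairs, and bond orders are ignored; all names are invented for this sketch.

```python
def _adjacency(n_atoms, bonds):
    """Build a list of neighbour sets from (a, b) index pairs."""
    neighbours = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def find_match(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return a tuple mapping each query atom to a target atom, or None."""
    q_adj = _adjacency(len(q_atoms), q_bonds)
    t_adj = _adjacency(len(t_atoms), t_bonds)

    def extend(mapping):
        qi = len(mapping)  # next query atom to place
        if qi == len(q_atoms):
            return tuple(mapping)
        for ti in range(len(t_atoms)):
            if ti in mapping or t_atoms[ti] != q_atoms[qi]:
                continue
            # Every query bond to an already-placed atom must exist in the target.
            if all(mapping[qn] in t_adj[ti] for qn in q_adj[qi] if qn < qi):
                found = extend(mapping + [ti])
                if found is not None:
                    return found
        return None

    return extend([])


# Ethanol (C-C-O) as target, a C-O fragment and a lone N as queries.
ethanol_atoms, ethanol_bonds = ["C", "C", "O"], [(0, 1), (1, 2)]
print(find_match(["C", "O"], [(0, 1)], ethanol_atoms, ethanol_bonds))
print(find_match(["N"], [], ethanol_atoms, ethanol_bonds))
```

A match found this way is exact for the chosen atom and bond model, which is why no similarity threshold is involved.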


Brian Kelley

> On Jan 9, 2017, at 4:49 AM, Axel Rudling <axru6...@gmail.com> wrote:
> 
> Hello,
> 
> I've been benchmarking different software packages for substructure searching of
> molecules. From what I can tell, RDKit's substructureMatch appears to do the
> job. I have been looking around the documentation and this forum for a
> description of how it actually works. As I understand it, it does the
> calculation from a molecule object that is created from the SMILES?
> 
> How does substructureMatch work? Is it still fingerprint based? How accurate 
> is it? Is there a description of some standardised term for what is happening 
> there?
> 
> Best,
> Axel



Re: [Rdkit-discuss] PMI API

2017-01-08 Thread Brian Kelley
I think the relevant issue is that if you are using an existing build, we don't 
yet have the capability for you to know what was built and what was not, i.e. 
you need to add the compiler flag yourself to indicate that the 3D stuff was 
actually built.  

I had a PR to fix this a while ago that was postponed and that we should probably 
resurrect.  Basically it is an rdkit.h header file that has these flags built 
in so you won't have to include them yourself.


Brian Kelley

> On Jan 8, 2017, at 11:31 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> 
> Hi Chris,
> 
> The RDKit should automatically build with the new descriptors enabled if 
> eigen3 can be found when cmake is run. When you run cmake you should see a 
> message if/when the build is disabled.
> 
> If you want to call the functions, the best documentation available is the 
> standard C++ API documentation, but something seems to have gone wrong when I 
> ran doxygen. I'll look into this. That documentation is generated from the 
> header file, so you can just look there:
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Descriptors/PMI.h
> not that there's a huge amount of documentation available.
> 
> W.r.t. efficiency: you do need to call the functions individually, but the 
> expensive calculation of the moments will only be done once, so it doesn't 
> end up doing repeated work.
> 
> And, finally, on the values themselves: I will have to take a look at that. 
> -greg
> 
> 
> 
> 
>> On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw <cgearns...@gmail.com> wrote:
>> Hi
>> 
>> A while ago I had a project which needed PMI descriptors (specifically NPR1 
>> and NPR2) which were not available in the main branch of RDKit at the time. 
>> At the time I used the fork by 'hahnda6' which provided the 
>> calcPMIDescriptors() function, and this worked well. Now that PMI 
>> descriptors are available in the main RDKit distribution I thought I'd 
>> rewrite my code to use the official version.
>> 
>> Building the new RDKit was no problem, but things went downhill shortly 
>> after that. There's every chance that I've missed the relevant documentation 
>> (I hope someone can point me in the right direction if so) and done 
>> something stupid!
>> 
>> The issues are -
>> 1) I can't find any documentation of the C++ API - the only reference to PMI 
>> in the online RDKit documentation appears to be to the PMI.h file
>> 2) Having written a program using the PMI[123] and/or NPR[12] functions, I 
>> couldn't get it to compile until I added the  -DRDK_BUILD_DESCRIPTORS3D 
>> directive -
>> g++ -o sdf_pmi_blob sdf_pmi.cpp -I/packages/rdkit/include/rdkit 
>> -L/packages/rdkit/lib -lDescriptors -lGraphMol -lFileParsers -Wno-deprecated 
>> -O2 -DRDK_BUILD_DESCRIPTORS3D
>> This seems a bit odd...
>> 3) Is it necessary to make separate calls to the individual PMI() and/or 
>> NPR() functions? Surely this results in duplication of some of the heavier 
>> calculations? I can't find any equivalent of calcPMIDescriptors() which 
>> returned a 'Moments' struct containing all the PMI and NPR values in one go.
>> 4) The big one! The returned results look very odd. They appear to relate 
>> more to the dimensions of the molecule than the moments of inertia. For a 
>> rod-like molecule (dimethylacetylene) I'd expect two large and one small PMI 
>> (e.g. PMI1: 6.61651   PMI2: 150.434   PMI3: 150.434  NPR1: 0.0439828  NPR2: 
>> 0.98) but actually get PMI1: 0.061647  PMI2: 0.061652  PMI3:  25.3699  
>> NPR1: 0.002430  NPR2: 0.002430.
>> For disk-like (benzene) the result should be one large and two medium (e.g. 
>> PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987  NPR2: 0.500013) 
>> but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851  NPR1: 2.14213e-11  
>> NPR2: 0.33.
>> Finally for a roughly spherical molecule (neopentane) the NPR values look 
>> reasonable (no great surprise) but the absolute PMI values may be too small: 
>> old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
>> NPR1: 0.66  NPR2: 0.88, new program - PMI1: 6.59466  PMI2: 6.59488  
>> PMI3: 6.59531  NPR1: 0.02  NPR2: 0.35
>> 
>> As I say, it's entirely likely that I'm doing something stupid here so any 
>> pointers will be gratefully received. FWIW, the core of my program is -
>> mol = MolBlockToMol(ctab, true, false);
>> double pmi1 = RDKit::Descriptors::PMI1(*mol);
>> double pmi2 = RDKit::Descriptors::PMI2(*mol);
>> double pmi3 = RDKit::Descriptors::PMI3(*mol);
>> double npr1 = RDKit::Descriptors::NPR1(*mol);
>> double npr2 = RDKit::Descriptors::NPR2(*mol);

Re: [Rdkit-discuss] It's not possible to build the RDKit with Visual Studio 2015 Update 3

2017-01-04 Thread Brian Kelley
Apparently there is a workaround:

I ran into this bug as well. I ended up forward declaring the template function 
for the classes it affected (there weren't many in our code).

e.g. for a class named BisectLine:

namespace boost
{
template<> const volatile BisectLine* get_pointer(const volatile BisectLine* p)
{ return p; }
}

Ugh.  Perhaps this can be automated...





Brian Kelley

> On Jan 4, 2017, at 2:53 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> 
> I'm not sure how many of you this will be relevant to, but it's important to 
> know that Update 3 of Microsoft's Visual Studio 2015 is not able to build the 
> RDKit Python wrappers. This is due to a compiler bug that ends up affecting 
> boost::python. The bug report for that is here:
> https://connect.microsoft.com/VisualStudio/Feedback/Details/2852624
> It has been fixed, so if there ever is a VS2015 Update 4 I would expect it to 
> work, but in the meantime VS2015 Update 2 should still work (I am going to 
> test that today), but it's not trivial to install an older version. 
> Instructions are here:
> https://msdn.microsoft.com/en-us/library/mt653628.aspx
> 
> -greg
> 


Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Brian Kelley
Perhaps we could train a ML algorithm to know which algorithm to use when :)

Cheers,
 Brian
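
The per-structure timings John quotes below are easy to re-derive, which also makes it simple to plug in other depiction costs. This is just the arithmetic from his message; the function name is made up, and the calculation assumes strictly serial processing with no parallelism.

```python
N_STRUCTURES = 92_877_507  # PubChem Compound size quoted in the thread


def total_time(seconds_per_structure):
    """Return (days, hours) needed to depict every structure one after another."""
    total_seconds = N_STRUCTURES * seconds_per_structure
    return total_seconds / 86_400, total_seconds / 3_600


for label, cost in [("1 s", 1.0), ("100 ms", 0.1), ("1 ms", 0.001)]:
    days, hours = total_time(cost)
    print(f"{label:>6} per structure: {days:8.1f} days ({hours:10.1f} hours)")
```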

On Thu, Dec 29, 2016 at 8:19 AM, John M  wrote:

> Hi Peter,
>
> I uploaded the benchmark set here: https://github.com/
> johnmay/layout-benchmark and have tested on their web service a few weeks
> ago. IIRC it did seem quite slow, maybe fine for ahead of time generation
> but not usable for on demand depiction. It does produce very nice
> depictions but I think the right way to go is described by Alex Clark (2006
> I think?) and used by MOE. Essentially use optimisation for certain
> parts/classes of structure but not everything.
>
> Unfortunately no comparison to MOE/ChemDraw in the paper.
>
> For why you need sub-second depiction consider these times for 92877507
> structures (current size PubChem Compound):
>
> 1s per structure = 1074 days (~3 years)
> 100 ms per structure = 107 days
> 1ms per structure = 25 hours
>
> John
>
> On 15 December 2016 at 23:12, Peter S. Shenkin  wrote:
>
>> Yes, of course, storing the images is an alternative.
>>
>> -P.
>>
>> On Thu, Dec 15, 2016 at 5:46 PM, Dimitri Maziuk 
>> wrote:
>>
>>> On 12/15/2016 04:23 PM, Peter S. Shenkin wrote:
>>>
>>> > Obviously, it doesn't matter if you're rendering just few structures,
>>> but
>>> > in a scenario where you might be downloading a hundred SMILES from a
>>> DB and
>>> > displaying them on a grid in a browser, computing the 2D depictions on
>>> the
>>> > fly, waiting 5 sec for a page refresh wouldn't be great.
>>>
>>> Maybe not, but depending how the browser lays out the grid, it may take
>>> 5 seconds anyway.
>>>
>>> My recommendation for that use case would be to pre-generate the images
>>> and store the URLs in that database. Which is what we do here.
>>>
>>> ;)
>>> --
>>> Dimitri Maziuk
>>> Programmer/sysadmin
>>> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>>>
>>>
>>
>> 
>>
>>
>
> 
>
>


Re: [Rdkit-discuss] Bug in AllChem.EmbedMultipleConfs pruning?

2016-12-22 Thread Brian Kelley
Missed the swap == swap with same type.

There probably is some moment-based heuristic you could use to check for bad outliers.


Brian Kelley

> On Dec 22, 2016, at 11:49 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> 
> 
>> On Thu, Dec 22, 2016 at 5:37 PM, Brian Cole <col...@gmail.com> wrote:
>> RMSD with auto-morph symmetries with hydrogens are crazy expensive to 
>> calculate. Symmetry should be on by default, but without hydrogens. Would 
>> even love to see the RMSD auto-morph symmetry code ignore trifluoro type of 
>> groups too as they dramatically increase the cost of the computation with 
>> little added value.  
> 
> Ignoring the Hs with "getBestRMS" is certainly a must. The CF3s are also a 
> good idea.
> Maybe it would make sense to have an option to ignore isomorphisms that only 
> differ by swapping degree 1 atoms.
> 
> -greg
>  
> --
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today.http://sdm.link/intel
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] MolToSmiles

2016-12-19 Thread Brian Kelley
I would vote for making a more obvious way to get to these values.  I have
had the need to do this when working with external depictors (e.g. mol ->
smiles -> depict with atom highlighting is one use case).  I just couldn't
think of a valid API way of doing this.  Attaching these values to the
molecule doesn't really seem like the right solution, considering there
are two forms of canonical ordering depending on whether isomerism is
considered.  I had thought about making a CanonicalAtomOrder function that
does this as well, or perhaps making a MolToSmiles variant.

Any other ideas?
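
For the highlighting use case, what is usually needed is the inverse of the output order: given an atom index in the mol, where does that atom sit in the SMILES? A minimal pure-Python sketch, assuming the output order is already available as a list of ints (element i is the mol atom index written at SMILES position i). The function names are hypothetical and are not part of the RDKit API.

```python
def smiles_positions(output_order):
    """Invert an output-order list: result[atom_idx] = position in the SMILES."""
    positions = [0] * len(output_order)
    for position, atom_idx in enumerate(output_order):
        positions[atom_idx] = position
    return positions


def map_highlights(output_order, highlight_atoms):
    """Translate mol atom indices into SMILES atom positions."""
    positions = smiles_positions(output_order)
    return sorted(positions[a] for a in highlight_atoms)


# NCCC is written out as CCCN, so the atoms are emitted in the order 3, 2, 1, 0.
order = [3, 2, 1, 0]
print(map_highlights(order, [0]))  # where the nitrogen (mol atom 0) ends up
```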

On Mon, Dec 19, 2016 at 3:58 AM, Greg Landrum 
wrote:

>
> On Mon, Dec 19, 2016 at 9:43 AM, Maciek Wójcikowski wrote:
>
>>
>> There is also CanonicalRankAtoms
>> [http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms]
>> which seems to be forgotten.
>>
>
> One thing to be aware of here is that this provides the canonical ranking
> of atoms that is used for the SMILES generation, but the values are not
> equal to the actual output order of the atoms.
> Here's an example of that:
> In [3]: m = Chem.MolFromSmiles('CC(O)CCN')
>
> In [4]: list(Chem.CanonicalRankAtoms(m))
> Out[4]: [0, 5, 2, 4, 3, 1]
>
> In [5]: Chem.MolToSmiles(m)
> Out[5]: 'CC(O)CCN'
>
> In [7]: m.GetProp('_smilesAtomOutputOrder')
> Out[7]: '[0,1,2,3,4,5,]'
>
> so though atom 1 is ranked in position 5, it ends up being the second atom
> output since it is connected to atom 0, which happens to have rank 0.
>
> -greg
>
>
>
> 
>
>


Re: [Rdkit-discuss] MolToSmiles

2016-12-18 Thread Brian Kelley
Jean-Marc,
  This is very non-obvious, but here is how you can do it from Python:

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles("NCCC")
>>> Chem.MolToSmiles(m)
'CCCN'
>>> m.GetProp("_smilesAtomOutputOrder")
'[3,2,1,0,]'

Note that this returns the list as a string, which is sub-optimal.
GetPropsAsDict will convert these to proper Python objects; however, this
is a private property, so you need to ask for private (and computed)
properties to be returned as well:

>>> list(m.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"])
[3, 2, 1, 0]


I'm converting to a list here to show the output, this is really a wrapped
vector but it can be used as a sequence.  Hope this helps.  Note that you
can just dump out the dictionary for any object with SetProp:

>>> m.GetPropsAsDict(True,True)

{'_smilesAtomOutputOrder': ,
'numArom': 0, '_StereochemDone': 1, '__computedProps':
}

And see some of how the sausage is made inside.
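
If only the string form from GetProp is at hand, it can also be parsed directly. A small pure-Python sketch, assuming the format shown above (square brackets, comma-separated integers, trailing comma); parse_output_order is a made-up helper and not an RDKit function.

```python
def parse_output_order(raw):
    """Turn a string such as '[3,2,1,0,]' into [3, 2, 1, 0]."""
    body = raw.strip().strip("[]")
    # The trailing comma leaves an empty token behind, which is skipped here.
    return [int(token) for token in body.split(",") if token.strip()]


print(parse_output_order("[3,2,1,0,]"))
```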

Cheers,
 Brian

On Sun, Dec 18, 2016 at 12:19 PM, Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

> Hi all,
>
> maybe my question has been already been answered:
> when converting from Mol to a canonical SMILES string,
> is there a way to obtain the mapping between the atom indexes in the
> Mol object and the atom indexes in the SMILES chain?
>
> All the best,
>
> Jean-Marc
>
> --
>
> Dr. Jean-Marc Nuzillard
> Institute of Molecular Chemistry
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 33 3 26 91 82 10
> Fax :33 3 26 91 31 66
> http://www.univ-reims.fr/ICMR
>
> http://eos.univ-reims.fr/LSD/
> http://eos.univ-reims.fr/LSD/JmnSoft/
>
>
> 
>


Re: [Rdkit-discuss] Incorrect results for substructure search obtained with Tversky similarity.

2016-12-12 Thread Brian Kelley
I'm not really sure what you mean by Tversky searching in substructure mode.

Fingerprinting methods do not guarantee the presence of an exact substructure.
You can think of Tversky as asking what percentage of me is in you, and that
percentage doesn't have to correspond to a substructure.  However, the two are
correlated in that a good screening fingerprint can throw out molecules that
will never be a substructure match.  You still have to check the substructure
match, however.

Using a screening fingerprint to filter out true negatives, I generally go from
5-10k substructure matches/sec to around 500-600k/sec in real-world searches.
I'm happy to provide an example of this if you need it.
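
A toy illustration of the screen-then-verify idea (not the real-world example offered above, and not RDKit code). Fingerprints are modelled as sets of on-bits, fold is a made-up stand-in for hashing features into a fixed-length bit vector, and the feature numbers are invented. With alpha=1 and beta=0, as in the quoted TverskySimilarity call, a score of 1.0 only says that every query bit is set in the target; bit collisions can produce that without the query features being present.

```python
def tversky(a, b, alpha, beta):
    """Tversky index for two sets of on-bits."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


def fold(features, n_bits=64):
    """Fold feature identifiers into bit positions of a fixed-size fingerprint."""
    return {feature % n_bits for feature in features}


query_features = {3, 17}
real_hit = {3, 17, 40}   # genuinely contains both query features
collision = {67, 81, 5}  # different features that fold onto bits 3 and 17

for name, target in (("real_hit", real_hit), ("collision", collision)):
    score = tversky(fold(query_features), fold(target), 1.0, 0.0)
    verified = query_features <= target  # stand-in for a real substructure check
    print(name, score, verified)
```

Both targets pass the screen with a score of 1.0, but only the first survives verification, which is why the explicit substructure check cannot be skipped.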

I hope this helps.


Brian Kelley

> On Dec 12, 2016, at 11:29 AM, Axel Rudling <axru6...@gmail.com> wrote:
> 
> Hello all,
> 
> Currently I'm doing a project with Tversky searching in substructure mode and 
> use smiles for creating fingerprints.
> 
> For most molecules I get the correct result but there are some molecules 
> where I get an overflow of falsely predicted substructure molecules. In 
> brief, I get a large amount of compounds as a result from the substructure 
> search that are not actually substructures of the query compound. I'm not 
> certain of why but it might have to do with the FP representation as these 
> molecules have a very unusual curricular structure ex.:
> 
> C1C[NH2+]CCC[NH2+]CCCNCCC[NH2+]C1
> 
> 
> I use 2048-bit ECFP4 fingerprints.
> 
> tverskySim = DataStructs.TverskySimilarity(ffp1,ffp2,1.0,0.0)
> 
> Does anyone have an idea?
> 
> 
> 
> best
> 
> Axel
> 
> 


Re: [Rdkit-discuss] Generating all stereochem possibilities from smile

2016-12-09 Thread Brian Kelley
bond.GetStereo() reports it:

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles("F/C=C/F")
>>> for bond in m.GetBonds():
...     print(bond.GetStereo())
...
STEREONONE
STEREOE
STEREONONE

However, setting bond stereo doesn't appear to be exposed.

I suppose you could permute the SMILES strings (\ => /), but that doesn't seem
to be the right solution.
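
For completeness, the crude SMILES-level fallback can be sketched in a few lines of pure Python. This is only string templating: the '{}' placeholders have to be put in by hand at each undecided stereo mark, and enumerate_marks is a hypothetical helper, not an RDKit function. As far as I know, later RDKit releases ship an EnumerateStereoisomers module that does this properly on the molecule itself, which would be the better route where available.

```python
from itertools import product


def enumerate_marks(template, choices_per_slot):
    """Yield the template filled with every combination of stereo marks.

    template: SMILES with one '{}' per undecided stereo mark.
    choices_per_slot: one tuple of alternatives per '{}', in order.
    """
    for combo in product(*choices_per_slot):
        yield template.format(*combo)


TETRAHEDRAL = ("@", "@@")
DOUBLE_BOND = ("/", "\\")

# CCC(C)(Cl)Br from the quoted question, with its single stereocentre left open.
centre_variants = list(enumerate_marks("CC[C{}](C)(Cl)Br", [TETRAHEDRAL]))
# 1,2-difluoroethene with the second directional bond left open (E and Z).
bond_variants = list(enumerate_marks("F/C=C{}F", [DOUBLE_BOND]))
print(centre_variants)
print(bond_variants)
```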

Cheers,
 Brian

On Fri, Dec 9, 2016 at 3:20 PM, James Johnson 
wrote:

> Thank you all for your responses. I got R/S combos to work (awesome!), but
> now I am working on E/Z. I am considering a similar approach (0 would be
> interpreted as E, 1 as Z).
>
> However to do this I would need two functions which I couldn't find
> 1.) Find double bonds that could be E/Z.
> 2.) Set the double bonds to have property E or Z
>
> I actually only need #2 (just make any double bond E/Z as see what
> happens), but #1 would make it more efficient.
>
> How does RDKit handle E/Z stereochem in mol?
>
> Thanks again for the help!
>
> - James
>
> On Fri, Dec 9, 2016 at 8:23 AM, Rafal Roszak  wrote:
>
>> On Thu, 8 Dec 2016 23:21:24 -0800
>> James Johnson  wrote:
>>
>> > Hello all, I am trying to generate R and S from: CCC(C)(Cl)Br
>> >
>> > Below is the code for making the smi to mol file. Can someone give me
>> some
>> > guidance to generate all sterochem possibilities?
>>
>> There are two ways to address this problem:
>> 1. generate all possible SMILESs and for each of them generate 3D
>> structure using molecular modeling (or just embedMolecule) which is in
>> my opinion better
>>
>> 2. generate many 3D structures and extract interesting stereoisomers
>>
>> Links below describe the 1st option:
>> https://github.com/rdkit/rdkit/issues/626
>> https://sourceforge.net/p/rdkit/mailman/message/34488969/
>>
>>
>> Regards,
>> RR
>>
>
>
> 


Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-03 Thread Brian Kelley
Note:  I turned logging off, otherwise a lot of time was spent spewing to
stderr:

from rdkit import Chem, rdBase

rdBase.DisableLog("rdApp.*")

On Sat, Dec 3, 2016 at 9:02 AM, Brian Kelley <fustiga...@gmail.com> wrote:

> Here are some numbers from my laptop for parsing:
>
> Normal SMILES parser:
> =====================
> Proper SMILES:    11K/s
> Non-SMILES words: 94K/s
>
> Don't make molecules (n.b. accepts some 'bad' SMILES like C1CCC3)
> =================================================================
> Proper SMILES:    110K/s
> Non-SMILES words: 130K/s
>
>
> If I had to pick, I would just use the normal MolFromSmiles; if you don't
> expect many actual SMILES strings in your corpus, it's plenty fast.
>
> Cheers,
>  Brian
>
>
> On Fri, Dec 2, 2016 at 5:08 PM, Andrew Dalke <da...@dalkescientific.com>
> wrote:
>
>> On Dec 2, 2016, at 10:05 PM, Brian Kelley wrote:
>> > Here is a very old version of Andrew's parser in code form: ... It was
>> fairly well tested on the sigma catalog back in the day.  It might be fun to
>> resurrect it and use it in some form.
>>
>> There's also my OpenSMILES parser written for Ragel:
>>
>>   https://bitbucket.org/dalke/opensmiles-ragel
>>
>> Taking that path goes more along the lines of what NextMove has done.
>>
>> BTW, upon consideration,
>>
>> >>   [^]]*   # ignore anything up to the ']'
>>
>> should be more restrictive and exclude '[', ' ', newline ... or really,
>> only allow those characters which are valid after the element (+, -, 0-9,
>> @, :, T, H, and a few others).
>>
>> The exercise is left for the students. ;)
>>
>>
>> Andrew
>> da...@dalkescientific.com
>>
>>
>>
>> 


Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-03 Thread Brian Kelley
Here are some numbers from my laptop for parsing:

Normal SMILES parser:
=====================
Proper SMILES:    11K/s
Non-SMILES words: 94K/s

Don't make molecules (n.b. accepts some 'bad' SMILES like C1CCC3)
=================================================================
Proper SMILES:    110K/s
Non-SMILES words: 130K/s


If I had to pick, I would just use the normal MolFromSmiles; if you don't
expect many actual SMILES strings in your corpus, it's plenty fast.
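
Throughput figures like these could be reproduced with a small timing harness
along the following lines. This is a sketch, not the script used for the
numbers above: words_per_second is a made-up name, and the parser is passed in
as a callable that returns None for words it rejects, so Chem.MolFromSmiles
could be plugged in directly.

```python
import time


def words_per_second(words, parse):
    """Time `parse` over `words` and return (words per second, hit count).

    `parse` is any callable returning None for words it rejects; the
    harness itself does not depend on RDKit.
    """
    start = time.perf_counter()
    hits = sum(1 for word in words if parse(word) is not None)
    elapsed = time.perf_counter() - start
    rate = len(words) / elapsed if elapsed > 0 else float("inf")
    return rate, hits
```

Running it once on a list of known SMILES and once on a list of ordinary words
gives the two kinds of rates quoted above.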

Cheers,
 Brian


On Fri, Dec 2, 2016 at 5:08 PM, Andrew Dalke <da...@dalkescientific.com>
wrote:

> On Dec 2, 2016, at 10:05 PM, Brian Kelley wrote:
> > Here is a very old version of Andrew's parser in code form: ... It was
> fairly well tested on the sigma catalog back in the day.  It might be fun to
> resurrect it and use it in some form.
>
> There's also my OpenSMILES parser written for Ragel:
>
>   https://bitbucket.org/dalke/opensmiles-ragel
>
> Taking that path goes more along the lines of what NextMove has done.
>
> BTW, upon consideration,
>
> >>   [^]]*   # ignore anything up to the ']'
>
> should be more restrictive and exclude '[', ' ', newline ... or really,
> only allow those characters which are valid after the element (+, -, 0-9,
> @, :, T, H, and a few others).
>
> The exercise is left for the students. ;)
>
>
> Andrew
> da...@dalkescientific.com
>
>
>
> 


Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Brian Kelley
Here is a very old version of Andrew's parser in code form:

http://frowns.cvs.sourceforge.net/viewvc/frowns/frowns/smiles_parsers/Smiles.py?revision=1.1.1.1=text%2Fplain

that I used in frowns more than a decade ago.  It was fairly well tested on the
sigma catalog back in the day.  It might be fun to resurrect it and use it in
some form.


Brian Kelley

> On Dec 2, 2016, at 2:36 PM, Andrew Dalke <da...@dalkescientific.com> wrote:
> 
>> On Dec 2, 2016, at 11:11 AM, Greg Landrum wrote:
>> An initial start on some regexps that match SMILES is here: 
>> https://gist.github.com/lsauer/1312860/264ae813c2bd2c27a769d261c8c6b38da34e22fb
>> 
>> that may also be useful
> 
> 
> I've put together a more gnarly regular expression to find possible SMILES 
> strings. It's configured for at least 4 atom terms, but that's easy to change 
> (there's a "{3,}" which can be changed as desired.)
> 
> It follows the SMILES specification a bit more closely, which means there 
> should be fewer false positives than the regular expression Greg pointed out.
> 
> The file which constructs the regular expression, and an example driver, is 
> attached. Here's what the output looks like:
> 
> 
> 
> 
> % python detect_smiles.py ~/talks/*.txt
> /Users/dalke/talks/ICCS_2014_paper.txt:528:532 'IOPS'
> /Users/dalke/talks/ICCS_2014_paper.txt:30150:30183 
> 'CC12CCC3C(CCC4=CC(O)CCC34C)C1CCC2'
> /Users/dalke/talks/ICCS_2014_paper2.txt:3270:3274 'CBCC'
> /Users/dalke/talks/ICCS_2014_paper2.txt:10229:10239 'CC(=O)[O-]'
> /Users/dalke/talks/ICCS_2014_paper2.txt:32766:32770 'ISIS'
> /Users/dalke/talks/Sheffield2013.txt:25002:25013 'C1=CC=CC=C1'
> /Users/dalke/talks/Sheffield2013.txt:25039:25047 'c1c1'
> /Users/dalke/talks/Sheffield_2016.txt:2767:2771 'CBCC'
> /Users/dalke/talks/Sheffield_2016.txt:10295:10301 'O0'
> /Users/dalke/talks/Sheffield_2016_talk.txt:7302:7306 'CBCC'
> /Users/dalke/talks/Sheffield_2016_talk.txt:7564:7568 'CBCC'
> /Users/dalke/talks/Sheffield_2016_talk.txt:7716:7720 'CBCC'
> /Users/dalke/talks/Sheffield_2016_v2.txt:2874:2878 'soon'
> /Users/dalke/talks/Sheffield_2016_v2.txt:7312:7317 'O'
> /Users/dalke/talks/Sheffield_2016_v2.txt:22770:22774 'ICCS'
> /Users/dalke/talks/Sheffield_2016_v3.txt:2982:2986 'soon'
> /Users/dalke/talks/Sheffield_2016_v3.txt:7627:7632 'O'
> /Users/dalke/talks/Sheffield_2016_v3.txt:24546:24550 'ICCS'
> /Users/dalke/talks/tdd_part_2.txt:7547:7551 'scop'
> 
> You can also modify the code for line-by-line processing rather than an 
> entire block of text like I did.
> 
> 
> As others have pointed out, this is a well-trodden path. Follow their 
> warnings and advice.
> 
> Also, I didn't fully test it.
> 
> 
> 
>Andrew
>da...@dalkescientific.com
> 
> 
> P.S.
> 
> Here's the regular expression:
> 
> (? term
> 
> (
> 
> (
> (
> Cl? | # Cl and Br are part of the organic subset
> Br? |
> [NOSPFIbcnosp*] |  # as are these single-letter elements
> 
> # bracket atom
> \[\d*  # optional atomic mass
>   (# valid element names
>C[laroudsemf]? |
>Os?|N[eaibdpos]? |
>S[icernbmg]? |
>P[drmtboau]? |
>H[eofgas]? |
>c|n|o|s|p |
>A[lrsgutcm] |
>B[eraik]? |
>Dy|E[urs] |
>F[erm]? |
>G[aed] |
>I[nr]? |
>Kr? |
>L[iaur] |
>M[gnodt] |
>R[buhenaf] |
>T[icebmalh] |
>U|V|W|Xe |
>Yb?|Z[nr]
>   )
>   [^]]*   # ignore anything up to the ']'
> \]
> )
>   # allow 0 or more closures directly after any atom
> (
>  [-=#$/\\]?  # optional bond type
>  (
>[0-9] |# single digit closure
>(%[0-9][0-9])  # two digit closure
>  )
> ) *
> )
> 
> (
> 
> (
> (
>  \( [-=#$/\\]?   # a '(', which can have an optional bond (no dot)
> ) | (
>   \)*   # any number of close parens, followed by
>   (
> ( \( [-=#$/\\]? ) |  # an open parens and optional bond (no dot)
> [.-=#$/\\]?  # or a dot disconnect or bond
>   )
> )
> )
> ?
> 
> (
> (
> Cl? | # Cl and Br are part of the organic subset
> Br? |
> [NOSPFIbcnosp*] |  # as are these single-letter elements
> 
> # bracket atom
> \[\d*  # optional atomic mass
>   (# valid element names
>C[laroudsemf]? |
>Os?|N[eaibdpos]? |
>S[icernbmg]? |
>P[drmtboau]? |
>H[eofgas]? |
>c|n|o|s|p |
>A[lrsgutcm] |
>B[eraik]? |
>Dy|E[urs] |
>F[erm]? |
>G[aed] |
>I[nr]? |
>Kr? |
>L[

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Brian Kelley
George,
  My point was that actually parsing the words as IUPAC/SMILES is surprisingly
effective compared with an AI or rule-based system.  Without sanitization,
RDKit parses about 60,000 SMILES/second on my laptop.  It is much faster when
not making molecules, but I don't have the number handy.

I expect it to be even faster on non-SMILES words that fail to parse.  This
should be sufficient for document scanning, I think.
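
A minimal sketch of the "just parse every word" approach, combined with the
suggestions elsewhere in this thread (unique words per document, remembering
both hits and misses). The name find_smiles and the is_smiles callable are
assumptions made for this sketch; with RDKit, is_smiles could be something
like lambda w: Chem.MolFromSmiles(w) is not None.

```python
def find_smiles(documents, is_smiles):
    """Return {document name: sorted list of words accepted by `is_smiles`}.

    `documents` maps a document name to its text. Each distinct word is
    tested only once across all documents; the verdict is cached whether
    it was a hit or a miss.
    """
    cache = {}
    found = {}
    for name, text in documents.items():
        hits = []
        # set() removes repeats within a document before any parsing.
        for word in sorted(set(text.split())):
            if word not in cache:
                cache[word] = bool(is_smiles(word))
            if cache[word]:
                hits.append(word)
        found[name] = hits
    return found
```

Keeping the result keyed by document name preserves the information about
which document each SMILES came from.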


Brian Kelley

> On Dec 2, 2016, at 1:28 PM, George Papadatos <gpapada...@gmail.com> wrote:
> 
> I think Alexis was referring to converting actual SMILES strings found in 
> random text. Chemical entity recognition and name to structure conversion is 
> another story altogether and nowadays one can quickly go a long way with open 
> tools such as OSCAR + OPSIN in KNIME or with something like this: 
> http://chemdataextractor.org/docs/intro
> 
> George
> 
>> On 2 December 2016 at 17:35, Brian Kelley <fustiga...@gmail.com> wrote:
>> This was why they started using the dictionary lookup as I recall :). The 
>> iupac system they ended up using was Roger's when at OpenEye.
>> 
>> 
>> Brian Kelley
>> 
>>> On Dec 2, 2016, at 12:33 PM, Igor Filippov <igor.v.filip...@gmail.com> 
>>> wrote:
>>> 
>>> I could be wrong but I believe IBM system had a preprocessing step which 
>>> removed all known dictionary words - which would get rid of "submarine" etc.
>>> I also believe this problem has been solved multiple times in the past, 
>>> NextMove software comes to mind, chemical tagger - 
>>> http://chemicaltagger.ch.cam.ac.uk/, etc.
>>> 
>>> my 2 cents,
>>> Igor
>>> 
>>> 
>>> 
>>> 
>>>> On Fri, Dec 2, 2016 at 11:46 AM, Brian Kelley <fustiga...@gmail.com> wrote:
>>>> I hacked a version of RDKit's smiles parser to compute heavy atom count, 
>>>> perhaps some version of this could be used to check smiles validity 
>>>> without making the actual molecule.
>>>> 
>>>> From a fun historical perspective:  IBM had an expert system to find IUPAC 
>>>> names in documents.  They ended up finding things like "submarine" which 
>>>> was amusing.  It turned out that just parsing all words with the IUPAC 
>>>> parser was by far the fastest and best solution.  I expect the same will 
>>>> be true for finding smiles.
>>>> 
>>>> It would be interesting to put the common OCR errors into the parser as 
>>>> well (l's and 1's are hard for instance).
>>>> 
>>>> 
>>>>> On Fri, Dec 2, 2016 at 10:46 AM, Peter Gedeck <peter.ged...@gmail.com> 
>>>>> wrote:
>>>>> Hello Alexis,
>>>>> 
>>>>> Depending on the size of your document, you could consider limit storing 
>>>>> the already tested strings by word length and only memoize shorter words. 
>>>>> SMILES tend to be longer, so everything above a given number of 
>>>>> characters has a higher probability of being a SMILES. Large words 
>>>>> probably also contain a lot of chemical names. They often contain commas 
>>>>> (,), so they are easy to remove quickly. 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Peter
>>>>> 
>>>>> 
>>>>>> On Fri, Dec 2, 2016 at 5:43 AM Alexis Parenty 
>>>>>> <alexis.parenty.h...@gmail.com> wrote:
>>>>>> Dear Pavel And Greg,
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> Thanks Greg for the regexps link. I’ll use that too.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Pavel, I need to track on which document the SMILES are coming from, but 
>>>>>> I will indeed make a set of unique word for each document before 
>>>>>> looping. Thanks!
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Alexis
>>>>>> 
>>>>>> 
>>>>>> On 2 December 2016 at 11:21, Pavel <pavel_polishc...@ukr.net> wrote:
>>>>>> Hi, Alexis,
>>>>>> 
>>>>>>   if you should not track from which document SMILES come, you may just 
>>>>>> combine all words from all document in a list, take only unique words 
>>>>>> and try to test them. Thus, you should not store and check for 
>>>>>> valid/non-valid strings. That woul

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Brian Kelley
This was why they started using the dictionary lookup, as I recall :). The IUPAC
system they ended up using was Roger's when at OpenEye.


Brian Kelley

> On Dec 2, 2016, at 12:33 PM, Igor Filippov <igor.v.filip...@gmail.com> wrote:
> 
> I could be wrong but I believe IBM system had a preprocessing step which 
> removed all known dictionary words - which would get rid of "submarine" etc.
> I also believe this problem has been solved multiple times in the past, 
> NextMove software comes to mind, chemical tagger - 
> http://chemicaltagger.ch.cam.ac.uk/, etc.
> 
> my 2 cents,
> Igor
> 
> 
> 
> 
>> On Fri, Dec 2, 2016 at 11:46 AM, Brian Kelley <fustiga...@gmail.com> wrote:
>> I hacked a version of RDKit's smiles parser to compute heavy atom count, 
>> perhaps some version of this could be used to check smiles validity without 
>> making the actual molecule.
>> 
>> From a fun historical perspective:  IBM had an expert system to find IUPAC 
>> names in documents.  They ended up finding things like "submarine" which was 
>> amusing.  It turned out that just parsing all words with the IUPAC parser 
>> was by far the fastest and best solution.  I expect the same will be true 
>> for finding smiles.
>> 
>> It would be interesting to put the common OCR errors into the parser as well 
>> (l's and 1's are hard for instance).
>> 
>> 
>>> On Fri, Dec 2, 2016 at 10:46 AM, Peter Gedeck <peter.ged...@gmail.com> 
>>> wrote:
>>> Hello Alexis,
>>> 
>>> Depending on the size of your document, you could consider limit storing 
>>> the already tested strings by word length and only memoize shorter words. 
>>> SMILES tend to be longer, so everything above a given number of characters 
>>> has a higher probability of being a SMILES. Large words probably also 
>>> contain a lot of chemical names. They often contain commas (,), so they are 
>>> easy to remove quickly. 
>>> 
>>> Best,
>>> 
>>> Peter
>>> 
>>> 
>>>> On Fri, Dec 2, 2016 at 5:43 AM Alexis Parenty 
>>>> <alexis.parenty.h...@gmail.com> wrote:
>>>> Dear Pavel And Greg,
>>>> 
>>>>  
>>>> 
>>>> Thanks Greg for the regexps link. I’ll use that too.
>>>> 
>>>> 
>>>> 
>>>> Pavel, I need to track on which document the SMILES are coming from, but I 
>>>> will indeed make a set of unique word for each document before looping. 
>>>> Thanks!
>>>> 
>>>> Best,
>>>> 
>>>> Alexis
>>>> 
>>>> 
>>>> On 2 December 2016 at 11:21, Pavel <pavel_polishc...@ukr.net> wrote:
>>>> Hi, Alexis,
>>>> 
>>>>   if you should not track from which document SMILES come, you may just 
>>>> combine all words from all document in a list, take only unique words and 
>>>> try to test them. Thus, you should not store and check for valid/non-valid 
>>>> strings. That would reduce problem complexity as well.
>>>> 
>>>> Pavel.
>>>>> On 12/02/2016 11:11 AM, Greg Landrum wrote:
>>>>> An initial start on some regexps that match SMILES is here: 
>>>>> https://gist.github.com/lsauer/1312860/264ae813c2bd2c27a769d261c8c6b38da34e22fb
>>>>> 
>>>>> that may also be useful
>>>>> 
>>>>> On Fri, Dec 2, 2016 at 11:07 AM, Alexis Parenty 
>>>>> <alexis.parenty.h...@gmail.com> wrote:
>>>>> Hi Markus,
>>>>> 
>>>>> 
>>>>> Yes! I might discover novel compounds that way!! Would be 
>>>>> interesting to see how they look like…
>>>>> 
>>>>> 
>>>>> Good suggestion to also store the words that were correctly identified as 
>>>>> SMILES. I’ll add that to the script.
>>>>> 
>>>>> 
>>>>> I also like your “distribution of word” idea. I could safely skip any 
>>>>> words that occur more than 1% of the time and could try to play around 
>>>>> with the threshold to find an optimum.
>>>>> 
>>>>> 
>>>>> I will try every suggestions and will time it to see what is best. I’ll 
>>>>> keep everyone in the loop and will share the script and results.
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> 
>>>>> Alexis

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Brian Kelley
I hacked a version of RDKit's SMILES parser to compute heavy atom count;
perhaps some version of this could be used to check SMILES validity without
making the actual molecule.
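
As a rough stand-in for that idea (this is not the hacked parser, just a
hypothetical pre-filter), a cheap syntactic screen could reject words before
any molecule is built. It only checks the character set and the balance of
parentheses and brackets, so it still accepts plain English words and
unmatched ring closures; anything it passes would still go to the real parser.

```python
import re

# Deliberately loose set of characters that may appear in a SMILES string.
_ALLOWED = re.compile(r"^[A-Za-z0-9@+\-\[\]()=#$%:./\\*]+$")


def could_be_smiles(word):
    """Cheap screen: allowed characters, balanced () and non-nested []."""
    if not _ALLOWED.match(word):
        return False
    depth = 0
    in_bracket = False
    for ch in word:
        if ch == "[":
            if in_bracket:  # bracket atoms cannot nest
                return False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return False
            in_bracket = False
        elif ch == "(" and not in_bracket:
            depth += 1
        elif ch == ")" and not in_bracket:
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
    return depth == 0 and not in_bracket
```

Words containing commas or other punctuation outside the allowed set are
dropped immediately, which removes a large share of ordinary text.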

From a fun historical perspective:  IBM had an expert system to find IUPAC
names in documents.  They ended up finding things like "submarine" which
was amusing.  It turned out that just parsing all words with the IUPAC
parser was by far the fastest and best solution.  I expect the same will be
true for finding smiles.

It would be interesting to put the common OCR errors into the parser as
well (l's and 1's are hard for instance).


On Fri, Dec 2, 2016 at 10:46 AM, Peter Gedeck 
wrote:

> Hello Alexis,
>
> Depending on the size of your document, you could consider limit storing
> the already tested strings by word length and only memoize shorter words.
> SMILES tend to be longer, so everything above a given number of characters
> has a higher probability of being a SMILES. Large words probably also
> contain a lot of chemical names. They often contain commas (,), so they are
> easy to remove quickly.
>
> Best,
>
> Peter
>
>
> On Fri, Dec 2, 2016 at 5:43 AM Alexis Parenty <
> alexis.parenty.h...@gmail.com> wrote:
>
>> Dear Pavel And Greg,
>>
>>
>>
>> Thanks Greg for the regexps link. I’ll use that too.
>>
>>
>> Pavel, I need to track on which document the SMILES are coming from, but
>> I will indeed make a set of unique word for each document before looping.
>> Thanks!
>>
>> Best,
>>
>> Alexis
>>
>> On 2 December 2016 at 11:21, Pavel  wrote:
>>
>> Hi, Alexis,
>>
>>   if you should not track from which document SMILES come, you may just
>> combine all words from all document in a list, take only unique words and
>> try to test them. Thus, you should not store and check for valid/non-valid
>> strings. That would reduce problem complexity as well.
>>
>> Pavel.
>> On 12/02/2016 11:11 AM, Greg Landrum wrote:
>>
>> An initial start on some regexps that match SMILES is here:
>> https://gist.github.com/lsauer/1312860/264ae813c2bd2c27a769d261c8c6b3
>> 8da34e22fb
>>
>> that may also be useful
>>
>> On Fri, Dec 2, 2016 at 11:07 AM, Alexis Parenty <
>> alexis.parenty.h...@gmail.com> wrote:
>>
>> Hi Markus,
>>
>>
>> Yes! I might discover novel compounds that way!! Would be interesting to
>> see how they look like…
>>
>>
>> Good suggestion to also store the words that were correctly identified as
>> SMILES. I’ll add that to the script.
>>
>>
>> I also like your “distribution of word” idea. I could safely skip any
>> words that occur more than 1% of the time and could try to play around with
>> the threshold to find an optimum.
>>
>>
>> I will try every suggestions and will time it to see what is best. I’ll
>> keep everyone in the loop and will share the script and results.
>>
>>
>> Thanks,
>>
>>
>> Alexis
>>
>> On 2 December 2016 at 10:47, Markus Sitzmann 
>> wrote:
>>
>> Hi Alexis,
>>
>> you may find also so some "novel" compounds by this approach :-).
>>
>> Whether your tuple solution improves performance strongly depends on the
>> content of your text documents and how often they repeat the same words
>> again - but my guess would be it will help. Probably the best way is even
>> to look at the distribution of words before you feed them to RDKit. You
>> should also "memorize" those ones that successfully generated a structure,
>> doesn't make sense to do it again, then.
>>
>> Markus
>>
>> On Fri, Dec 2, 2016 at 10:21 AM, Maciek Wójcikowski <
>> mac...@wojcikowski.pl> wrote:
>>
>> Hi Alexis,
>>
>> You may want to filter with some regex strings containing not valid
>> characters (i.e. there is small subset of atoms that may be without
>> brackets). See "Atoms" section: http://www.daylight.com/
>> dayhtml/doc/theory/theory.smiles.html
>>
>> The set might grow pretty quick and may be inefficient, so I'd parse all
>> strings passing above filter. Although there will be some false positives
>> like "CC" which may occur in text (emails especially).
>>
>> 
>> Pozdrawiam,  |  Best regards,
>> Maciek Wójcikowski
>> mac...@wojcikowski.pl
>>
>> 2016-12-02 10:11 GMT+01:00 Alexis Parenty 
>> :
>>
>> Dear all,
>>
>>
>> I am looking for a way to extract SMILES scattered in many text documents
>> (thousands documents of several pages each).
>>
>> At the moment, I am thinking to scan each words from the text and try to
>> make a mol object from them using Chem.MolFromSmiles() then store the words
>> if they return a mol object that is not None.
>>
>> Can anyone think of a better/quicker way?
>>
>>
>> Would it be worth storing in a tuple any word that do not return a mol
>> object from Chem.MolFromSmiles() and exclude them from subsequent search?
>>
>>
>> Something along those lines
>>
>>
>> excluded_set = set()
>>
>> smiles_list = []
>>
>> For each_word in text:
>>
>> If each_word not in excluded_set:
>>
>> each_word_mol =  

Re: [Rdkit-discuss] errors with windows10 RDKit installation using conda

2016-11-29 Thread Brian Kelley
Another point:

I had the same path length issues and eventually installed Anaconda3 into

C:\a3

for building packages, and the issues magically went away.  Super annoying.
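
To see why a short install prefix helps, one could check which archive members
would exceed the classic Windows path limit of 260 characters under a given
prefix. This is a sketch under that assumption; the function name too_long is
made up, and it only measures string lengths without touching the file system.

```python
import ntpath


def too_long(prefix, relative_paths, limit=260):
    """Return the relative paths whose full Windows path, once joined to
    `prefix`, would be `limit` characters or longer.

    260 is the classic MAX_PATH value, which includes the terminating
    NUL, so a usable path has to be shorter than that.
    """
    return [p for p in relative_paths
            if len(ntpath.join(prefix, p)) >= limit]
```

The member names of a source archive (for example from zipfile.ZipFile's
namelist()) could be passed as relative_paths to compare a long prefix with a
short one such as C:\a3.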


Brian Kelley

> On Nov 29, 2016, at 8:11 AM, Mike Mazanetz <mi...@novadatasolutions.co.uk> 
> wrote:
> 
> Hi,
>  
> Thought I’d update my progress:
>  
> Steps to fix issue:
> change Admin permissions for Anaconda3
> ran these commands:
> conda update conda
> conda update -n root conda-build
> conda install -c rdkit rdkit=2016.09.2 
>  
> as RDKit was changed in the last 24 hours :-)
>  
> then found that this filename was too long, so just deleted it !
> C:\boost_1_59_0.zip\boost_1_59_0\libs\geometry\doc\html\geometry\reference\spatial_indexes\boost__geometry__index__rtree\rtree_parameters_type_const___.html
>  
> That fixed that problem.
>  
> then had to rename this file as it was named incorrectly in two places. – 
> probably didn’t need this file anyway !
> Appendix A An Introduction to Preprocessor Metaprogramming.html
>  
> got a little further down the road then encountered this:
> File not found - boost_*-mt-1_*.lib
> 0 File(s) copied
> File not found - boost_*-mt-1_*.dll
> 0 File(s) copied
> Command failed: cmd.exe /c bld.bat
>  
> So battling these issues now…
>  
> ./mike
>  
> From: Mike Mazanetz [mailto:mi...@novadatasolutions.co.uk] 
> Sent: 29 November 2016 12:33
> To: rdkit-discuss@lists.sourceforge.net
> Subject: Re: [Rdkit-discuss] errors with windows10 RDKit installation using 
> conda
>  
> Hello,
>  
> Further to my little exploration, I managed to do a few things:
> conda update conda
> conda update -n root conda-build
>  
> so I’ve got a little further, but I’ve run into a Windows nightmare – path 
> files are too long…  I can see the file in the cache:  boost_1_59_0.zip, but 
> the name is so long windows will not open it. 
>  
> Has anyone done a build in Windows10, can you please send me your build 
> routine?
>  
> Thanks,
> mike
>  
>  
> conda build boost
> BUILD START: boost-1.59.0-py35_3
> updating index in: C:\Program Files\Anaconda3\conda-bld\win-64
> updating index in: C:\Program Files\Anaconda3\conda-bld\noarch
>  
> The following NEW packages will be INSTALLED:
>  
> bzip2:  1.0.6-vc14_3  [vc14]
> pip:9.0.1-py35_0
> python: 3.5.2-0
> setuptools: 27.2.0-py35_1
> vs2015_runtime: 14.0.25123-0
> wheel:  0.29.0-py35_0
> zlib:   1.2.8-vc14_3  [vc14]
>  
> Source cache directory is: C:\Program Files\Anaconda3\conda-bld\src_cache
> Found source in cache: boost_1_59_0.zip
> Extracting download
> Traceback (most recent call last):
>   File "C:\Program Files\Anaconda3\Scripts\conda-build-script.py", line 5, in 
> 
> sys.exit(conda_build.cli.main_build.main())
>   File "C:\Program 
> Files\Anaconda3\lib\site-packages\conda_build\cli\main_build.py", line 244, 
> in main
> execute(sys.argv[1:])
>   File "C:\Program 
> Files\Anaconda3\lib\site-packages\conda_build\cli\main_build.py", line 236, 
> in execute
> already_built=None, config=config, noverify=args.no_verify)
>   File "C:\Program Files\Anaconda3\lib\site-packages\conda_build\api.py", 
> line 75, in build
> need_source_download=need_source_download, config=config)
>   File "C:\Program Files\Anaconda3\lib\site-packages\conda_build\build.py", 
> line 1227, in build_tree
> config=recipe_config)
>   File "C:\Program Files\Anaconda3\lib\site-packages\conda_build\build.py", 
> line 790, in build
> config=config)
>   File "C:\Program Files\Anaconda3\lib\site-packages\conda_build\render.py", 
> line 86, in parse_or_try_download
> source.provide(metadata.path, metadata.get_section('source'), 
> config=config)
>   File "C:\Program Files\Anaconda3\lib\site-packages\conda_build\source.py", 
> line 485, in provide
> unpack(meta, config=config)
>   File "C:\Program Files\Anaconda3\lib\site-packages\conda_build\source.py", 
> line 86, in unpack
> unzip(src_path, config.work_dir)
>   File "C:\Program Files\Anaconda3\lib\site-packages\conda_build\utils.py", 
> line 287, in unzip
> with open(path, 'wb') as fo:
> FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Program 
> Files\\Anaconda3\\conda-bld\\boost_1480420542489\\work\\boost_1_59_0\\libs\\geometry\\doc\\html\\geometry\\reference\\spatial_indexes\\boost__geometry__index__rtree\\rtree_parameters_type_constindexable_getter_constvalue_equal_constallocator_type_const___.html'
>  
>  
> From: Mike Mazanetz [mailto:mi...@novadataso

Re: [Rdkit-discuss] errors with windows10 RDKit installation using conda

2016-11-29 Thread Brian Kelley
I'm very close to having stable C# wrappers again.  The painful bit is writing
all the tests, but if you would like to help out, I can send you the current
builds.


Brian Kelley

> On Nov 29, 2016, at 3:06 PM, Michal Krompiec <michal.kromp...@gmail.com> 
> wrote:
> 
> 
> 
>> On Tuesday, 29 November 2016, Greg Landrum <greg.land...@gmail.com> wrote:
>> 
>>> On Tue, Nov 29, 2016 at 5:17 PM, Bob Funchess <bfunch...@kelaroo.com> wrote:
>>>  
>>> PS: All I really need is the C# wrappers; if those could be included in the 
>>> binary distribution it would be extremely helpful for me.
>>> 
>> 
>> Building these hasn't been automated since there aren't really any tests 
>> available and distributing something that hasn't been tested at all makes me 
>> very nervous. Of course if someone in the community who knows some C# were 
>> to contribute a set of tests (or even just a port of the existing Java 
>> wrapper tests) that would make me feel a lot safer. hint hint. :-)
>>  
> 
> If stable C# wrappers were available one could make a COM interop library and 
> access RDKit from VBA in MS Excel. I guess most of you aren't fans of Excel, 
> but it is still the workhorse for the less-geeky chemists and no such add-in 
> is widely available. 
> 
> Best,
> 
> Michal


Re: [Rdkit-discuss] Pandas

2016-11-23 Thread Brian Kelley
Peter,
  If you have chemfp and can make a chemfp arena, RDKit now supports these
structures for reading and searching.  This, by far, is the fastest way I
know of similarity searching.  I believe that Greg's implementation is
compatible with chemfp 1.0 which is available on pypi:

https://pypi.python.org/pypi/chemfp/1.0

In my copious spare time, I've been trying to think of ways to embed this
directly in a pandas dataframe; however, using them side by side is
certainly doable.
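
To make the "side by side" idea concrete, here is a pure-Python stand-in that
keeps fingerprints in a plain list next to the dataframe and returns the row
positions that pass a Tanimoto threshold. Fingerprints are modelled as Python
ints used as bit sets, and all function names are made up for this sketch;
with RDKit bit vectors, DataStructs.BulkTanimotoSimilarity computes the same
quantity far more efficiently.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    common = bin(a & b).count("1")  # bits set in both
    total = bin(a | b).count("1")   # bits set in either
    return common / total if total else 0.0


def bulk_tanimoto(query, fingerprints):
    """Similarity of `query` to every fingerprint, in order."""
    return [tanimoto(query, fp) for fp in fingerprints]


def similar_indices(query, fingerprints, threshold=0.7):
    """Positions of fingerprints at least `threshold` similar to `query`."""
    return [i for i, score in enumerate(bulk_tanimoto(query, fingerprints))
            if score >= threshold]
```

The returned positions can then be used to select rows from the dataframe
(for example with df.iloc), as long as the list and the dataframe are kept in
the same order.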

Cheers,
 Brian


On Wed, Nov 23, 2016 at 10:06 AM, Peter Gedeck 
wrote:

> Is it possible to use the bulk similarity searching functionality for
> better performance instead of the list comprehension?
>
> Best,
>
> Peter
>
>
> On Wed, Nov 23, 2016 at 9:11 AM Greg Landrum 
> wrote:
>
> No worries.
> This, and Anna's question about similarity searching and clustering
> illustrate a great opportunity for a tutorial on fingerprints and
> similarity searching.
>
> -greg
>
>
>
>
>
> On Wed, Nov 23, 2016 at 3:00 PM +0100, "Chris Swain" 
> wrote:
>
> Thanks for this,
>
> As a chemist who comes from the “cut and paste” school of scripting I’m
> always concerned I’m asking something blindingly obvious
>
> ;-)
>
> Chris
>
> On 23 Nov 2016, at 12:36, Greg Landrum  wrote:
>
> [including rdkit-discuss, because it's relevant there and I'm pretty sure
> Chris won't mind and the real Pandas experts may have a better answer than
> me.]
>
> On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain  wrote:
>
>
> I quite like storing molecules and associated data in a data frame and
> I’ve see that it is possible to use rdkit for substructure searching, it is
> possible to also do similarity searching?
>
>
> It's not built in since there are many possible fingerprints that could be
> used.
>
> It's not quite as convenient as the substructure search, but here's a
> little demo of what you can do to filter based on similarity:
>
> # Start by adding a fingerprint column:
> In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x, 2)
>                        for x in df['ROMol']]
>
> # and now filter:
> In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'], qry) >= 0.7, axis=1)]
>
> In [23]: len(df)
> Out[23]: 1000
> In [24]: len(ndf)
> Out[24]: 2
>
> -greg
>
>
> 


Re: [Rdkit-discuss] Find out if a molecuole object was generated from smiles or smarts

2016-11-07 Thread Brian Kelley
I would try checking:

atom.HasQuery()

I expect the smarts molecules have this property by default and smiles don't.  
Greg can confirm, and I can double check later today.
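
A small helper along these lines would do the check. This is an untested
sketch: has_query_atoms is a made-up name, and it relies only on the
mol.GetAtoms() and atom.HasQuery() calls mentioned above. Whether every
SMARTS-derived molecule reports True here is the assumption still to be
confirmed.

```python
def has_query_atoms(mol):
    """Return True if any atom of `mol` carries a query.

    Works on any object exposing GetAtoms() whose atoms expose
    HasQuery(), such as an RDKit molecule.
    """
    return any(atom.HasQuery() for atom in mol.GetAtoms())
```

Note that this tests what the molecule currently contains rather than how it
was created, so a molecule edited after parsing could give a different answer.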


Brian Kelley

> On Nov 7, 2016, at 7:57 AM, Paul Emsley <pems...@mrc-lmb.cam.ac.uk> wrote:
> 
>> On 07/11/2016 12:37, Axel Pahl wrote:
>> 
>> amongst other options, I can generate an RDKit mol object by one of
>> these two ways:
>> 
>> mol1 = Chem.MolFromSmiles()
>> mol2 = Chem.MolFromSmarts()
>> 
>> Is there a possibility to detect for a given mol object whether it was
>> generated from Smiles or Smarts?
> 
> Not obviously to me.
> 
> Perhaps you can do something like this at creation time:
> 
> mol2.SetProp('origin', 'SMARTS')
> 
> then use mol.GetProp('origin') when you need to do the test (inside a 
> try/except KeyError).
> 
> Paul.
> 
> 


Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

2016-11-01 Thread Brian Kelley
I'll make two more points (thanks to Greg Landrum for pointing these out):

1) In your code, each call to suppl[i] makes a new molecule, so calling it twice 
in a row is twice as slow.  This explains your last result.

2) In my example, I was assuming that the queries were already in a Python list 
and not coming from a supplier.  If they are being read from a supplier, you can 
easily keep them all in memory with:

queries = list(query_supplier)

Note that for large files, this can take up a lot of memory.
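
As a rough, self-contained sketch of the same idea: the in-memory SMILES text below stands in for a real query file (an assumption for the example), and records that fail to parse are dropped because a supplier returns None for them.

```python
from rdkit import Chem

# Toy tab-separated SMILES data standing in for a real query file (assumption).
smiles_text = "c1ccccc1\tbenzene\nCCO\tethanol\nCC(=O)O\tacetic_acid\n"
query_supplier = Chem.SmilesMolSupplierFromText(
    smiles_text, delimiter='\t', titleLine=False)

# Parse every record exactly once and keep the molecules in memory.
# Records that fail to parse come back as None, so drop them.
queries = [mol for mol in query_supplier if mol is not None]

print(len(queries))
```

After this, queries[i] is a plain list lookup and no longer triggers any parsing.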

Thanks for the clarification Greg.

Brian Kelley

> On Nov 1, 2016, at 4:22 AM, 杨弘宾 <yanyangh...@163.com> wrote:
> 
> Hi,
> Suppose I'd like to match 100 substructures against 1000 compounds 
> represented as SMILES.
> What I did is:
> 
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> for j in range(ll):  # I have to make substructures in the first loop.
>     for i in range(l):
>         suppl[i].GetSubstructMatches(s[j])
> 
> and found that the performance is not good.
> 
> Then I did a comparison and found that it was because the conformations of the 
> compounds were not initialized.
> If I use MolFromSmiles, the performance improves a lot.
> 
> start = time.clock()
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> print time.clock()-start   # >>> 0.0373735355168, indicating that the
>                            # molecules were not initialized.
> for i in range(l):
>     suppl[i].GetSubstructMatches(sa)
>     suppl[i].GetSubstructMatches(sa2)
> print time.clock()-start   # >>> 11.1884715172
> start = time.clock()
> f = open('allmoleculenew.smi')
> for i in range(l):
>     mol = Chem.MolFromSmiles(f.next().split('\t')[0])
>     mol.GetSubstructMatches(sa)
>     mol.GetSubstructMatches(sa2)
> print time.clock()-start # >>> 5.44030582111
> 
> The second method was twice as fast as the first, indicating that the 
> "init" is more time-consuming than the matching.
> I think SmilesMolSupplier is a good API for loading multiple compounds, but it 
> does not parse the SMILES immediately, which adds time to any later use. 
> So is it possible to manually initialize the compounds?
> 
> Hongbin Yang 杨弘宾 
> Research: Toxicophore and Chemoinformatics
> Pharmaceutical Science, School of Pharmacy 
> East China University of Science and Technology 


Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

2016-11-01 Thread Brian Kelley
A supplier is random access, so your calls to suppl[i] here are probably quite
expensive:

suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)
for j in range(ll):  # I have to make substructures in the first loop.
    for i in range(l):
        suppl[i].GetSubstructMatches(s[j])

I strongly suggest iterating over the supplier directly rather than indexing
into it, for example:

for mol in suppl:
    for pat in s:
        mol.GetSubstructMatches(pat)

I expect this will help quite a bit.  You may also consider using the
FilterCatalog which is designed to handle larger data sets and may help in
your case.
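
A slightly fuller sketch of the iteration approach, using made-up data in place of allmoleculenew.smi and two invented SMARTS patterns. The point is only that each molecule is parsed once and the patterns are built once, outside the loops.

```python
from rdkit import Chem

# Toy data standing in for the real SMILES file (assumption).
smiles_text = "c1ccccc1O\tphenol\nCCN\tethylamine\nCC(=O)O\tacetic_acid\n"
suppl = Chem.SmilesMolSupplierFromText(
    smiles_text, delimiter='\t', titleLine=False)

# Build the query patterns once, outside of any loop.
patterns = {
    'hydroxyl': Chem.MolFromSmarts('[OX2H]'),
    'primary_amine': Chem.MolFromSmarts('[NX3;H2]'),
}

hits = {}
for mol in suppl:        # each record is parsed exactly once
    if mol is None:      # skip records that failed to parse
        continue
    name = mol.GetProp('_Name')
    hits[name] = [label for label, pat in patterns.items()
                  if mol.HasSubstructMatch(pat)]

print(hits)
```

HasSubstructMatch is used here because only a yes/no answer is needed; GetSubstructMatches would return the matching atom indices instead.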

On Tue, Nov 1, 2016 at 4:22 AM, 杨弘宾  wrote:

> Hi,
> Suppose I'd like to match 100 substructures against 1000 compounds
> represented as SMILES.
> What I did is:
>
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> for j in range(ll):  # I have to make substructures in the first loop.
>     for i in range(l):
>         suppl[i].GetSubstructMatches(s[j])
>
> and found that the performance is not good.
>
> Then I did a comparison and found that it was because the conformations of
> the compounds were not initialized.
> If I use MolFromSmiles, the performance improves a lot.
>
> start = time.clock()
> suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
> l = len(suppl)
> print time.clock()-start   # >>> 0.0373735355168, indicating that the
>                            # molecules were not initialized.
> for i in range(l):
>     suppl[i].GetSubstructMatches(sa)
>     suppl[i].GetSubstructMatches(sa2)
> print time.clock()-start   # >>> 11.1884715172
> start = time.clock()
> f = open('allmoleculenew.smi')
> for i in range(l):
>     mol = Chem.MolFromSmiles(f.next().split('\t')[0])
>     mol.GetSubstructMatches(sa)
>     mol.GetSubstructMatches(sa2)
> print time.clock()-start # >>> 5.44030582111
>
> The second method was twice as fast as the first, indicating that the
> "init" is more time-consuming than the matching.
> I think SmilesMolSupplier is a good API for loading multiple compounds, but it
> does not parse the SMILES immediately, which adds time to any later use.
> So is it possible to manually initialize the compounds?
>
> --
> Hongbin Yang 杨弘宾
> Research: Toxicophore and Chemoinformatics
> Pharmaceutical Science, School of Pharmacy
> East China University of Science and Technology


Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-30 Thread Brian Kelley
RDKit already has a way to serialize conformers: the binary pickle format!

Perhaps we should define a file extension for multiple molecules, say ".rdk", and 
call it a day.  Like InChI, the source code is the reference  :) 
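
For anyone who hasn't used it, here is a small sketch of round-tripping a multi-conformer molecule through the binary format. The molecule, the number of conformers and the random seed are arbitrary choices for the example, and no ".rdk" file format exists; this only shows the in-memory serialization.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles('CCO'))
# Generate a few conformers; the fixed seed only makes the run repeatable.
AllChem.EmbedMultipleConfs(mol, numConfs=3, randomSeed=42)

# Serialize the molecule, conformers included, to RDKit's binary format...
blob = mol.ToBinary()

# ...and rebuild it from the bytes.
restored = Chem.Mol(blob)

print(mol.GetNumConformers(), restored.GetNumConformers())
```

The bytes in blob can be written to disk and read back later, one record per molecule, however you choose to frame the records.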

----
Brian Kelley

> On Oct 27, 2016, at 2:05 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> 
> The RDKit has support for the TPL format, an old BioCad/MSI/Accelrys format.
> It's easy to imagine something better, but this is at least already there and 
> there could be other software that speaks it:
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/test_data/cmpd2.tpl
> 
> I'd still like to do a decent JSON format and adding multi-confs to that 
> would be logical
> 
>> On Thu, Oct 27, 2016 at 6:58 AM, David Cosgrove <davidacosgrov...@gmail.com> 
>> wrote:
>> I've been wondering if, now that you can get decent conformations from 
>> RDKit, it would be worth devising a multi-conformation file format to make 
>> reading multi-conf molecules faster for vs purposes. In my experience, 
>> pulling all the conformers out of an ascii file such as an sdf can become 
>> the RDS for pharmacophore searching. Something to think about at the 
>> hackathon maybe and certainly something that deserves a new email thread. 
>> 
>> Dave
>> 
>> 
>>> On Thursday, 27 October 2016, Greg Landrum <greg.land...@gmail.com> wrote:
>>> Hi Thomas,
>>> 
>>> You're right, reading multiple conformations out of an SDF does seem like 
>>> one of those common operations. Unfortunately the RDKit does not currently 
>>> support it in an easy way.
>>> 
>>> A python implementation of this would be a good topic for Friday's UGM 
>>> hackathon, we can see if anyone finds it interesting enough to work on.
>>> 
>>> -greg
>>> 
>>> 
>>>> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis <teva...@gmail.com> 
>>>> wrote:
>>>> Hello everyone,
>>>> 
>>>> I am a new user of RDKit and I was looking in the documentation for an 
>>>> easy way to load multiple conformers from a structure file like .sdf. The 
>>>> code must 1) distinguish between different protonation states of the same 
>>>> molecule,  2) create a new Mol() object for each protonation state and 
>>>> load into it the respective conformers. 
>>>> 
>>>> Apparently I can work out a solution for 1) using mol.GetProp('_Name'), 
>>>> mol.GetNumAtoms(), mol.GetNumBonds() and other properties, but I was wondering 
>>>> if there is a more straightforward way to do it. 
>>>> For 2) I guess I must iterate over all molecules in the input file, create 
>>>> new Mol() objects (one for each protonation state of each ligand) and add 
>>>> conformers to these new Mol() objects. Again, this sounds easy to 
>>>> program, but it also sounds like a very common operation, so I was 
>>>> wondering if it has already been implemented in a function.
>>>> 
>>>> thanks in advance
>>>> Thomas
>>>> 
>>>> 
>>>> -- 
>>>> ==
>>>> Thomas Evangelidis
>>>> Research Specialist
>>>> CEITEC - Central European Institute of Technology
>>>> Masaryk University
>>>> Kamenice 5/A35/1S081, 
>>>> 62500 Brno, Czech Republic 
>>>> 
>>>> email: tev...@pharm.uoa.gr
>>>>teva...@gmail.com
>>>> 
>>>> website: https://sites.google.com/site/thomasevangelidishomepage/


Re: [Rdkit-discuss] property of name in smilesMolSupplier

2016-10-13 Thread Brian Kelley
Paolo, _Name might be considered a private property, so you might need to use

GetPropNames(True,True) or something like that.
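
A quick sketch of what I mean, assuming the two arguments are includePrivate and includeComputed. The property values are made up; only the leading underscore on _Name matters here.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccccc1')
mol.SetProp('_Name', 'benzene')   # leading underscore: treated as private
mol.SetProp('activity', '1')      # ordinary, public property

default_names = list(mol.GetPropNames())
all_names = list(mol.GetPropNames(includePrivate=True, includeComputed=True))

print(default_names)
print(all_names)
```

The second list may also contain bookkeeping properties that RDKit sets on its own, so expect more than the two names set above.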


Brian Kelley

> On Oct 13, 2016, at 6:56 AM, Paolo Tosco <paolo.to...@unito.it> wrote:
> 
> Hi Hongbin,
> 
> suppl[0].GetPropNames() is an iterable object, so you can use it in for 
> loops such as:
> 
> for i in suppl[0].GetPropNames():
>   print (i)
> 
> or you may convert it to a list:
> 
> l = list(suppl[0].GetPropNames())
> print (l)
> 
> Cheers,
> p.
> 
>> On 10/13/16 11:31, 杨弘宾 wrote:
>> Hi,
>> I spent a lot of time exploring "How to get the name property when 
>> using SmilesMolSupplier"
>> 
>> I had a smiles file like this:
>> ===
>> c1c1\t1\n
>> ...
>> ===
>> where 1 means that it is positive
>> 
>> So I wanted to read this SMILES file via SmilesMolSupplier, and I knew that 
>> the second column is the default name column.
>> suppl = 
>> Chem.SmilesMolSupplier('compounds.smi',delimiter='\t',titleLine=False)
>> 
>> However, I could not get the property of 1 because I had no idea what the 
>> property_name was.
>> In the document, it shows that :
>> If the input file has a title line and more than two columns (smiles and 
>> id), the
>> additional columns will be used to set properties on each molecule.  The 
>> properties
>> are accessible using the mol.GetProp(propName) method.
>> But it was a table with no title line, so I thought the property name for 
>> the names should be "Name" or "name" by default. And I failed...
>> After reading the source code, I found that it should be _Name
>> https://github.com/rdkit/rdkit/blob/f4529c910e546af590c56eba01f96e9015c269a6/Code/GraphMol/FileParsers/SmilesMolSupplier.cpp#L194
>>  
>> I think the documentation should be improved so that others know how to get 
>> the name of each compound.
>> BTW, I tried to use suppl[0].GetPropNames() but only got 
>> "> std::char_traits,class std::allocator > at 0x7402e90>" that 
>> seemed to tell me nothing. Is there any way to make it readable in 
>> Python?
>> 
>> Hongbin Yang 杨弘宾 
>> Research: Toxicophore and Chemoinformatics
>> Pharmaceutical Science, School of Pharmacy 
>> East China University of Science and Technology 


Re: [Rdkit-discuss] property of name in smilesMolSupplier

2016-10-13 Thread Brian Kelley
I agree.  I tend to think molecules should have GetName/SetName since this is a 
common operation.

One note though:

mol.GetPropsAsDict()

Might help with introspection.  There are optional arguments that allow 
looking at private and computed properties, i.e.

mol.GetPropsAsDict(True,True)

which I believe will list them all.
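
Here is a small sketch along the lines of the original question, with an in-memory record standing in for compounds.smi. The name "cmpd-1" is invented, and I deliberately avoided a purely numeric name because GetPropsAsDict may convert numeric-looking strings to numbers.

```python
from rdkit import Chem

# One tab-separated record without a title line: SMILES, then the name column.
smiles_text = "c1ccccc1\tcmpd-1\n"
suppl = Chem.SmilesMolSupplierFromText(
    smiles_text, delimiter='\t', titleLine=False)
mol = suppl[0]

public_props = mol.GetPropsAsDict()
all_props = mol.GetPropsAsDict(includePrivate=True, includeComputed=True)

print('_Name' in public_props, '_Name' in all_props)
print(mol.GetProp('_Name'))
```

If all you need is the name, mol.GetProp('_Name') is the direct route; the dictionary is mostly useful for finding out which property names exist in the first place.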


Brian Kelley

> On Oct 13, 2016, at 6:31 AM, 杨弘宾 <yanyangh...@163.com> wrote:
> 
> Hi,
> I spent a lot of time exploring "How to get the name property when 
> using SmilesMolSupplier"
> 
> I had a smiles file like this:
> ===
> c1c1\t1\n
> ...
> ===
> where 1 means that it is positive
> 
> So I wanted to read this SMILES file via SmilesMolSupplier, and I knew that 
> the second column is the default name column.
> suppl = Chem.SmilesMolSupplier('compounds.smi',delimiter='\t',titleLine=False)
> 
> However, I could not get the property of 1 because I had no idea what the 
> property_name was.
> In the document, it shows that :
> If the input file has a title line and more than two columns (smiles and id), 
> the
> additional columns will be used to set properties on each molecule.  The 
> properties
> are accessible using the mol.GetProp(propName) method.
> But it was a table with no title line, so I thought the property name for 
> the names should be "Name" or "name" by default. And I failed...
> After reading the source code, I found that it should be _Name
> https://github.com/rdkit/rdkit/blob/f4529c910e546af590c56eba01f96e9015c269a6/Code/GraphMol/FileParsers/SmilesMolSupplier.cpp#L194
>  
> I think the documentation should be improved so that others know how to get 
> the name of each compound.
> BTW, I tried to use suppl[0].GetPropNames() but only got 
> " std::char_traits,class std::allocator > at 0x7402e90>" that 
> seemed to tell me nothing. Is there any way to make it readable in 
> Python?
> 
> 
> Hongbin Yang 杨弘宾 
> Research: Toxicophore and Chemoinformatics
> Pharmaceutical Science, School of Pharmacy 
> East China University of Science and Technology 


Re: [Rdkit-discuss] Implementation details bitvectors from morgan/circular fingerprints

2016-10-06 Thread Brian Kelley
It should be noted that hashing to bits always loses information; there is
simply no way around this.

While this is only a partial answer to your question, you can at least find the
atoms that set a bit.  Look for "Explaining bits from Morgan Fingerprints" in

http://www.rdkit.org/docs/GettingStartedInPython.html

Also, while Morgan fingerprints are not Bloom filters, the Wikipedia entry
on Bloom filters has a lot of information regarding alternative hashing
functions and does have some history of chemical fingerprints in general
that may help answer some questions:

https://en.wikipedia.org/wiki/Bloom_filter
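
As a rough illustration of the bit-explanation approach, the sketch below collects the bitInfo dictionary for an arbitrary example molecule. The molecule, the radius of 2 and the 64-bit length used for comparison are choices made for the example, not recommendations.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('c1ccccc1CCN')

bit_info = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048, bitInfo=bit_info)

# bit_info maps each set bit to a tuple of (center atom index, radius) pairs.
for bit, environments in sorted(bit_info.items()):
    print(bit, environments)

# With a much shorter bit vector, different environments can end up
# sharing a bit. The length 64 is an arbitrary choice for illustration.
small_info = {}
small_fp = AllChem.GetMorganFingerprintAsBitVect(
    mol, 2, nBits=64, bitInfo=small_info)
print(fp.GetNumOnBits(), small_fp.GetNumOnBits())
```

A bit whose entry lists several different environments is exactly the information loss described above: from the bit alone you cannot tell which of them was present.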


On Thu, Oct 6, 2016 at 6:02 AM, Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Dear Jacob,
>
> This is a hashing function that is used to compress the data:
>
> https://en.wikipedia.org/wiki/Universal_hashing
>
> http://rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf
>
> Greg writes on page 2: "Typical kernels extract features of the
> molecule, hash them, and use
> the hash to determine bits that should be set"
>
> The hashing is a simple function such as modulo, etc.
>
> Best regards,
>
> Dr. Guillaume GODIN
> Principal Scientist
> Chemoinformatic & Datamining
> Innovation
> CORPORATE R DIVISION
> DIRECT LINE +41 (0)22 780 3645
> MOBILE  +41 (0)79 536 1039
> Firmenich SA
> RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
>
>
> 
> From: Jacob Gora
> Sent: Thursday, October 6, 2016 11:23
> To: rdkit-discuss@lists.sourceforge.net
> Subject: [Rdkit-discuss] Implementation details bitvectors from
> morgan/circular fingerprints
>
> Hi,
>
> is there any information on how RDKit creates bit vectors from circular
> fingerprints?
> As the theoretical feature space is too big for storage, and the default
> feature space used in RDKit when converting is only 2048 bits, there must be
> some kind of information loss (and compression?).
>
> Can anyone explain how this is handled in detail?
> Which features are used for the bit vector in the end, and how is that decided?
>
> Regards
> Jacob


Re: [Rdkit-discuss] Trouble compiling and installing on Ubuntu 14.04

2016-10-03 Thread Brian Kelley
Philip, if you run

ctest --debug

this may help us diagnose the issue with paths and such.  Note that you will 
get a LOT of output.


Brian Kelley

> On Oct 3, 2016, at 4:05 PM, Philip Adler <padl...@haverford.edu> wrote:
> 
> it does not seem that I am missing any dependencies.
> 
>> On Mon, Oct 3, 2016 at 3:55 PM, Paolo Tosco <paolo.to...@unito.it> wrote:
>> Hi Philip,
>> 
>> Just wondering if CMake is pulling in Python 2.7 because you are missing 
>> some Python dependencies on 3.4 (e.g., NumPy)?
>> 
>> p.
>> 
>>> On 3 Oct 2016, at 20:47, Philip Adler <padl...@haverford.edu> wrote:
>>> 
>>> Those variables were set correctly in my CMakeCache.txt.
>>> 
>>>> On Mon, Oct 3, 2016 at 2:47 PM, Peter Gedeck <peter.ged...@gmail.com> 
>>>> wrote:
>>>> You can also check the CMakeCache.txt file in the build directory. When I 
>>>> last compiled for 3.5 on the Mac, I had to correct the PYTHON_INCLUDE_DIR. 
>>>> 
>>>> Greg, PYTHON_INCLUDE_DIR was incorrectly set after "cmake ..". Executable 
>>>> and library correctly found. 
>>>> 
>>>> 
>>>> //Path to a program.
>>>> PYTHON_EXECUTABLE:FILEPATH=/Users/peter/miniconda3/bin/python
>>>> //Path to a file.
>>>> PYTHON_INCLUDE_DIR:PATH=/System/Library/Frameworks/Python.framework/Headers
>>>> //Path to a library.
>>>> PYTHON_LIBRARY:FILEPATH=/Users/peter/miniconda3/lib/libpython3.5m.dylib
>>>> 
>>>> 
>>>> Best,
>>>> 
>>>> Peter
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Mon, Oct 3, 2016 at 2:00 PM Philip Adler <padl...@haverford.edu> wrote:
>>>>> Greg (with apologies for the repeat for the benefit of the mailing list 
>>>>> -gmail is great up until it isn't!),
>>>>> 
>>>>> Please see below,
>>>>> 
>>>>> import _frozen_importlib # frozen
>>>>> import imp # builtin
>>>>> import sys # builtin
>>>>> # installing zipimport hook
>>>>> # installed zipimport hook
>>>>> # /usr/lib/python3.4/encodings/__pycache__/__init__.cpython-34.pyc 
>>>>> matches /usr/lib/python3.4/encodings/__init__.py
>>>>> # code object from 
>>>>> '/usr/lib/python3.4/encodings/__pycache__/__init__.cpython-34.pyc'
>>>>> # /usr/lib/python3.4/__pycache__/codecs.cpython-34.pyc matches 
>>>>> /usr/lib/python3.4/codecs.py
>>>>> # code object from '/usr/lib/python3.4/__pycache__/codecs.cpython-34.pyc'
>>>>> import 'codecs' # <_frozen_importlib.SourceFileLoader object at 
>>>>> 0x7f5aa7a14dd8>
>>>>> # /usr/lib/python3.4/encodings/__pycache__/aliases.cpython-34.pyc matches 
>>>>> /usr/lib/python3.4/encodings/aliases.py
>>>>> # code object from 
>>>>> '/usr/lib/python3.4/encodings/__pycache__/aliases.cpython-34.pyc'
>>>>> import 'encodings.aliases' # <_frozen_importlib.SourceFileLoader object 
>>>>> at 0x7f5aa7a2a908>
>>>>> import 'encodings' # <_frozen_importlib.SourceFileLoader object at 
>>>>> 0x7f5aa7a149b0>
>>>>> # /usr/lib/python3.4/encodings/__pycache__/utf_8.cpython-34.pyc matches 
>>>>> /usr/lib/python3.4/encodings/utf_8.py
>>>>> # code object from 
>>>>> '/usr/lib/python3.4/encodings/__pycache__/utf_8.cpython-34.pyc'
>>>>> import 'encodings.utf_8' # <_frozen_importlib.SourceFileLoader object at 
>>>>> 0x7f5aa79b75f8>
>>>>> # /usr/lib/python3.4/encodings/__pycache__/latin_1.cpython-34.pyc matches 
>>>>> /usr/lib/python3.4/encodings/latin_1.py
>>>>> # code object from 
>>>>> '/usr/lib/python3.4/encodings/__pycache__/latin_1.cpython-34.pyc'
>>>>> import 'encodings.latin_1' # <_frozen_importlib.SourceFileLoader object 
>>>>> at 0x7f5aa79b9160>
>>>>> # /usr/lib/python3.4/__pycache__/io.cpython-34.pyc matches 
>>>>> /usr/lib/python3.4/io.py
>>>>> # code object from '/usr/lib/python3.4/__pycache__/io.cpython-34.pyc'
>>>>> # /usr/lib/python3.4/__pycache__/abc.cpython-34.pyc matches 
>>>>> /usr/lib/python3.4/abc.py
>>>>> # code object from '/usr/lib/python3.4/__pycache__/abc.cpython-34.pyc'
>>>>> # /usr/lib/python3.4/__pycache__/_weakrefset.cpython-34.pyc matc

Re: [Rdkit-discuss] MolFromMolBlock does not read properties

2016-10-03 Thread Brian Kelley
I was assuming the issue is the string overload for a filename versus string data, 
which is what makes the SetData function necessary.

What I meant by not reconstructing the class is:

nsuppl = Chem.SDMolSupplier()
nsuppl.SetData(mb)
mol = next(nsuppl)  # next(nsuppl) works on both Python 2 and Python 3

nsuppl.SetData(mb2)
mol = next(nsuppl)

This is a minor optimization which isn't as easily possible with the StringIO 
implementation.

Brian Kelley

> On Oct 3, 2016, at 9:04 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> 
> 
> 
>> On Mon, Oct 3, 2016 at 2:53 PM, Brian Kelley <fustiga...@gmail.com> wrote:
>> I'll admit that using StringIO here feels more pythonic, although SetData 
>> can be reused without reconstructing the class.
> 
> I guess pythonic is in the eyes of the beholder, but I don't understand that 
> at all... 
> 
>> I suppose I would prefer having something like
>> 
>> MolToSDDataBlock
>> 
>> which can be used in conjunction with MolToMolBlock.  I have often found 
>> that the data changes without the molecule changing, so perhaps both could 
>> be useful.
> 
> But then that's a different problem entirely. :-)
> 
> -greg
> 


Re: [Rdkit-discuss] MolFromMolBlock does not read properties

2016-10-02 Thread Brian Kelley
No, you have the correct method.  The general idea, I believe, is that if the 
format can result in multiple molecules a supplier should be used.
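
To show the difference side by side, here is a minimal sketch that builds a one-record SD text in memory and reads it both ways. The field name "activity" and its value are invented for the example.

```python
from rdkit import Chem

# Build a small SD record in memory: a mol block, one data field, and the
# '$$$$' record terminator. The field name 'activity' is made up.
mol_block = Chem.MolToMolBlock(Chem.MolFromSmiles('CCO'))
sd_record = mol_block + "> <activity>\n7.5\n\n$$$$\n"

# MolFromMolBlock only reads the mol block, so the data field is not attached.
from_block = Chem.MolFromMolBlock(sd_record)

# A supplier parses the whole record, including the data fields.
suppl = Chem.SDMolSupplier()
suppl.SetData(sd_record)
from_supplier = next(suppl)

print(bool(from_block.HasProp('activity')))
print(from_supplier.GetProp('activity'))
```

The same supplier object can be handed new text with another SetData call, which is handy when the records arrive one at a time.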


Brian Kelley

> On Oct 2, 2016, at 4:48 PM, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
> 
> Yes, I get it, but there is no MolFromSDBlock, so one would expect 
> MolFromMolBlock to support both formats. If I understand correctly, the only 
> way of reading SD from a variable is as presented in my example? Or is there 
> some marvelous undocumented API? ;)
> 
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
> 
> 2016-10-02 22:20 GMT+02:00 Brian Kelley <fustiga...@gmail.com>:
>> It's neither a bug nor a feature in this case, simply the specification of 
>> the MDL format.
>> 
>> The SD in an SD file stands for "structure-data": the mol block plus the 
>> properties you are looking for.
>> 
>> A decent write-up is here:
>> 
>> https://en.m.wikipedia.org/wiki/Chemical_table_file
>> 
>> If you see the dollar signs in your text block, it is indeed an SD record, 
>> not just a mol block.
>> 
>> 
>> Brian Kelley
>> 
>>> On Oct 2, 2016, at 3:46 PM, Maciek Wójcikowski <mac...@wojcikowski.pl> 
>>> wrote:
>>> 
>>> Hi RDKitters,
>>> 
>>> Is it a bug or a feature? When using Chem.MolFromMolBlock, the properties 
>>> from the SD file are missing. Here is a bit of code to replicate the issue:
>>> 
>>>> from rdkit import Chem
>>>> tmp = """20346
>>>>  RDKit  3D
>>>>  36 38  0  0  0  0  0  0  0  0999 V2000
>>>>15.8390   -9.3370   68.8840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>17.1400   -9.1830   69.5480 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>>17.4030   -7.7570   69.7840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>17.0930   -7.4160   71.2420 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>18.2300   -6.5720   71.8210 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>18.6770   -7.1570   73.0920 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>>20.1430   -7.2290   73.1530 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>20.5650   -8.5770   73.7380 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>21.6390   -9.1530   72.9180 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>>21.3560  -10.5710   72.6640 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>21.5940  -10.8820   71.1850 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>20.4320  -11.7190   70.6460 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>20.0430  -11.2190   69.3210 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>>18.5820  -11.1310   69.1980 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>18.1950   -9.7400   68.6920 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>14.7370   -9.2360   69.9070 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>13.1700   -7.9140   71.1420 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>14.1800   -8.0060   70.2040 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>14.2840  -10.3730   70.5500 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>13.2740  -10.2810   71.4890 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>12.7160   -9.0510   71.7840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>11.6140   -8.9500   72.8070 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>11.8810   -7.8160   73.7040 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>>12.9350   -8.1480   74.6710 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>14.2790   -7.6200   74.1620 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>14.9870   -6.8610   75.2860 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>15.4440   -5.5580   74.7840 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>>15.1190   -4.5130   75.7650 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>14.5870   -3.2770   75.0370 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>13.3710   -2.7990   75.7080 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>>12.3120   -2.5090   74.7330 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>10.9900   -3.1010   75.2260 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>10.3040   -3.8450   74.0790 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>> 9.7590   -5.1150   74.5740 N   0  0  0  0  0  0  0  0  0  0  0  0
>>>>10.0350   -6.2090   73.6340 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>10.6540   -7.3880   74.3890 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>   2  1  1  0
>>

Re: [Rdkit-discuss] MolFromMolBlock does not read properties

2016-10-02 Thread Brian Kelley
It's neither a bug nor a feature in this case, simply the specification of the 
MDL format.

The SD in an SD file stands for "structure-data": the mol block plus the 
properties you are looking for.

A decent write-up is here:

https://en.m.wikipedia.org/wiki/Chemical_table_file

If you see the dollar signs in your text block, it is indeed an SD record, not 
just a mol block.
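
To make the distinction concrete, here is a rough sketch in plain Python that splits one SD record into its mol block and its data fields. It assumes the common "> <FIELD>" header layout and is not a full parser for the format; the function name and the toy record are invented for the example.

```python
def split_sd_record(record):
    """Split one SD record into (mol_block, data_fields). Illustrative only."""
    lines = record.splitlines()
    # The mol block ends at the 'M  END' line.
    end = next(i for i, line in enumerate(lines) if line.startswith('M  END'))
    mol_block = '\n'.join(lines[:end + 1]) + '\n'

    fields = {}
    name = None
    for line in lines[end + 1:]:
        if line.startswith('$$$$'):      # record terminator
            break
        if line.startswith('>'):         # data header such as '> <activity>'
            name = line[line.index('<') + 1:line.rindex('>')]
            fields[name] = []
        elif name is not None and line.strip():
            fields[name].append(line)
    return mol_block, {k: '\n'.join(v) for k, v in fields.items()}


# A toy record with an empty atom table, just to exercise the splitting.
record = (
    "toy\n  sketch\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <activity>\n7.5\n\n"
    "$$$$\n"
)
mol_block, data = split_sd_record(record)
print('$$$$' in record, data)
```

Everything up to and including "M  END" is what MolFromMolBlock looks at; everything after it is the SD part that only a supplier picks up.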

----
Brian Kelley

> On Oct 2, 2016, at 3:46 PM, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
> 
> Hi RDKitters,
> 
> Is it a bug or a feature? When using Chem.MolFromMolBlock, the properties 
> from the SD file are missing. Here is a bit of code to replicate the issue:
> 
>> from rdkit import Chem
>> tmp = """20346
>>  RDKit  3D
>>  36 38  0  0  0  0  0  0  0  0999 V2000
>>15.8390   -9.3370   68.8840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>17.1400   -9.1830   69.5480 N   0  0  0  0  0  0  0  0  0  0  0  0
>>17.4030   -7.7570   69.7840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>17.0930   -7.4160   71.2420 C   0  0  0  0  0  0  0  0  0  0  0  0
>>18.2300   -6.5720   71.8210 C   0  0  0  0  0  0  0  0  0  0  0  0
>>18.6770   -7.1570   73.0920 N   0  0  0  0  0  0  0  0  0  0  0  0
>>20.1430   -7.2290   73.1530 C   0  0  0  0  0  0  0  0  0  0  0  0
>>20.5650   -8.5770   73.7380 C   0  0  0  0  0  0  0  0  0  0  0  0
>>21.6390   -9.1530   72.9180 N   0  0  0  0  0  0  0  0  0  0  0  0
>>21.3560  -10.5710   72.6640 C   0  0  0  0  0  0  0  0  0  0  0  0
>>21.5940  -10.8820   71.1850 C   0  0  0  0  0  0  0  0  0  0  0  0
>>20.4320  -11.7190   70.6460 C   0  0  0  0  0  0  0  0  0  0  0  0
>>20.0430  -11.2190   69.3210 N   0  0  0  0  0  0  0  0  0  0  0  0
>>18.5820  -11.1310   69.1980 C   0  0  0  0  0  0  0  0  0  0  0  0
>>18.1950   -9.7400   68.6920 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.7370   -9.2360   69.9070 C   0  0  0  0  0  0  0  0  0  0  0  0
>>13.1700   -7.9140   71.1420 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.1800   -8.0060   70.2040 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.2840  -10.3730   70.5500 C   0  0  0  0  0  0  0  0  0  0  0  0
>>13.2740  -10.2810   71.4890 C   0  0  0  0  0  0  0  0  0  0  0  0
>>12.7160   -9.0510   71.7840 C   0  0  0  0  0  0  0  0  0  0  0  0
>>11.6140   -8.9500   72.8070 C   0  0  0  0  0  0  0  0  0  0  0  0
>>11.8810   -7.8160   73.7040 N   0  0  0  0  0  0  0  0  0  0  0  0
>>12.9350   -8.1480   74.6710 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.2790   -7.6200   74.1620 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.9870   -6.8610   75.2860 C   0  0  0  0  0  0  0  0  0  0  0  0
>>15.4440   -5.5580   74.7840 N   0  0  0  0  0  0  0  0  0  0  0  0
>>15.1190   -4.5130   75.7650 C   0  0  0  0  0  0  0  0  0  0  0  0
>>14.5870   -3.2770   75.0370 C   0  0  0  0  0  0  0  0  0  0  0  0
>>13.3710   -2.7990   75.7080 N   0  0  0  0  0  0  0  0  0  0  0  0
>>12.3120   -2.5090   74.7330 C   0  0  0  0  0  0  0  0  0  0  0  0
>>10.9900   -3.1010   75.2260 C   0  0  0  0  0  0  0  0  0  0  0  0
>>10.3040   -3.8450   74.0790 C   0  0  0  0  0  0  0  0  0  0  0  0
>> 9.7590   -5.1150   74.5740 N   0  0  0  0  0  0  0  0  0  0  0  0
>>10.0350   -6.2090   73.6340 C   0  0  0  0  0  0  0  0  0  0  0  0
>>10.6540   -7.3880   74.3890 C   0  0  0  0  0  0  0  0  0  0  0  0
>>   2  1  1  0
>>   3  2  1  0
>>   4  3  1  0
>>   5  4  1  0
>>   6  5  1  0
>>   7  6  1  0
>>   8  7  1  0
>>   9  8  1  0
>>  10  9  1  0
>>  11 10  1  0
>>  12 11  1  0
>>  13 12  1  0
>>  14 13  1  0
>>  15 14  1  0
>>  15  2  1  0
>>  16  1  1  0
>>  18 16  1  0
>>  18 17  2  0
>>  19 16  2  0
>>  20 19  1  0
>>  21 20  2  0
>>  21 17  1  0
>>  22 21  1  0
>>  23 22  1  0
>>  24 23  1  0
>>  25 24  1  0
>>  26 25  1  0
>>  27 26  1  0
>>  28 27  1  0
>>  29 28  1  0
>>  30 29  1  0
>>  31 30  1  0
>>  32 31  1  0
>>  33 32  1  0
>>  34 33  1  0
>>  35 34  1  0
>>  36 35  1  0
>>  36 23  1  0
>> M  END
>> >(1) 
>> 0.81
>> >(1) 
>> =
>> >(1) 
>> IC50
>> >(1) 
>> CHEMBL18442
>> 
>> """
>> m = Chem.MolFromMolBlock(tmp)
>> print m.GetPropsAsDict()
>> from StringIO import StringIO
>> m = Chem.ForwardSDMolSupplier(StringIO(tmp)).next()
>> print m.GetPropsAsDict()
> 
> 
> 
> Po

Re: [Rdkit-discuss] Getting started with C++

2016-09-24 Thread Brian Kelley
I think this is a fantastic idea.  I'll even contribute and include the
examples in the standard distribution as CMake targets to boot.

This will both help people start projects and validate the docs.  Nothing
worse than examples that don't compile :)


Brian Kelley

> On Sep 24, 2016, at 11:51 AM, David Cosgrove <davidacosgrov...@gmail.com> 
> wrote:
> 
> Hi All,
> 
> I'm contemplating starting a chapter in the documentation called 'Getting 
> Started with the RDKit in C++' which would mirror the information given in 
> the Python chapter but with examples in C++ for those of us diehards who like 
> to program in a compiled language.  As I recall, the learning curve was quite 
> steep at the beginning, so I thought it would be helpful to ease others into 
> the real world.
> 
> The purpose of this email was just to check that no one else is working on 
> this already - it would be a shame to duplicate effort.  If so, I will 
> happily pitch in, if not then I'll crack on and if anyone else wants to help, 
> please get in touch.
> 
> Cheers,
> Dave
> 


Re: [Rdkit-discuss] The RDKit and modern C++

2016-09-24 Thread Brian Kelley
I wholeheartedly agree.

One thing that may help RHEL6 is that anaconda actually can install/build
gcc4.8 in user space: https://anaconda.org/anaconda/gcc/.  Note: it does
require root to install some dependencies, but doesn't override the system
gcc.

While this is not a complete solution for many applications, for python and
python related packages it really is a god-send.  We have some legacy
systems stuck on RHEL6 and have had to use this to compile some newer
packages.

As a side note, the latest Xcode update has deprecated libstdc++ which is
par-for-the-course dealing with Apple's style of updating.  This will
likely require recompilation and new packages on Anaconda which will
probably be painful.

Cheers,
 Brian

On Sat, Sep 24, 2016 at 2:25 AM, Greg Landrum 
wrote:

> Dear all,
>
> I just did a blog post describing a proposal for some upcoming changes to
> the RDKit code base:
> https://medium.com/@greg.landrum_t5/the-rdkit-and-modern-c-48206b966218?source=linkShare-d698b3fa9f7-1474698147
>
> This is a big and important change and I'd love to hear whatever feedback
> members of the community may have. Please comment either on the blog post
> or here.
>
> Best Regards,
> -greg
>
>
>
>
> 


Re: [Rdkit-discuss] RDKit Meetup in Cambridge October 19

2016-09-15 Thread Brian Kelley
I knew I had forgotten something!  The address in the link should be clear
though :)

Cheers,
 Brian

On Thu, Sep 15, 2016 at 3:19 PM, Greg Landrum <greg.land...@gmail.com>
wrote:

> yeah, "New Cambridge" :-)
>
> On Thu, Sep 15, 2016 at 12:18 PM, Tim Dudgeon <tdudgeon...@gmail.com>
> wrote:
>
>> Which Cambridge? Assume this is Cambridge MA not Cambridge UK?
>>
>> On 15/09/2016 19:29, Brian Kelley wrote:
>>
>> Novartis is kindly hosting an RDKit Meetup October 19th starting at
>> 4:30pm
>>
>> The first 45 minutes or so will be devoted to introducing new features in
>> the RDKit and also a Q and A or tutorial session.  From 5:30 - 7 or 7:30
>> depending, we will be inviting speakers .  There are talking slots still
>> free so please drop us a line if you want to give a short talk (20-30
>> minutes is what we are looking for.)
>>
>> To attend or just join the community, please use the link below, we do
>> have a maximum number of spots so please RSVP early.
>>
>> http://www.meetup.com/Cambridge-RDKit-Meetup/events/234150101/?
>>
>>
>> Cheers,
>>  Brian Kelley
>>
>>


Re: [Rdkit-discuss] highlightColor in Draw.MolsToGridImage()

2016-07-18 Thread Brian Kelley
I'll need to test to see if it works, but the ids should be generated from the 
atom and bond indices in a deterministic fashion.
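
To make "deterministic" concrete, here is a small sketch of what such a
scheme could look like from the CSS-generating side. The "atom-<index>" and
"bond-<index>" naming and the helper names are assumptions for the example,
not what the RDKit currently emits:

```python
def atom_id(index):
    # Hypothetical id scheme: one id per atom index.
    return "atom-%d" % index


def bond_id(index):
    # Hypothetical id scheme: one id per bond index.
    return "bond-%d" % index


def css_for(atom_colors, bond_widths):
    """Build CSS rules keyed on the ids derived from atom/bond indices."""
    rules = []
    for index in sorted(atom_colors):
        rules.append("#%s { fill: %s; }" % (atom_id(index), atom_colors[index]))
    for index in sorted(bond_widths):
        rules.append(
            "#%s { stroke-width: %spx; }" % (bond_id(index), bond_widths[index])
        )
    return "\n".join(rules)


css = css_for({0: "red"}, {2: 1.5})
print(css)
```

As long as the ids depend only on the indices, code like this never has to
parse the SVG to find out what to target.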


Brian Kelley

> On Jul 18, 2016, at 4:35 AM, Dmitri Maziuk <dmaz...@bmrb.wisc.edu> wrote:
> 
>> On 7/17/2016 8:17 AM, Brian Kelley wrote:
>> Svg can actually be styled with css to change properties. It might
>> be worthwhile to start adding proper ids to our svg elements for more
>> flexibility.
> 
> As long as my code for generating CSS can figure out the ids from Mol's 
> atoms and bonds...
> 
> Dima
> 
> 


Re: [Rdkit-discuss] highlightColor in Draw.MolsToGridImage()

2016-07-17 Thread Brian Kelley
SVG can actually be styled with CSS to change properties.  It might be
worthwhile to start adding proper ids to our SVG elements for more
flexibility.  I might spend some effort investigating this a bit more, as it
would make building interactive images a bit easier.
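
A minimal sketch of the idea, using only the Python standard library. The
tiny SVG and the "bond-0" id are invented for the example; this is not
actual RDKit output:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without a namespace prefix.
ET.register_namespace("", SVG_NS)

# Hand-written stand-in for a rendered molecule with an id on the bond path.
svg_text = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">'
    '<path id="bond-0" d="M 10,25 L 90,25" stroke="black" stroke-width="2"/>'
    '</svg>'
)


def add_stylesheet(svg, css):
    """Return the SVG text with a <style> element holding the given CSS."""
    root = ET.fromstring(svg)
    style = ET.Element("{%s}style" % SVG_NS)
    style.text = css
    root.insert(0, style)
    return ET.tostring(root, encoding="unicode")


styled = add_stylesheet(svg_text, "#bond-0 { stroke-width: 1px; }")
print(styled)
```

A stylesheet rule like this overrides a plain presentation attribute such as
stroke-width="2", but an inline style="..." attribute on the element would
still win unless the rule is marked !important.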

That being said, having more options for rendering is always a good thing.


Brian Kelley

> On Jul 17, 2016, at 7:21 AM, Dmitri Maziuk <dmaz...@bmrb.wisc.edu> wrote:
> 
>> On 7/17/2016 5:29 AM, Greg Landrum wrote:
>> 
>> 
>> On Sat, Jul 16, 2016 at 9:37 PM, DmitriR <xzf...@gmail.com
>> <mailto:xzf...@gmail.com>> wrote:
> 
>>(2) For the SVG rendering, is there perhaps a way to change the width
>>(weight) of the stroke?
>>For smaller images, the default bond width looks somewhat heavy.
>> 
>> 
>> At the moment, there's no way at all to control the line width from
>> Python. I'll take a look at that.
> 
> I've been running the SVG through a little regexp-replace subroutine, 
> adjusting all sorts of things. Fine-tuning the image would be more 
> useful if other backends could use it too, but for SVG there is an 
> easier way out.
> 
> Dima
> 
> 


Re: [Rdkit-discuss] m.GetProp('property'): KeyError: 'property'

2016-07-12 Thread Brian Kelley
Markus,
  In newer versions of RDKit there is a handy method:

d = mol.GetPropsAsDict()

which returns a python dictionary of all the sd data by default.  It also
has a nice feature that it converts numeric values to proper python
numbers.  This can be used as a python dictionary:

for prop, value in d.items():
    print(prop, value)

d.get(prop, None)

and so on.

Cheers,
 Brian


On Tue, Jul 12, 2016 at 12:20 PM, Paolo Tosco  wrote:

> Dear Markus,
>
> you may check if the key exists before trying to retrieve its value with
> the HasProp() method.
>
> Cheers,
> p.
> On 12/07/2016 17:10, Markus Metz wrote:
>
> Dear all:
>
> I spent some time searching the rdkit website and its mailing list, but I
> was not able to find anything regarding my issue. I am a newbie, so bear
> with me if this is a FAQ.
>
> I am trying to extract the properties for molecules from an sd file.
> This worked fine with the id, but not for other properties. I assume this
> is because these properties are not defined for every molecule in the data
> set. Could this be the reason for the error?
> If so, how can I get around it? How does rdkit deal with empty data?
>
> This is a very common question and I apologize in advance for this FAQ.
>
> Best regards,
>
> Markus
>
>
>


Re: [Rdkit-discuss] SMILES string from SureChEMBL iPython Notebook Tutorial

2016-07-05 Thread Brian Kelley
After some digging, it looks like the underlying C++ streams aren't
flushing.  This means that python might not actually have all the
information when you print them out.  We may have to enable a "flush"
function for these streams for better error reporting on the python side,
I'll need to investigate this a bit more.
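
For what it's worth, here is a minimal sketch of capturing whatever does
reach Python's sys.stderr, using only the standard library. The function
noisy_parse is a stand-in invented for the example, not an RDKit call, and
capture_stderr is likewise just an illustrative helper:

```python
import contextlib
import io
import sys


def capture_stderr(func, *args, **kwargs):
    """Call func and return (result, text written to sys.stderr meanwhile)."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
        sys.stderr.flush()  # flushes the Python-level stream only
    return result, buffer.getvalue()


def noisy_parse(text):
    # Stand-in for a call that reports a problem on stderr and returns None.
    print("WARNING: could not parse %r" % text, file=sys.stderr)
    return None


result, messages = capture_stderr(noisy_parse, "C1CC")
print(repr(messages))
```

Anything still sitting in an unflushed C++ buffer would not show up in the
captured text, which would be consistent with the run-to-run differences.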

Cheers,
 Brian

On Wed, Jun 29, 2016 at 3:14 PM, DmitriR <xzf...@gmail.com> wrote:

> Brian - Sure. Attached:
>
> RDKit-test-warnings-01.ipynb
> dff.pkl
> screenshot1-warningsPrintToNotebook.pdf
> screenshot2-noWarningsPrint.pdf
>
> This has gotten stranger though. Now sometimes I get no visible output
> (screenshot 2).
>
> The total length of captured warnings still differs run to run, but now I
> noticed that it alternates *imprecisely* (see comment in screenshot2, cell
> 32). When I compared the sets of warnings produced on alternating runs
> (where the difference is substantial: 43k characters vs 38k characters),
> they are different because a large number of warnings do not get produced.
> I don't know what the smaller variations are due to.
>
> Python 3.5.1 :: Anaconda 2.4.0 (x86_64), OSX 10.11.5, jupyter 4.1.0,
> Firefox
>
> Thanks.
> Dmitri
>
>
> > On Jun 29, 2016, at 9:07 AM, Brian Kelley <fustiga...@gmail.com> wrote:
> >
> > Dmitri,
> >   Could you send me the notebook that displays these issues?  I can't
> reproduce them.
> >
> > Thanks,
> >  Brian
> >
>
>
>


Re: [Rdkit-discuss] SMILES string from SureChEMBL iPython Notebook Tutorial

2016-06-29 Thread Brian Kelley
Dmitri,
  Could you send me the notebook that displays these issues?  I can't
reproduce them.

Thanks,
 Brian



On Tue, Jun 28, 2016 at 6:25 PM, Brian Kelley <fustiga...@gmail.com> wrote:

> It looks like there may be an issue calling WrapLogs twice.  If you see
> the error messages in the notebook, it's already been called.  Importing
> IPythonConsole does this automatically.
>
> This may be the cause of our confusion.  I'll look into it.
>
> 
> Brian Kelley
>
> On Jun 28, 2016, at 3:54 PM, DmitriR <xzf...@gmail.com> wrote:
>
> Hi Brian,
>
> First off, now I can capture the warnings, so for practical purposes my
> question has been addressed, thank you for helping me get to this point.
>
> Cool trick with StringIO. I can even just do:
>
> Python 3.5.1 :: Anaconda 2.4.0 (x86_64), OSX 10.11.5, jupyter 4.1.0,
> Firefox
>
> ```
> import io
> import sys
>
> from rdkit import Chem
>
> err = sys.stderr
> sys.stderr = io.StringIO()
>
> # capture errors/warnings
> Chem.MolFromSmiles('C1CC')
>
> msgs = sys.stderr.getvalue()
> sys.stderr = err
> print('Captured', msgs)
>
> # now errors show in the notebook again
> Chem.MolFromSmiles('C1CC')
> ```
>
> ==
>
> However, if you feel like digging a bit deeper, I'm a little confused too
> now :)
>
> What is the scope of WrapLogs() effects? (notebook-wide, or cell?) Or, by
> chance, does it set anything really persistent?
>
> In my prior notebook session, prior to trying WrapLogs() I could already
> see the warnings printed on red background (like in your screenshot, except
> that you have an ERROR msg, not WARNINGs as in my example).
>
> A call to WrapLogs() made warnings apparently disappear from the notebook.
>
> Upon reinitializing the session I could see the warnings on red background
> as before, wrote the code snippet in my prior email, and *without calling
> WrapLogs()* I could capture the warnings with it.
>
> So I assumed that RDKit messages went to the notebook's stderr by default,
> and WrapLogs() did something else.
>
> After getting your last email, I made a minimal test case (new notebook
> with just the RDKit call that generates warnings `dff['InChI'] =
> dff['ROMol'].map(Chem.MolToInchi)`, wrapped inside the stderr capture code
> snippet), killed all python instances, restarted the browser, loaded data
> from pickled dataframe.
>
> Now, *without ever having called WrapLogs()* I still get all RDKit
> warnings go to stderr, and I can still  capture them using the snippet.
> Calls to WrapLogs() now appear to have no effect whatsoever.
>
> If this indicates to you any potential issue, we can look more into it.
> Otherwise I'm good.
>
> ==
>
> The other strange behavior that I described below (the number of warnings
> alternating between successive calls to the same code using
> Chem.MolToInchi) remains though. Maybe it's the underlying InChi code, I
> did not investigate.
>
> Thanks again.
> Dmitri
>
>
>
> On Jun 28, 2016, at 2:14 PM, Brian Kelley <fustiga...@gmail.com> wrote:
>
> Dmitri,
>   I admit to being a bit confused.  What WrapLogs() does is simply
> redirect the C++ errors into python's stderr. See attached png.  I think
> you may have noticed that, as you are capturing with sys.stderr.
>
> These errors are output (at least for me) in the IPython notebook.  I'm
> not sure what is being hidden here.  Perhaps the notebook has changed
> somehow?  Here is my version:
>
> Python 2.7.11 |Anaconda 2.1.0 (x86_64)| (default, Dec  6 2015, 18:57:58)
> Type "copyright", "credits" or "license" for more information.
>
> IPython 4.0.0 -- An enhanced Interactive Python.
>
>
> btw - you can use StringIO as opposed to a file
>
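> import sys
> from StringIO import StringIO  # Python 2; use io.StringIO on Python 3
>
> err = sys.stderr
> io = sys.stderr = StringIO()
> # ... code that writes to stderr goes here ...
> sys.stderr = err
> print io.getvalue()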
>
>
>
> On Tue, Jun 28, 2016 at 1:24 PM, DmitriR <xzf...@gmail.com> wrote:
> Brian - Thank you!
>
> (on OSX 10.11.5, jupyter 4.1.0)
>
> rdkit.Chem.WrapLogs() does hide the messages.
> I could not figure out how to access them though once they are hidden.
>
> To capture warnings, this mechanism seems to work - but it is ugly.
>
> ```
> import os
> import sys
> ## switch the streams
> stderr_fn = 'stderr.log'
> orig_stderr = sys.stderr
> sys.stderr = open(stderr_fn, 'w')
>
> ## RDKit code producing warnings goes here
>
> ## switch back stderr, process the warnings
> sys.stderr.flush()
> sys.stderr = orig_stderr
> with open(stderr_fn, 'r') as f: err_data = f.read()
> os.remove(stderr_fn)
> print(len(err_data))
> ```
>
> Assuming it is all even necessary, this could be made much nicer by using
>

Re: [Rdkit-discuss] Error compiling RDKit with MinGW on Windows

2016-06-23 Thread Brian Kelley
MinGW apparently doesn't support xlocale.  A brief web search turns up a few
reports of this.  One option is to simply revert to the older code for MinGW;
however, we may also need to disable the regression test in that case.  I can
supply a patch for this once I get MinGW installed.


Brian Kelley

> On Jun 23, 2016, at 1:47 PM, Paolo Tosco <paolo.to...@unito.it> wrote:
> 
> Dear Anne,
> 
> tonight I'll be working on getting a working RDKit build using MinGW 
> compilers; I'll update you as soon as I have finished.
> 
> Best,
> Paolo
> 
>> On 23/06/2016 13:29, Anne wrote:
>> Hi everyone,
>> 
>> I have been trying to compile RDKit (rdkit-Release_2016_03_2.tar.gz)
>> using CodeBlocks 16.01 with the MinGW (gcc 4.9) compiler on Win7. But I
>> keep getting the following error in the LocaleSwitcher.cpp:
>> 
>> []
>> 
>> [  1%] Building CXX object
>> Code/RDGeneral/CMakeFiles/RDGeneral.dir/Dict.cpp.obj
>> cd /d D:\Programme\RDKit\rdkit_2\build\Code\RDGeneral &&
>> C:\PROGRA~2\CODEBL~1\MinGW\bin\G__~1.EXE   -DRDGeneral_EXPORTS
>> -DRDK_32BIT_BUILD -DRDK_TEST_MULTITHREADED
>> -DRDK_USE_STRICT_ROTOR_DEFINITION
>> @CMakeFiles/RDGeneral.dir/includes_CXX.rsp -mpopcnt -Wno-deprecated
>> -Wno-unused-function -fno-strict-aliasing -fPIC -Wall -Wextra -O3
>> -DNDEBUG   -DRDK_THREADSAFE_SSS -DBOOST_ALL_NO_LIB -o
>> CMakeFiles\RDGeneral.dir\Dict.cpp.obj -c
>> D:\Programme\RDKit\rdkit_2\Code\RDGeneral\Dict.cpp
>> D:\Programme\RDKit\rdkit_2\Code\RDGeneral\Dict.cpp:1:0: warning: -fPIC
>> ignored for target (all code is position independent)
>>   // $Id$
>>   ^
>> D:\Programme\RDKit\rdkit_2\Code\RDGeneral\LocaleSwitcher.cpp:1:0:
>> warning: -fPIC ignored for target (all code is position independent)
>>   //  Copyright (c) 2016, Novartis Institutes for BioMedical Research Inc.
>>   ^
>> D:\Programme\RDKit\rdkit_2\Code\RDGeneral\LocaleSwitcher.cpp:36:21:
>> fatal error: xlocale.h: No such file or directory
>>   #include <xlocale.h>
>>   ^
>> compilation terminated.
>> mingw32-make.exe[2]: ***
>> [Code/RDGeneral/CMakeFiles/RDGeneral.dir/LocaleSwitcher.cpp.obj] Error 1
>> mingw32-make.exe[1]: *** [Code/RDGeneral/CMakeFiles/RDGeneral.dir/all]
>> Error 2
>> mingw32-make.exe: *** [all] Error 2
>> [  1%] Building CXX object
>> Code/RDGeneral/CMakeFiles/RDGeneral.dir/LocaleSwitcher.cpp.obj
>> cd /d D:\Programme\RDKit\rdkit_2\build\Code\RDGeneral &&
>> C:\PROGRA~2\CODEBL~1\MinGW\bin\G__~1.EXE   -DRDGeneral_EXPORTS
>> -DRDK_32BIT_BUILD -DRDK_TEST_MULTITHREADED
>> -DRDK_USE_STRICT_ROTOR_DEFINITION
>> @CMakeFiles/RDGeneral.dir/includes_CXX.rsp -mpopcnt -Wno-deprecated
>> -Wno-unused-function -fno-strict-aliasing -fPIC -Wall -Wextra -O3
>> -DNDEBUG   -DRDK_THREADSAFE_SSS -DBOOST_ALL_NO_LIB -o
>> CMakeFiles\RDGeneral.dir\LocaleSwitcher.cpp.obj -c
>> D:\Programme\RDKit\rdkit_2\Code\RDGeneral\LocaleSwitcher.cpp
>> Code\RDGeneral\CMakeFiles\RDGeneral.dir\build.make:187: recipe for
>> target 'Code/RDGeneral/CMakeFiles/RDGeneral.dir/LocaleSwitcher.cpp.obj'
>> failed
>> mingw32-make.exe[2]: Leaving directory 'D:/Programme/RDKit/rdkit_2/build'
>> CMakeFiles\Makefile2:279: recipe for target
>> 'Code/RDGeneral/CMakeFiles/RDGeneral.dir/all' failed
>> mingw32-make.exe[1]: Leaving directory 'D:/Programme/RDKit/rdkit_2/build'
>> D:/Programme/RDKit/rdkit_2/build/Makefile:159: recipe for target 'all'
>> failed
>> Process terminated with status 2 (0 minute(s), 17 second(s))
>> 4 error(s), 6 warning(s) (0 minute(s), 17 second(s))
>> 
>> 
>> Since the error says that xlocale.h is missing, I downloaded that header
>> file and tried to add it to the mingw include directory, but this just
>> led to different (missing file) errors. So there seems to be a more
>> general problem. I'm not sure if it occurs due to a linking error or if
>> I'm doing something else wrong. Since I am quite new to programming it
>> might be something obvious which I am not aware of.
>> 
>> To get the build recipe I used Cmake. The configuration and generation
>> of the CodeBlocks-project file worked fine. I also compiled the boost
>> library that way using the same compiler and it seems to work, too.
>> 
>> Any ideas about what I am doing wrong would be greatly appreciated.
>> 
>> 
>> Best Regards,
>> 
>> Anne
>> 
>> 

Re: [Rdkit-discuss] Chirality conservation during atom replacement

2016-06-22 Thread Brian Kelley
Christian, 
 I believe this was a bug fix where smarts chirality wasn't respected with 
dummy atoms.  We fixed this during an investigation about r-group 
decomposition, basically stereo cores were behaving oddly.


Brian Kelley

> On Jun 22, 2016, at 10:30 AM, Kramer, Christian <christian.kra...@roche.com> 
> wrote:
> 
> Hi Andrew and Greg,
> 
> thanks a lot for the quick replies.
> 
> I tested Greg's suggested solution, and it works ... but only with the RDKit 
> version 2016.03.1. With Version 2015.09.2, I still get the wrong 
> stereochemistry after fragmentation (maybe relevant for people working with 
> older versions).
> 
> Bests,
> Christian
> 
> 
> Dr. Christian Kramer
> Computer-Aided Drug Design (CADD) Scientist
> 
> F. Hoffmann-La Roche Ltd
> Pharma Research and Early Development
> Bldg. 092/2.56
> CH-4070 Basel
> 
> Phone +41 61 682 2471
> mailto: christian.kra...@roche.com
> 
> Confidentiality Note: This message is intended only for the use of the named 
> recipient(s) and may contain confidential and/or proprietary information. If 
> you are not the intended recipient, please contact the sender and delete this 
> message. Any unauthorized use of the information contained in this message is 
> prohibited.
> 
>> On Tue, Jun 21, 2016 at 5:54 PM, Andrew Dalke <da...@dalkescientific.com> 
>> wrote:
>> On Jun 21, 2016, at 5:26 PM, Greg Landrum wrote:
>> > Because chirality is represented relative to the ordering of the bonds 
>> > around an atom, it's pretty difficult to do this if you want to actually 
>> > break and add bonds on your own. This would probably be somewhat easier if 
>> > there were an RWMol.ReplaceBond() method analogous to the 
>> > RWMol.ReplaceAtom() method, but that's not available at the moment.
>>...
>> > p.s. this all reminds me that there's a long email from Andrew on this 
>> > topic that I still haven't worked my way all the way through. 
>> 
>> I pretty much had to give up with working in molecule space and switch to 
>> working in SMILES space.
>> 
>> That is, I did a SMARTS match or whatever to get the atom to change, 
>> backtracked to the original SMILES, which I tokenized to find the 
>> corresponding term, then at the token level substituted in the new group.
>> 
>> What made it relatively simple was that I wanted to fragment R-groups along 
>> non-ring single bonds. In that case, I find the pair of atoms (i, j), along 
>> the bond, find the token corresponding to atom j, insert "*.*" before that 
>> token, and reparse the modified SMILES.
>> 
>> On Jun 21, 2016, at 4:50 PM, Kramer, Christian wrote:
>> > Is there a simple way of preserving chirality during splits on chiral 
>> > atoms?
>> 
>> To preserve chirality, I had to map from the new molecule space back to the 
>> original molecule space, bearing in mind the newly added atom. Then figure 
>> out which chiralities were missing in the "*.*"-inserted molecule (since an 
>> asymmetric molecule with chirality in the core might have a symmetric core 
>> after fragmentation, causing chirality information to disappear), and 
>> determine which chirality terms to put back.
>> 
>> It's very tricky.
>> 
>> I was hoping to present this at the RDKit User's Group Meeting, but I won't 
>> be able to make it. :(
>> 
>> 
>> Andrew
>> da...@dalkescientific.com
>> 
>> 
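
The SMILES-space approach Andrew describes above (tokenize, find the token
for atom j, insert "*.*" before it) can be sketched roughly as follows. This
is plain Python with a deliberately simplified tokenizer, not Andrew's actual
code. It assumes atom indices follow the order in which atoms are written in
the SMILES, that atom j is the later-written end of the bond, and that the
bond is a non-ring single bond; the function names are invented for the
example:

```python
import re

# Simplified SMILES tokenizer: bracket atoms, organic-subset atoms, "*",
# ring-closure digits, bond symbols, dots and branch parentheses.
TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*|%\d\d|\d|[-=#$:/\\.()]"
)


def tokenize(smiles):
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("cannot tokenize %r" % smiles)
    return tokens


def is_atom(token):
    return token[0] == "[" or token[0].isalpha() or token == "*"


def cut_before_atom(smiles, atom_index):
    """Insert "*.*" before the token of the atom_index-th atom (0-based)."""
    tokens = tokenize(smiles)
    seen = -1
    for position, token in enumerate(tokens):
        if is_atom(token):
            seen += 1
            if seen == atom_index:
                tokens.insert(position, "*.*")
                return "".join(tokens)
    raise IndexError("no atom %d in %r" % (atom_index, smiles))


print(cut_before_atom("CC(C)O", 3))
```

The result still has to be reparsed, and, as Andrew notes, any chirality
that disappears or changes in the fragments needs separate handling; this
sketch does nothing about that.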

Re: [Rdkit-discuss] Querying when using CTabs

2016-06-07 Thread Brian Kelley
I also like the term "FuzzyBonds" better than vague bonds if we get to
rename it :)

Cheers,
 Brian

On Tue, Jun 7, 2016 at 3:21 PM, Brian Kelley <fustiga...@gmail.com> wrote:

> I was also thinking that instead of protonating, we could just "and" with
> a heavy degree query with the degree equal to the current degree.  This
> should have the same effect, correct?
>
> Cheers,
>  Brian
>
> On Tue, Jun 7, 2016 at 12:37 AM, Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> And while we're at it we should think about including options like what
>> ChemAxon calls "Vague Bond Search" (
>> https://docs.chemaxon.com/pages/viewpage.action?pageId=22217121#Tautomersearch/Vaguebondsearch/sp-Hybridization-vaguebond).
>> This would help address some of the aromaticity problems.
>>
>> On Tue, Jun 7, 2016 at 5:30 AM, Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> I think that here it's worth, at least initially, ignoring what is
>>> currently possible with the RDKit (and how that's implemented) and instead
>>> thinking about what we want to be able to do.[1]
>>>
>>> The goal, I think, is to have some options allowing control over how a
>>> query coming from a MOL block/CTAB actually matches target molecules. One
>>> possible model for this would be to look at the options that were available
>>> for searching in systems like ISIS/Host and ISIS/Base (and whatever it is
>>> that they are now called). I no longer have access to those, but I would
>>> guess that someone in the community may or that some googling will turn up
>>> documentation describing/showing the options. I remember there being
>>> options like: "search as drawn", "allow/disallow substitution at
>>> heteroatoms", "allow substitution everywhere", etc. This may be a good
>>> starting point, then we can think about what kind of options we want to add
>>> for interpreting "R" groups or Hs that have been explicitly added to the
>>> drawing.
>>>
>>> Does the thought make sense to you guys? Does anyone have access
>>> to/remember better what those search options are?
>>>
>>> -greg
>>> [1] all the while keeping somewhere in mind that the core of the RDKit
>>> is really using a more "Daylight-like" model and that there is almost
>>> certainly going to be some mismatch with the MDL model... but we'll worry
>>> about that when we get there.
>>>
>>>
>>>
>>> On Mon, Jun 6, 2016 at 7:04 PM, Brian Kelley <fustiga...@gmail.com>
>>> wrote:
>>>
>>>> An interesting conversation came up at work a few days ago regarding
>>>> MolBlocks/CTABs with queries that behave in an unexpected manner.  I'm
>>>> tackling some of these issues when it comes to reaction processing .rxn
>>>> based files and plan on contributing it relatively soon.  However, I hadn't
>>>> considered making it a generic Query based sanitization/processing.
>>>>
>>>>
>>>> The basic question was "How do I get a MolBlock to only match the "R"'s
>>>> and not allow substitutions anywhere else? like ChemAxon..."
>>>>
>>>>
>>>> As it turns out, RDKit is very strict when it looks at RGroups.  This
>>>> was the initial issue when I started sanitizing RGroups.  Basically
>>>> there are several variants in the wild (ChemDraw/ICM) that make reactions
>>>> that don't quite follow the CTAB spec.  RDKit likes the atom labeled R to
>>>> (1) actually be in an "M  RGP" tag and (2) have an atom mapping.  If an
>>>> atom is labeled "R" and not in an R_GRP, it isn't considered a wildcard,
>>>> for instance.
>>>>
>>>> Now queries don't really care about "M  RGP", but they do care that it
>>>> isn't a dummy atom.  I'm listing below our current technique to fix these
>>>> issues for CTAB queries and would like some feedback.
>>>>
>>>> Here is the workflow that we have been telling chemists during
>>>> sketching:
>>>>
>>>> 1. Make a proper group.  The marvin-sketch/Chemdraw "R" is not enough,
>>>> you can replace it with "A", but R has special semantics and needs an
>>>> RGroup label defined.
>>>> 2. aromatize where appropriate
>>>> 3. (optionally) protonate so only RGroups can match
>>>>
>>>> These line up with 

Re: [Rdkit-discuss] Querying when using CTabs

2016-06-07 Thread Brian Kelley
I was also thinking that instead of protonating, we could just "and" with a
heavy degree query with the degree equal to the current degree.  This
should have the same effect, correct?
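
To make the degree idea concrete, here is a minimal pure-Python sketch on toy heavy-atom graphs. It does not use the RDKit API: find_matches, lock_degree and the graph encoding are hypothetical names invented for illustration. Every query atom listed in lock_degree must have exactly as many heavy-atom neighbours in the target as it has in the query, which is the constraint that and-ing in a degree query would add.

```python
from itertools import permutations


def find_matches(query, target, lock_degree=frozenset()):
    """Brute-force substructure search on tiny heavy-atom graphs.

    query / target: dict mapping atom index -> (element, set of neighbour indices).
    An element of "*" in the query matches any target atom (a stand-in for "R").
    lock_degree: query atom indices whose degree in the target must equal
    their degree in the query (hypothetical illustration, not RDKit API).
    """
    q_ids = sorted(query)
    matches = []
    for candidate in permutations(sorted(target), len(q_ids)):
        mapping = dict(zip(q_ids, candidate))
        if all(_atom_ok(q, mapping, query, target, lock_degree) for q in q_ids):
            matches.append(candidate)
    return matches


def _atom_ok(q, mapping, query, target, lock_degree):
    q_elem, q_nbrs = query[q]
    t_elem, t_nbrs = target[mapping[q]]
    if q_elem != "*" and q_elem != t_elem:
        return False
    # Every bond drawn in the query must exist in the target.
    if any(mapping[n] not in t_nbrs for n in q_nbrs):
        return False
    # The extra "degree equals current degree" constraint.
    if q in lock_degree and len(t_nbrs) != len(q_nbrs):
        return False
    return True


# Query: C-C-R, with the R drawn on atom 2.
query = {0: ("C", {1}), 1: ("C", {0, 2}), 2: ("*", {1})}
# Targets: propane (C-C-C) and isobutane (central carbon 1 has three neighbours).
propane = {0: ("C", {1}), 1: ("C", {0, 2}), 2: ("C", {1})}
isobutane = {0: ("C", {1}), 1: ("C", {0, 2, 3}), 2: ("C", {1}), 3: ("C", {1})}

locked = frozenset({0, 1})  # lock every atom except the R
unlocked_hits = len(find_matches(query, isobutane))
locked_hits = len(find_matches(query, isobutane, locked))
propane_hits = len(find_matches(query, propane, locked))
```

With the lock in place, isobutane (extra substitution on the middle carbon) no longer matches, while propane (substitution only at the R) still does.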

Cheers,
 Brian

On Tue, Jun 7, 2016 at 12:37 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> And while we're at it we should think about including options like what
> ChemAxon calls "Vague Bond Search" (
> https://docs.chemaxon.com/pages/viewpage.action?pageId=22217121#Tautomersearch/Vaguebondsearch/sp-Hybridization-vaguebond).
> This would help address some of the aromaticity problems.
>
> On Tue, Jun 7, 2016 at 5:30 AM, Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> I think that here it's worth, at least initially, ignoring what is
>> currently possible with the RDKit (and how that's implemented) and instead
>> thinking about what we want to be able to do.[1]
>>
>> The goal, I think, is to have some options allowing control over how a
>> query coming from a MOL block/CTAB actually matches target molecules. One
>> possible model for this would be to look at the options that were available
>> for searching in systems like ISIS/Host and ISIS/Base (and whatever it is
>> that they are now called). I no longer have access to those, but I would
>> guess that someone in the community may or that some googling will turn up
>> documentation describing/showing the options. I remember there being
>> options like: "search as drawn", "allow/disallow substitution at
>> heteroatoms", "allow substitution everywhere", etc. This may be a good
>> starting point, then we can think about what kind of options we want to add
>> for interpreting "R" groups or Hs that have been explicitly added to the
>> drawing.
>>
>> Does the thought make sense to you guys? Does anyone have access
>> to/remember better what those search options are?
>>
>> -greg
>> [1] all the while keeping somewhere in mind that the core of the RDKit is
>> really using a more "Daylight-like" model and that there is almost
>> certainly going to be some mismatch with the MDL model... but we'll worry
>> about that when we get there.
>>
>>
>>
>> On Mon, Jun 6, 2016 at 7:04 PM, Brian Kelley <fustiga...@gmail.com>
>> wrote:
>>
>>> An interesting conversation came up at work a few days ago regarding
>>> MolBlocks/CTABs with queries that behave in an unexpected manner.  I'm
>>> tackling some of these issues when it comes to reaction processing .rxn
>>> based files and plan on contributing it relatively soon.  However, I hadn't
>>> considered making it a generic Query based sanitization/processing.
>>>
>>>
>>> The basic question was "How do I get a MolBlock to only match the "R"'s
>>> and not allow substitutions anywhere else? like ChemAxon..."
>>>
>>>
>>> As it turns out, RDKit is very strict when it looks at RGroups.  This
>>> was the initial issue when I started sanitizing RGroups.  Basically
>>> there are several variants in the wild (ChemDraw/ICM) that make reactions
>>> that don't quite follow the CTAB spec.  RDKit likes the atom labeled R to
>>> (1) actually be in an "M  RGP" tag and (2) have an atom mapping.  If an
>>> atom is labeled "R" and not in an R_GRP, it isn't considered a wildcard,
>>> for instance.
>>>
>>> Now queries don't really care about "M  RGP", but they do care that it
>>> isn't a dummy atom.  I'm listing below our current technique to fix these
>>> issues for CTAB queries and would like some feedback.
>>>
>>> Here is the workflow that we have been telling chemists during sketching:
>>>
>>> 1. Make a proper group.  The marvin-sketch/Chemdraw "R" is not enough,
>>> you can replace it with "A", but R has special semantics and needs an
>>> RGroup label defined.
>>> 2. aromatize where appropriate
>>> 3. (optionally) protonate so only RGroups can match
>>>
>>> These line up with the following RDKit code snippets:
>>>
>>> 1. Fix the "R"s (note we probably should make proper RGroups, but this
>>> just adds dummy matches)
>>>
>>> qmol = Chem.MolFromMolBlock(molblock)
>>> # first, change the "R"'s into matching any atoms
>>> from rdkit.Chem import rdqueries
>>> qmol = Chem.RWMol(qmol)
>>> for atom in qmol.GetAtoms():
>>>     if atom.GetAtomicNum() == 0:
>>>         qmol.ReplaceAtom(atom.GetIdx(

[Rdkit-discuss] Querying when using CTabs

2016-06-06 Thread Brian Kelley
An interesting conversation came up at work a few days ago regarding
MolBlocks/CTABs with queries that behave in an unexpected manner.  I'm
tackling some of these issues when it comes to reaction processing .rxn
based files and plan on contributing it relatively soon.  However, I hadn't
considered making it a generic Query based sanitization/processing.


The basic question was "How do I get a MolBlock to only match the "R"'s and
not allow substitutions anywhere else? like ChemAxon..."


As it turns out, RDKit is very strict when it looks at RGroups.  This was
the initial issue when I started sanitizing RGroups.  Basically there
are several variants in the wild (ChemDraw/ICM) that make reactions that
don't quite follow the CTAB spec.  RDKit likes the atom labeled R to (1)
actually be in an "M  RGP" tag and (2) have an atom mapping.  If an atom is
labeled "R" and not in an R_GRP, it isn't considered a wildcard, for instance.

Now queries don't really care about "M  RGP", but they do care that it
isn't a dummy atom.  I'm listing below our current technique to fix these
issues for CTAB queries and would like some feedback.

Here is the workflow that we have been telling chemists during sketching:

1. Make a proper group.  The marvin-sketch/Chemdraw "R" is not enough, you
can replace it with "A", but R has special semantics and needs an RGroup
label defined.
2. aromatize where appropriate
3. (optionally) protonate so only RGroups can match

These line up with the following RDKit code snippets:

1. Fix the "R"s (note we probably should make proper RGroups, but this just
adds dummy matches)

from rdkit import Chem
from rdkit.Chem import rdqueries

qmol = Chem.MolFromMolBlock(molblock)
# first, change the "R"s into query atoms that match any atom
qmol = Chem.RWMol(qmol)
for atom in qmol.GetAtoms():
    if atom.GetAtomicNum() == 0:
        qmol.ReplaceAtom(atom.GetIdx(), rdqueries.AtomNumGreaterQueryAtom(0))


2. aromatize - this might be good or might break things.  It seems to work
great, even with conditional logic, i.e. [C,O], but I'm unsure which atom is
actually being used to form the pi electrons for aromaticity checking.  I
expect the first, actually.  In any case, something needs to happen in
general for random inputs, otherwise the matching doesn't really do what is
expected.

# We want to see if we can find aromaticity, this may be complicated with
#  query features [C,O] but it works ok.
Chem.SanitizeMol(qmol, Chem.SANITIZE_SETAROMATICITY)

3. protonate if the desire is to only match RGroups

# second, add explicit Hs so we only match the Rs
# I'm unclear if this can fail in general, I would probably wrap this in
#  a try...except block
Chem.SanitizeMol(qmol, Chem.SANITIZE_ADJUSTHS)
qmol = Chem.MergeQueryHs(Chem.AddHs(qmol))

This could be enabled with flags into a SanitizeQuery function, or perhaps
a PrepareQuery function.

Thoughts?

Cheers,
 Brian


Re: [Rdkit-discuss] two versions of RDKit in one python program - static compilation?

2016-04-28 Thread Brian Kelley
In general this is not possible.  However, you could run a python server in one
environment and call it via some RPC mechanism from the other (using pickled
molecules, SMILES strings, or some other serialization method).

This might put you on the right track:

http://stackoverflow.com/questions/1879971/what-is-the-current-choice-for-doing-rpc-in-python
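
As a rough illustration of the RPC idea, here is a minimal sketch using only the standard library's xmlrpc modules. The function count_atoms_old_env is a hypothetical placeholder for whatever needs the old RDKit (it just counts letters so the sketch runs without RDKit), and the server runs in a thread here purely to keep the example self-contained; in a real setup it would be a separate process started under the old environment.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def count_atoms_old_env(smiles):
    """Placeholder for work done with the old RDKit (hypothetical).

    A real server would parse the SMILES with the old installation;
    this stand-in only counts letters so the sketch runs anywhere.
    """
    return sum(1 for ch in smiles if ch.isalpha())


def start_server(host="127.0.0.1"):
    # Port 0 asks the OS for any free port; logRequests=False keeps it quiet.
    server = SimpleXMLRPCServer((host, 0), logRequests=False)
    server.register_function(count_atoms_old_env, "count_atoms")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


server = start_server()
port = server.server_address[1]

# The "new RDKit" side only ever exchanges plain strings and numbers.
client = ServerProxy(f"http://127.0.0.1:{port}")
result = client.count_atoms("CCO")

server.shutdown()
server.server_close()
```

Passing SMILES (or another text serialization) across the boundary keeps the two environments from ever loading each other's shared libraries.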


Brian Kelley

> On Apr 28, 2016, at 7:13 PM, Rafal Roszak <rmrmg.c...@gmail.com> wrote:
> 
> Hello,
> 
> I have a Python program which uses RDKit: some functions in the
> program require an old version of RDKit (don't ask me why), and some other
> code (mine) requires the current version of RDKit.
> My problem is how to combine two different versions of RDKit in one
> Python program.
> I installed the new version of RDKit in another location and renamed the
> directory with the Python package (to rdkitnew, without touching the old
> version). This _almost_ works, but I have a problem with a shared library:
> 
> ImportError: [...] rdBase.so: undefined symbol: _ZN5RDKit12boostVersionE
> 
> This is because the new version uses a library from the old RDKit, and this
> is where I am stuck. I cannot just add the new directory to LD_LIBRARY_PATH,
> as this will cause problems in the code which uses the old RDKit. I thought
> of setting LD_LIBRARY_PATH via os.environ directly before importing the new
> version of RDKit, but it does not work properly (only one version of the
> shared library is in memory: the one which was loaded earlier). So my idea is
> to build the new version of RDKit without dynamic (shared) libraries.
> Is it possible? How would I do this?
> Do you have any other (better) idea for how to fix this problem?
> 
> 
> Regards, 
> 
> Rafal
> 



Re: [Rdkit-discuss] Ubuntu 14.04 LTS Build Post Commit 7478e3fd3bee0c20291c2b99776c40a0c7d8a955

2016-04-17 Thread Brian Kelley
Now that we made thread safe the default, you also need boost-system (and
boost-serialization for FilterCatalog serialization).

Here is the apt install line we use to configure our Docker files:

apt install -y wget flex bison build-essential python-numpy cmake python-dev \
  sqlite3 libsqlite3-dev libboost-dev libboost-system-dev libboost-thread-dev \
  libboost-serialization-dev libboost-python-dev libboost-regex-dev

It looks like I should add a PR to detect this at build time?

I admit to being confused why this builds but doesn't run...

See here for more details:

https://github.com/rdkit/rdkit/issues/762

Brian Kelley

> On Apr 17, 2016, at 8:28 AM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
> 
> I've hit the same issue. Any thoughts on what the underlying issue is 
> (without reverting to using anaconda)?
> An example that illustrates this is here:
> https://github.com/InformaticsMatters/rdkit
> 
> Tim
> 
>> On 20/11/2015 16:02, Greg Landrum wrote:
>> Hi Huw,
>> 
>> This is not directly responsive to your question, but if you're working with 
>> anaconda, it is probably easier to just build and install the RDKit using 
>> the conda-rdkit recipes here:
>> https://github.com/rdkit/conda-rdkit
>> The development branch there pulls from master.
>> 
>> -greg
>> 
>> 
>>> On Fri, Nov 20, 2015 at 4:26 PM, Huw Jones <huwdjo...@hotmail.com> wrote:
>>> Hi there,
>>> 
>>> I’ve been building the Python RDKit modules direct from the GitHub 
>>> repository recently on an Ubuntu 14.04 LTS VM (i.e. git clone 
>>> https://github.com/rdkit/rdkit.git).
>>> 
>>> I use the following aptitude packages for the build process:
>>> 
>>> flex 2.5.35-10.1ubuntu2
>>> bison 2:3.0.2.dfsg-2
>>> build-essential 11.6ubuntu6
>>> cmake 2.8.12.2-0ubuntu3
>>> libboost-dev 1.54.0.1ubuntu1
>>> libboost-regex-dev 1.54.0.1ubuntu1
>>> python-dev 2.7.5-5ubuntu3
>>> libboost-python-dev 1.54.0.1ubuntu1
>>> python-numpy 1:1.8.2-0ubuntu0.1
>>> 
>>> The build process is as follows:
>>> 
>>> git clone https://github.com/rdkit/rdkit.git rdkit-latest
>>> source /opt/virtualenv/current/bin/activate
>>> export RDBASE=/path/to/rdkit-latest
>>> cd rdkit-latest
>>> mkdir build
>>> cd build
>>> cmake ..
>>> make -j 2
>>> make install
>>> cd ..
>>> cp -r rdkit /opt/virtualenv/current/lib/python2.7/site-packages
>>> mkdir /opt/virtualenv/current/lib/python2.7/lib-rdkit
>>> cp lib/* /opt/virtualenv/current/lib/python2.7/lib-rdkit
>>> 
>>> I have recently been experiencing this error:
>>> 
>>> Python 2.7.10 (default, Nov 20 2015, 07:14:39)
>>> [GCC 4.8.4] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> from rdkit import rdBase
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> ImportError: 
>>> /opt/virtualenv/current/lib/python2.7/lib-rdkit/libRDBoost.so.1: undefined 
>>> symbol: _ZN5boost6python23throw_error_already_setEv
>>> 
>>> It looks like this error starts appearing from this commit onwards:
>>> 
>>> https://github.com/rdkit/rdkit/commit/9965702691b039c936c3fcca579fbd9ded8f5331
>>> 
>>> Any ideas?
>>> 
>>> Many thanks as ever.
>>> 
>>> Huw Jones
>>> 
>> 
>> 
>> 
> 


Re: [Rdkit-discuss] RDKit build issues

2016-04-15 Thread Brian Kelley
Getting the right version of boost can be tricky.  You can see our normal cmake
incantation here, as well as how we set RDBASE for tests:

https://github.com/rdkit/rdkit/blob/master/.travis.yml

Note the flag

-D Boost_NO_SYSTEM_PATHS=ON

when running cmake; without it, cmake can get very confused.



Brian Kelley

> On Apr 15, 2016, at 6:26 PM, Matthew Lardy <mla...@gmail.com> wrote:
> 
> I'll add, that remembering that cmake and ccmake can produce different 
> outcomes I've gone back to trying cmake.  But I can't overwrite the variables 
> in cmake.  Here are the results from trying to specify them:
> 
>  Manually-specified variables were not used by the project:
> 
> BOOST_DIR
> BOOST_INCLUDE_DIR
> BOOST_LIBRARY_DIR
> BOOST_PYTHON_LIBRARY_DEBUG
> BOOST_PYTHON_LIBRARY_RELEASE
> BOOST_REGEX_LIBRARY_DEBUG
> BOOST_REGEX_LIBRARY_RELEASE
> BOOST_SERIALIZATION_LIBRARY_DEBUG
> BOOST_SERIALIZATION_LIBRARY_RELEASE
> BOOST_SYSTEM_LIBRARY_DEBUG
> BOOST_SYSTEM_LIBRARY_RELEASE
> BOOST_THREAD_LIBRARY_DEBUG
> BOOST_THREAD_LIBRARY_RELEASE
> 
> cmake, as per its usual, picked up an ancient version of boost (which I want 
> to override).  I can get around this with ccmake, but nothing that I compile 
> can pass all of the tests.  If someone knows what these variables (shown in 
> ccmake) are called, I'd love to know.
> 
> Thanks in advance!
> Matthew
> 
> 
>> On Fri, Apr 15, 2016 at 2:58 PM, Matthew Lardy <mla...@gmail.com> wrote:
>> Hi all,
>> 
>> If someone has an insight I would love to hear it about how best to build 
>> RDKit from scratch.  
>> 
>> I am using the following to build the 2015.03 release:
>> RedHat Linux version 6.4
>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
>> swig version 3.0.8
>> cmake version 3.0.0
>> boost version 1.53.0 (I've also tried boost 1.59.0)
>> python version 2.7.7
>> 
>> I use ccmake and alter the things which need to be altered (as cmake always 
>> grabs the wrong things for me).  My build compiles nicely, and without 
>> errors.  But the tests are a mess.  Here is the summary of my build:
>> 
>> 26% tests passed, 87 tests failed out of 118
>> 
>> I've put the details of the tests, in the hope that someone sees a pattern 
>> that I do not, below.
>> 
>> Before someone recommends a clean install that is not possible.  I also do 
>> not have root, so I can't just wipe the machine and start over.  I cannot 
>> use an RPM, so that is out too.  Any ideas are welcome!
>> 
>> Thanks,
>> Matthew
>> 
>> The following tests FAILED:
>>   2 - testDataStructs (OTHER_FAULT)
>>   3 - pyBV (Failed)
>>   4 - pyDiscreteValueVect (Failed)
>>   5 - pySparseIntVect (Failed)
>>   7 - testGrid (OTHER_FAULT)
>>   8 - testPyGeometry (Failed)
>>  11 - pyAlignment (Failed)
>>  14 - testMMFFForceField (OTHER_FAULT)
>>  15 - pyForceFieldConstraints (Failed)
>>  17 - pyDistGeom (Failed)
>>  18 - graphmolTest1 (OTHER_FAULT)
>>  21 - graphmolMolOpsTest (SEGFAULT)
>>  23 - graphmoltestChirality (OTHER_FAULT)
>>  24 - graphmoltestPickler (OTHER_FAULT)
>>  26 - hanoiTest (OTHER_FAULT)
>>  28 - testDepictor (OTHER_FAULT)
>>  29 - pyDepictor (Failed)
>>  32 - fileParsersTest1 (OTHER_FAULT)
>>  33 - testMolSupplier (OTHER_FAULT)
>>  34 - testMolWriter (OTHER_FAULT)
>>  35 - testTplParser (OTHER_FAULT)
>>  36 - testMol2ToMol (OTHER_FAULT)
>>  38 - testReaction (OTHER_FAULT)
>>  40 - pyChemReactions (Failed)
>>  41 - testChemTransforms (OTHER_FAULT)
>>  44 - testFragCatalog (OTHER_FAULT)
>>  45 - pyFragCatalog (Failed)
>>  46 - testDescriptors (OTHER_FAULT)
>>  47 - pyMolDescriptors (Failed)
>>  48 - testFingerprints (OTHER_FAULT)
>>  50 - pyPartialCharges (Failed)
>>  51 - testMolTransforms (OTHER_FAULT)
>>  52 - pyMolTransforms (Failed)
>>  53 - testMMFFForceFieldHelpers (OTHER_FAULT)
>>  54 - testUFFForceFieldHelpers (OTHER_FAULT)
>>  55 - pyForceFieldHelpers (Failed)
>>  56 - testDistGeomHelpers (OTHER_FAULT)
>>  57 - pyDistGeom (Failed)
>>  58 - testMolAlign (OTHER_FAULT)
>>  59 - pyMolAlign (Failed)
>>  60 - testFeatures (OTHER_FAULT)
>>  61 - pyChemicalFeatures (Failed)
>>  

Re: [Rdkit-discuss] RDKit build issues

2016-04-15 Thread Brian Kelley
This looks suspiciously like you haven't set RDBASE to point to the source 
directory when running the tests.  The tests need to find the data directory 
with the test data.
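
A quick way to check this before running the tests is sketched below. The helper name rdbase_problems is hypothetical, and the check assumes the test data lives in a "Data" directory directly under the source tree that RDBASE points to.

```python
import os


def rdbase_problems(environ):
    """Return a list of human-readable problems with the RDBASE setting."""
    rdbase = environ.get("RDBASE")
    if not rdbase:
        return ["RDBASE is not set"]
    problems = []
    if not os.path.isdir(rdbase):
        problems.append(f"RDBASE={rdbase!r} is not a directory")
    # Assumption: the source tree keeps its test data under "Data".
    elif not os.path.isdir(os.path.join(rdbase, "Data")):
        problems.append(f"no Data directory under {rdbase!r}")
    return problems


missing = rdbase_problems({})            # simulated empty environment
current = rdbase_problems(os.environ)    # the environment you are actually in
```

An empty list for the current environment means RDBASE at least points at something that looks like a source tree; it does not prove the tests will pass.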


Brian Kelley

> On Apr 15, 2016, at 6:26 PM, Matthew Lardy <mla...@gmail.com> wrote:
> 
> I'll add, that remembering that cmake and ccmake can produce different 
> outcomes I've gone back to trying cmake.  But I can't overwrite the variables 
> in cmake.  Here are the results from trying to specify them:
> 
>  Manually-specified variables were not used by the project:
> 
> BOOST_DIR
> BOOST_INCLUDE_DIR
> BOOST_LIBRARY_DIR
> BOOST_PYTHON_LIBRARY_DEBUG
> BOOST_PYTHON_LIBRARY_RELEASE
> BOOST_REGEX_LIBRARY_DEBUG
> BOOST_REGEX_LIBRARY_RELEASE
> BOOST_SERIALIZATION_LIBRARY_DEBUG
> BOOST_SERIALIZATION_LIBRARY_RELEASE
> BOOST_SYSTEM_LIBRARY_DEBUG
> BOOST_SYSTEM_LIBRARY_RELEASE
> BOOST_THREAD_LIBRARY_DEBUG
> BOOST_THREAD_LIBRARY_RELEASE
> 
> cmake, as per its usual, picked up an ancient version of boost (which I want 
> to override).  I can get around this with ccmake, but nothing that I compile 
> can pass all of the tests.  If someone knows what these variables (shown in 
> ccmake) are called, I'd love to know.
> 
> Thanks in advance!
> Matthew
> 
> 
>> On Fri, Apr 15, 2016 at 2:58 PM, Matthew Lardy <mla...@gmail.com> wrote:
>> Hi all,
>> 
>> If someone has an insight I would love to hear it about how best to build 
>> RDKit from scratch.  
>> 
>> I am using the following to build the 2015.03 release:
>> RedHat Linux version 6.4
>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
>> swig version 3.0.8
>> cmake version 3.0.0
>> boost version 1.53.0 (I've also tried boost 1.59.0)
>> python version 2.7.7
>> 
>> I use ccmake and alter the things which need to be altered (as cmake always 
>> grabs the wrong things for me).  My build compiles nicely, and without 
>> errors.  But the tests are a mess.  Here is the summary of my build:
>> 
>> 26% tests passed, 87 tests failed out of 118
>> 
>> I've put the details of the tests, in the hope that someone sees a pattern 
>> that I do not, below.
>> 
>> Before someone recommends a clean install that is not possible.  I also do 
>> not have root, so I can't just wipe the machine and start over.  I cannot 
>> use an RPM, so that is out too.  Any ideas are welcome!
>> 
>> Thanks,
>> Matthew
>> 
>> The following tests FAILED:
>>   2 - testDataStructs (OTHER_FAULT)
>>   3 - pyBV (Failed)
>>   4 - pyDiscreteValueVect (Failed)
>>   5 - pySparseIntVect (Failed)
>>   7 - testGrid (OTHER_FAULT)
>>   8 - testPyGeometry (Failed)
>>  11 - pyAlignment (Failed)
>>  14 - testMMFFForceField (OTHER_FAULT)
>>  15 - pyForceFieldConstraints (Failed)
>>  17 - pyDistGeom (Failed)
>>  18 - graphmolTest1 (OTHER_FAULT)
>>  21 - graphmolMolOpsTest (SEGFAULT)
>>  23 - graphmoltestChirality (OTHER_FAULT)
>>  24 - graphmoltestPickler (OTHER_FAULT)
>>  26 - hanoiTest (OTHER_FAULT)
>>  28 - testDepictor (OTHER_FAULT)
>>  29 - pyDepictor (Failed)
>>  32 - fileParsersTest1 (OTHER_FAULT)
>>  33 - testMolSupplier (OTHER_FAULT)
>>  34 - testMolWriter (OTHER_FAULT)
>>  35 - testTplParser (OTHER_FAULT)
>>  36 - testMol2ToMol (OTHER_FAULT)
>>  38 - testReaction (OTHER_FAULT)
>>  40 - pyChemReactions (Failed)
>>  41 - testChemTransforms (OTHER_FAULT)
>>  44 - testFragCatalog (OTHER_FAULT)
>>  45 - pyFragCatalog (Failed)
>>  46 - testDescriptors (OTHER_FAULT)
>>  47 - pyMolDescriptors (Failed)
>>  48 - testFingerprints (OTHER_FAULT)
>>  50 - pyPartialCharges (Failed)
>>  51 - testMolTransforms (OTHER_FAULT)
>>  52 - pyMolTransforms (Failed)
>>  53 - testMMFFForceFieldHelpers (OTHER_FAULT)
>>  54 - testUFFForceFieldHelpers (OTHER_FAULT)
>>  55 - pyForceFieldHelpers (Failed)
>>  56 - testDistGeomHelpers (OTHER_FAULT)
>>  57 - pyDistGeom (Failed)
>>  58 - testMolAlign (OTHER_FAULT)
>>  59 - pyMolAlign (Failed)
>>  60 - testFeatures (OTHER_FAULT)
>>  61 - pyChemicalFeatures (Failed)
>>  62 - testShapeHelpers (OTHER_FAULT)
>>  63 - pyShapeHelpers (Failed)
>>  65 - pyMolCatalog (Fa

Re: [Rdkit-discuss] ring bond query

2016-04-13 Thread Brian Kelley
Not sure if it will help, but the python version of IsInRing checks to see
if the sssr is initialized.  Here is the C++ code that is used, you may be
able to adapt it to your needs:

bool BondIsInRing(const Bond *bond) {
  if (!bond->getOwningMol().getRingInfo()->isInitialized()) {
    MolOps::findSSSR(bond->getOwningMol());
  }
  return bond->getOwningMol().getRingInfo()->numBondRings(bond->getIdx()) != 0;
}
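
The pattern in that helper (compute ring information on first use, then answer from the cached result) can be sketched in plain Python on a toy bond list. Nothing here is RDKit API: ToyMol and bond_is_in_ring are hypothetical names, and the ring test used (a bond is in a ring if its two ends stay connected once the bond is removed) is a simple stand-in for ring membership, not an SSSR computation.

```python
class ToyMol:
    """Tiny stand-in for a molecule: just a list of (begin, end) bonds."""

    def __init__(self, bonds):
        self.bonds = list(bonds)
        self.ring_bonds = None  # None means "ring info not initialized"


def _connected_without(bonds, skip_idx):
    """True if the ends of bond skip_idx stay connected once it is removed."""
    start, goal = bonds[skip_idx]
    neighbours = {}
    for idx, (a, b) in enumerate(bonds):
        if idx != skip_idx:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        atom = stack.pop()
        if atom == goal:
            return True
        for nbr in neighbours.get(atom, ()):
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return False


def bond_is_in_ring(mol, bond_idx):
    # Same shape as the C++ helper: initialize on first use, then answer.
    if mol.ring_bonds is None:
        mol.ring_bonds = {
            i for i in range(len(mol.bonds)) if _connected_without(mol.bonds, i)
        }
    return bond_idx in mol.ring_bonds


# A three-membered ring (atoms 0, 1, 2) with one substituent (atom 3).
mol = ToyMol([(0, 1), (1, 2), (2, 0), (2, 3)])
ring_flags = [bond_is_in_ring(mol, i) for i in range(len(mol.bonds))]
```

The first call pays for the ring perception; later calls only look up the cached set, which is why the check never trips over uninitialized ring info.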


On Wed, Apr 13, 2016 at 11:59 AM, Yingfeng Wang  wrote:

> Paolo,
>
> Thanks. As I mentioned in the end of my question, setting this flag true
> may cause other problems. In the python version, the sanitize flag is also
> false, but it works. Is it possible to figure out a workaround for checking
> whether each bond is in a ring in the C++ version?
>
> Best,
> Yingfeng
>
> On Wed, Apr 13, 2016 at 11:53 AM, Paolo Tosco 
> wrote:
>
>> Dear Yingfeng,
>>
>> the reason why RingInfo is not initialized is that you are invoking
>> SmilesToMol() with the sanitize flag set to false; setting that parameter
>> to true in the SmilesToMol() call should fix your problem.
>>
>> Kind regards,
>> Paolo
>>
>>
>> On 4/13/2016 16:48, Yingfeng Wang wrote:
>>
>> This is my C++ code.
>>
>>
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>>
>> using namespace RDKit;
>> using namespace std;
>>
>> int main(int argc, const char * argv[])
>> {
>> RWMol* pCurMol;
>> MolGraph curTopology;
>> vector viiRingBonds;
>> string sSmiles = "Oc1ncnnc1-c1c1";
>> list
>> >::const_iterator itEdge;
>>
>> pCurMol = SmilesToMol(sSmiles, 0, false);
>> curTopology = pCurMol->getTopology();
>> viiRingBonds = pCurMol->getRingInfo()->bondRings();
>> for (itEdge = curTopology.m_edges.begin(); itEdge !=
>> curTopology.m_edges.end(); ++itEdge)
>> if (queryIsBondInRing(itEdge->m_property.get()))
>> cout<<"get one ring bond"<>
>> return 0;
>> }
>>
>> It crashes in the following line
>>
>> queryIsBondInRing(itEdge->m_property.get())
>>
>> with the following information.
>>
>> Pre-condition Violation
>> RingInfo not initialized
>> Violation occurred on line 67 in file
>> /Users/yingfeng/software/RDKit/rdkit-Release_2015_03_1/Code/GraphMol/RingInfo.cpp
>> Failed Expression: df_init
>> 
>>
>> libc++abi.dylib: terminating with uncaught exception of type
>> Invar::Invariant: Pre-condition Violation
>>
>>
>> Actually, the corresponding python version works.
>>
>> >>> from rdkit import Chem
>> >>> sSmiles = "Oc1ncnnc1-c1c1"
>> >>> curMol =Chem.MolFromSmiles(sSmiles, False)
>> >>> iBondsNum = curMol.GetNumBonds()
>> >>> for i in range(iBondsNum):
>> ... current_bond = curMol.GetBondWithIdx(i)
>> ... if current_bond.IsInRing():
>> ... print i
>> ...
>> 1
>> 2
>> 3
>> 4
>> 5
>> 7
>> 8
>> 9
>> 10
>> 11
>> 12
>> 13
>>
>>
>> Please note that the problem cannot be solved by using
>> SmilesToMol(sSmiles, 0, true);
>>
>> because it may cause other problems, e.g.,
>> string sSmiles = "c.c";
>> Therefore, I would like to stay with SmilesToMol(sSmiles, 0, false);
>>
>> Could you please give me some hints for fixing the C++ version, and
>> getting the similar results of the python version.
>>
>>
>
>
>

Re: [Rdkit-discuss] GetMol and GetMolFrags in C++

2016-04-10 Thread Brian Kelley
In C++ you don't have to, RWMol can be sent to any function that takes an ROMol.

Actually, this is true now in Python as well.

In C++ if you really need to copy the molecule:

ROMol *mol = new ROMol( *rwmol );

But you really don't have to.


Brian Kelley

> On Apr 10, 2016, at 6:27 PM, Yingfeng Wang <ywang...@gmail.com> wrote:
> 
> In python, I have GetMol() for Chem.EditableMol. I can also use 
> Chem.GetMolFrags(...). Could you please help me to know how to get similar 
> functions in C++?
> 
> For example, I remove two bonds in an RWMol object, how do I use GetMol() and 
> GetMolFrags(...) in C++?
> 
> Thanks.
> 
> Yingfeng



Re: [Rdkit-discuss] compiling error with C++ on maverick

2016-03-19 Thread Brian Kelley
One thing we could fairly easily set up is a "companion" CMakeLists.txt for
a sample C++ project that builds a test project against a built
distribution.  It could prove useful as a basis for building C++ extensions.

Basically we would hedge our bets and just link against everything :)

Cheers,
 Brian

On Wed, Mar 16, 2016 at 8:47 AM, Yingfeng Wang  wrote:

> Riccardo,
>
> Thank you very much!
>
> Yingfeng
>
> On Wed, Mar 16, 2016 at 4:08 AM, Riccardo Vianello <
> riccardo.viane...@gmail.com> wrote:
>
>> Hi Yingfeng,
>>
>> do you use cmake to build your own project? in case you did, some cmake
>> configuration files are usually installed together with the RDKit libraries
>> and may help manage these details:
>>
>> rdkit-config.cmake
>> rdkit-config-version.cmake
>> rdkit-targets.cmake
>> rdkit-targets-release.cmake
>>
>> The location of these files (matching the libraries installation path in
>> the current RDKit version) should be passed on the cmake command line:
>>
>> $ cmake -D RDKit_DIR=/rdkit/libraries/installation/path/lib [...]
>>
>> And the CMakeLists.txt script can then use the information from these
>> files to configure the build:
>>
>> find_package(RDKit REQUIRED)
>> include_directories(${RDKit_INCLUDE_DIRS})
>> [...]
>> add_library(mylibrary [...])
>> target_link_libraries(mylibrary Descriptors Fingerprints GraphMol)
>>
>> Specifying the most direct dependencies should be usually sufficient, and
>> cmake should be able to complement this information with the target
>> dependencies originally collected during the RDKit build and then tracked
>> in the configuration files.
>>
>> Best,
>> Riccardo
>>
>>
>
>


Re: [Rdkit-discuss] Molecular properties + pickling

2016-03-18 Thread Brian Kelley
Maciek, properties are not serialized because of a quirk of the current
serialization method.

We hope to have a proper solution sometime this year; it is currently a work
in progress. The remaining obstacle is making the new format backwards
compatible, so that old versions can still read it (with the same loss of
properties).


Brian Kelley

> On Mar 18, 2016, at 1:04 PM, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
> 
> Thanks. Is there a reason that this is not in the main Mol class?
> 
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
> 
> 2016-03-18 17:52 GMT+01:00 Bennion, Brian <benni...@llnl.gov>:
>> This is neither a bug nor a feature per se.
>> 
>> You need to wrap the compounds in PropertyMol,
>> 
>> i.e.
>> 
>> from rdkit.Chem.PropertyMol import PropertyMol
>> 
>> Ketone = [x for x in supplAlkylKetones if x is not None]
>> for i in range(len(Ketone)):
>>     Ketone[i] = PropertyMol(Ketone[i])
>> 
>>  
>> 
>> From: Maciek Wójcikowski [mailto:mac...@wojcikowski.pl] 
>> Sent: Friday, March 18, 2016 9:35 AM
>> To: RDKit Discuss
>> Subject: [Rdkit-discuss] Molecular properties + pickling
>> 
>>  
>> 
>> Hi all,
>> 
>>  
>> 
>> Is it a bug or am I doing something wrong - the properties are not passed 
>> during pickling in python. Here comes the example:
>> 
>>  
>> 
>> from rdkit import Chem
>> import cPickle as pickle
>> 
>> mol = Chem.MolFromSmiles('c1ccccc1')
>> mol.SetProp('aaa', '123')
>> print list(mol.GetPropNames()) # ['aaa']
>> mol2 = pickle.loads(pickle.dumps(mol))
>> print list(mol2.GetPropNames()) # ['']
>> 
>>  
>> 
>>  
>> 
>> In [19]: rdkit.__version__
>> 
>> Out[19]: '2015.09.2'
>> 
>>  
>> 
>> 
>> Pozdrawiam,  |  Best regards,
>> Maciek Wójcikowski
>> mac...@wojcikowski.pl
>> 
>> 


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Brian Kelley
No, prochiral atoms have the same rank.  Your question got me thinking about
how we could detect prochiral atoms.  Here is the stupidest/simplest
solution I could come up with: it changes the isotope on each atom in turn,
and if that changes the number of unspecified stereo centers, the atom is
considered prochiral:

from rdkit import Chem

def numUnspecifiedStereoAtoms(mol):
    """Return the number of unspecified stereo atoms in a molecule"""
    return len([atom for atom in mol.GetAtoms() if
                (atom.HasProp("_ChiralityPossible") and
                 atom.GetChiralTag() ==
                 Chem.rdchem.ChiralType.CHI_UNSPECIFIED)])

def findProchiral(m):
    """Return indices of prochiral atoms, to find prochiral hydrogens,
    hydrogens must appear in the graph, see Chem.AddHs"""
    indices = [atom.GetIdx() for atom in m.GetAtoms()]
    num_unspecified = numUnspecifiedStereoAtoms(m)
    prochiral = []
    for index in indices:
        # label one atom with an isotope on a copy of the molecule
        m2 = Chem.Mol(m)
        m2.GetAtomWithIdx(index).SetIsotope(2)
        # round-trip through SMILES so stereochemistry is re-perceived
        m3 = Chem.MolFromSmiles(Chem.MolToSmiles(m2, isomericSmiles=True))
        if numUnspecifiedStereoAtoms(m3) != num_unspecified:
            prochiral.append(index)
    return prochiral

print(findProchiral(Chem.AddHs(Chem.MolFromSmiles("C1C(C(N)=O)=CNC=C1"))))

On Thu, Mar 10, 2016 at 11:30 AM, Peter S. Shenkin 
wrote:

> Is the canonical rank of prochiral H's different or the same? (For example
> the rank of the H's on C-1 of ethyl chloride.)
>
> Thanks,
> -P.
>
>


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Brian Kelley
Yes, I actually exposed that function to Python in RDKit :)

Be aware that the canonical rank and the output order aren't the same thing.
The rank is what is used during graph traversal, when making the SMILES string,
to choose which atom to go to next.  The output order records which atoms were
output first, second, third in the output SMILES string.  They are not
necessarily the same.

Both should, however, be unique for the input graph, but in either case 
explicit hydrogens should be added.

For reference:

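order = Chem.CanonicalRankAtoms(m, includeChirality=True)

Is the function being discussed.

And as a bonus (RenumberAtoms expects newOrder[new_idx] = old_idx, so the
ranks are inverted first):

ranks = list(order)
new_order = sorted(range(len(ranks)), key=lambda i: ranks[i])
mol_ordered = Chem.RenumberAtoms(m, new_order)

Will make a copy in canonical atom order, but not canonical smiles output order.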
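
One detail worth spelling out: a rank list reads as ranks[atom_idx] = rank, while a renumbering list, as I understand the documented convention of RenumberAtoms, reads the other way round as new_order[new_idx] = old_idx. The two are inverse permutations of each other. Below is a minimal pure-Python sketch of that inversion; the three-atom rank list and the helper name ranks_to_new_order are made up for illustration and are not part of RDKit.

```python
def ranks_to_new_order(ranks):
    """Return new_order such that new_order[new_idx] = old_idx."""
    return sorted(range(len(ranks)), key=lambda old_idx: ranks[old_idx])


# Hypothetical ranks for a three-atom molecule: ranks[old_idx] = rank.
ranks = [2, 0, 1]
new_order = ranks_to_new_order(ranks)
print(new_order)  # [1, 2, 0]

# After renumbering, the atom placed at new position p has rank p.
assert [ranks[old_idx] for old_idx in new_order] == [0, 1, 2]
```

This assumes the ranks are a true permutation (ties broken), which is the default behaviour as far as I know.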


Brian Kelley

> On Mar 10, 2016, at 7:36 AM, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
> 
> Hi,
> 
> Few months back Greg has added CanonicalRankAtoms to rdkit.Chem after my 
> similar question.
> http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms
> 
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
> 
> 2016-03-10 13:18 GMT+01:00 Michal Krompiec <michal.kromp...@gmail.com>:
>> Thanks a lot, this is exactly what I wanted.
>> Best regards,
>> Michal
>> 


Re: [Rdkit-discuss] canonical atom indexing

2016-03-10 Thread Brian Kelley
The canonicalizer doesn't treat hydrogens any differently than any other
atom, but they have to be in the graph.  If you are starting from smiles,
simply add explicit hydrogens, python example below:

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles("CC")
>>> mh = Chem.AddHs(m)
>>> Chem.MolToSmiles(mh)
'[H]C([H])([H])C([H])([H])[H]'
>>> order = eval(mh.GetProp("_smilesAtomOutputOrder"))
# safer non eval version...
>>> order = mh.GetPropsAsDict(includePrivate=True,
...                           includeComputed=True)['_smilesAtomOutputOrder']
>>> list(order)
[2,0,3,4,1,5,6,7]
>>>

Note that the output order is expressed in terms of the output SMILES string,
i.e. order[0] is the original index of the atom that was output first, and
so on.  In other words, order[output_atom_idx] = input_atom_idx
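
To go the other way (from an input atom index to its position in the output SMILES), the list can be inverted. The sketch below is plain Python and does not call RDKit; it assumes the property string looks like the list shown in the session above, and uses ast.literal_eval from the standard library as one more way to avoid eval. The name output_position is only illustrative.

```python
import ast

# Assumed string form of the property, copied from the session above.
raw_order = "[2,0,3,4,1,5,6,7]"

# ast.literal_eval only accepts Python literals, so it cannot run arbitrary code.
order = ast.literal_eval(raw_order)

# order[output_atom_idx] = input_atom_idx; build the reverse lookup.
output_position = [0] * len(order)
for output_idx, input_idx in enumerate(order):
    output_position[input_idx] = output_idx

print(output_position)  # [1, 4, 0, 2, 3, 5, 6, 7]
```

Here output_position[0] == 1 says that input atom 0 (the first carbon) is the second atom written in the SMILES string.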

On Thu, Mar 10, 2016 at 6:27 AM, Michal Krompiec 
wrote:

> Hello,
> I need a "canonical" method for generating atom indices for a given
> molecule (with 3D coordinates, so the input is e.g. a mol file), for a
> molecular descriptor which should be invariant with respect to atom
> indexing. As I understand, canonical SMILES will give the same atom indices
> for non-hydrogen atoms, but is there a way in RDKit to generate unique
> indices for hydrogens as well?
> Best regards,
> Michal
>
>


Re: [Rdkit-discuss] Peptides and proteins in RDkit with Pseudo atoms support

2016-03-01 Thread Brian Kelley
This is indeed really, really nice.  Nice enough to put some effort behind
trying to support HELM-based pseudo atoms.  If we can figure out how to add
arbitrary atoms instead of the fixed list, this could potentially solve a lot
of disparate problems, not the least of which are the depictions.  Nice work!


Brian Kelley

> On Mar 1, 2016, at 7:38 AM, Greg Landrum <greg.land...@gmail.com> wrote:
> 
> Thanks for sharing that!
> 
> -greg
> 
> 
>> On Mon, Feb 29, 2016 at 9:21 AM, Esben Jannik Bjerrum 
>> <esbenjan...@rocketmail.com> wrote:
>> Hi All,
>>I put together a blog post about my experiences adding pseudo atom 
>> support to RDkit to handle peptides and proteins. Thought I would share it 
>> with you :-)
>> 
>> http://www.wildcardconsulting.dk/useful-information/learn-how-to-hack-rdkit-to-handle-peptides-with-pseudo-atoms/
>>  
>> Best Regards
>> 
>> Esben Jannik Bjerrum
>> cand.pharm, Ph.D
>> 
>> /Sent from my Ubuntu Touch Phone
>> 
>> Phone +45 2823 8009
>> http://dk.linkedin.com/in/esbenbjerrum
>> http://www.wildcardconsulting.dk
>> 
>> 


Re: [Rdkit-discuss] Atoms with strange positions/bonds when drawer makes PNG

2016-02-23 Thread Brian Kelley
I have a fix for this in the conda recipes that I can submit.  It boils down
to something like the following (setting the CXXFLAGS for C++11):

if otool -L "$PYROOT/lib/libboost_python.dylib" | grep libc++ ; then
    FLAGS="-std=c++11 -stdlib=libc++"
else
    FLAGS="-stdlib=libstdc++"
fi

# quote the value: it contains a space
CXXFLAGS="$FLAGS" \
cmake -DPYTHON_EXECUTABLE=$PYROOT/bin/python  \
 ...
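
The same decision can be sketched in Python for anyone scripting the build outside the shell. This is only an illustration of the check above, not part of the conda recipe: the function name choose_stdlib_flags is made up, and the two sample strings are abbreviated lines of the kind otool -L prints.

```python
def choose_stdlib_flags(otool_output):
    """Pick C++ flags matching the standard library a dylib links against."""
    if "libc++" in otool_output:
        return "-std=c++11 -stdlib=libc++"
    return "-stdlib=libstdc++"


# Abbreviated `otool -L` output lines, one per standard library.
new_style = "/usr/lib/libc++.1.dylib (compatibility version 1.0.0)"
old_style = "/usr/lib/libstdc++.6.dylib (compatibility version 7.0.0)"

print(choose_stdlib_flags(new_style))  # -std=c++11 -stdlib=libc++
print(choose_stdlib_flags(old_style))  # -stdlib=libstdc++
```

Note that the substring "libc++" does not occur inside "libstdc++", which is why the simple match is enough to tell the two apart.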

On Tue, Feb 23, 2016 at 10:52 AM, Greg Landrum 
wrote:

>
>
> On Tue, Feb 23, 2016 at 3:50 PM, Paul Emsley 
> wrote:
>
>>
>> This seems to be the relevant difference in the cmake output:
>>
>>
> 
>
>
>>
>> [  3%] Linking CXX shared library ../../lib/libRDBoost.dylib
>> Undefined symbols for architecture x86_64:
>>   "boost::python::throw_error_already_set()", referenced from:
>>   throw_index_error(int) in Wrap.cpp.o
>>   throw_value_error(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) in Wrap.cpp.o
>>   translate_index_error(IndexErrorException const&) in Wrap.cpp.o
>>   translate_value_error(ValueErrorException const&) in Wrap.cpp.o
>>   throw_runtime_error(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) in Wrap.cpp.o
>>   translate_invariant_error(Invar::Invariant const&) in Wrap.cpp.o
>> ld: symbol(s) not found for architecture x86_64
>> clang: error: linker command failed with exit code 1 (use -v to see
>> invocation)
>> make[2]: *** [lib/libRDBoost.2016.03.1.dev1.dylib] Error 1
>> make[1]: *** [Code/RDBoost/CMakeFiles/RDBoost.dir/all] Error 2
>> make: *** [all] Error 2
>>
>> These functions are in libboost_python.dylib, but I don't know how to
>> tell cmake how to find them there when linking RDBoost.dylib.
>>
>
> Ah, very good. I think I know this one. The problem tends to be due to a
> mixture of libraries that were built against the new C++ libraries (the
> default with newer versions of clang) and those built using the older C++
> mode. You control this with the -stdlib flag to the C++ compiler. The two
> options are "-stdlib=libc++" (this is the newer library and is now the
> default) or "-stdlib=libstdc++" (this is the older one). The way you
> diagnose this is by looking at the output of otool -L.
>
> Here's a library built using the new one (the default now with homebrew):
>
> ~/rdk/RDKit_git/build_java % otool -L /usr/local/lib/libboost_regex.dylib
> /usr/local/lib/libboost_regex.dylib:
> /usr/local/opt/boost/lib/libboost_regex.dylib (compatibility version
> 0.0.0, current version 0.0.0)
> /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version
> 120.1.0)
> /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version
> 1226.10.1)
>
> And here's one built using the older version:
>
> ~/rdk/RDKit_git/build_java % otool -L
> /usr/local/opt/boost_1_48/lib/libboost_regex.dylib
> /usr/local/opt/boost_1_48/lib/libboost_regex.dylib:
> libboost_regex.dylib (compatibility version 0.0.0, current version 0.0.0)
> /usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version
> 104.1.0)
> /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version
> 1226.10.1)
>
>
> You cannot mix these, but you can control which gets used by cmake as
> follows:
> CXX="/usr/bin/c++ -stdlib=libstdc++" cmake <arguments here> ..
> or:
> CXX="/usr/bin/c++ -stdlib=libc++" cmake <arguments here> ..
>
>
>> p.s.  Sorry for the delay, now back from my travels.
>>
>
> No worries, it gave me a chance to figure out how to at least diagnose
> these problems.
>
>
>> p.p.s. Is this a list where top-posting is preferred? (I'm an in-line
>> quoter by preference)
>>
>
> I personally prefer in-line, but sometimes will use top-posting if I'm
> lazy or if I'm using gmail on a mobile device (where the in-line quoting
> doesn't always work as well as one might hope).
>
> -greg
>
>
>

Re: [Rdkit-discuss] Molecule losing properties

2016-01-21 Thread Brian Kelley
Joos,

  I'm glad you found the issue.  Perhaps GetMolFrags should retain or have an 
option to retain public properties such as sd data.
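
In the meantime, the property copy can be factored into a small helper. The sketch below relies only on the three methods used in this thread (GetPropNames, GetProp, SetProp); FakeMol is a hypothetical stand-in so the example runs without RDKit, and copy_props is an illustrative name, not an RDKit function.

```python
def copy_props(source, target):
    """Copy every property reported by source.GetPropNames() onto target."""
    for name in source.GetPropNames():
        target.SetProp(name, source.GetProp(name))
    return target


class FakeMol:
    """Hypothetical stand-in exposing the three Mol methods used above."""

    def __init__(self):
        self._props = {}

    def GetPropNames(self):
        return list(self._props)

    def GetProp(self, name):
        return self._props[name]

    def SetProp(self, name, value):
        self._props[name] = value


parent = FakeMol()
parent.SetProp("Activ", "2.5")
fragment = copy_props(parent, FakeMol())
print(fragment.GetProp("Activ"))  # 2.5
```

With real molecules the call would presumably look like copy_props(mol, Chem.GetMolFrags(mol, asMols=True)[0]).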


Brian Kelley

> On Jan 21, 2016, at 8:14 AM, Joos Kiener <joos.kie...@gmail.com> wrote:
> 
> Hi Brian,
> 
> thanks for your reply. I now figured out the issue. The SDF I load has a few 
> multi-component entries and I wanted to just look at the first component to 
> avoid any issues with such molecules.
> 
> hence I had following step:
> 
> mols = [Chem.GetMolFrags(x, asMols=True)[0] for x in mols]
> 
> And this then breaks properties for all molecules that were multi-component 
> but not for the other ones.
> 
> I fixed it by reassigning properties. If anyone knows a nicer way to do this, 
> that would also be good:
> 
> for idx in range(0,len(mols)):
>     mol = mols[idx]
>     fragments = Chem.GetMolFrags(mol, asMols=True)
>     if len(fragments) > 1:
>         first_frag = fragments[0]
>         for prop in mol.GetPropNames():
>             first_frag.SetProp(prop, mol.GetProp(prop))
>         mols[idx]=first_frag
> 
> 
> Best Regards,
> 
> Joos
> 
> 2016-01-21 13:26 GMT+01:00 Brian Kelley <fustiga...@gmail.com>:
>> Joos,
>> 
>>   In your second loop, could you "print repr(prop)"as opposed to "print 
>> prop"  It could be that the name actually has a space in it which the sd 
>> format supports and can drive one to distraction.
>> 
>> 
>> Brian Kelley
>> 

Re: [Rdkit-discuss] Molecule losing properties

2016-01-21 Thread Brian Kelley
Joos,

  In your second loop, could you "print repr(prop)" as opposed to "print prop"?
It could be that the name actually has a space in it, which the SD format
supports and which can drive one to distraction.
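
A small illustration of why repr helps here. The property names are made up for the example; the first one ends with a space, which is invisible in a plain print but obvious in the repr.

```python
# Hypothetical property names; the first one ends with a space.
prop_names = ["Activ ", "Name"]

for prop in prop_names:
    # Plain printing hides the trailing space, repr() shows it inside quotes.
    print(prop, "->", repr(prop))

print("Activ" in prop_names)                             # False
print("Activ" in [name.strip() for name in prop_names])  # True
```

If that turns out to be the cause, looking the property up under its exact name (including the space) should work.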


Brian Kelley

> On Jan 21, 2016, at 2:11 AM, Joos Kiener <joos.kie...@gmail.com> wrote:
> 
> Hi all,
> 
> I have a strange issue. I'm trying to display pairs of molecules (the pair 
> has a certain similarity threshold) and show a property for both molecules. 
> This is in IPython Notebook.
> 
> The weird thing is the first molecule of the pair loses all properties:
> 
> toShow=[]
> lbls=[]
> for idx in pairs:
>     did=dindices[idx]
>     mol1=und[did[0]] # und = list of molecules loaded from sd-file
>     mol2=und[did[1]]
>     toShow.append(mol1)
>     toShow.append(mol2)
>     lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>     lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)
> ---
> KeyError  Traceback (most recent call last)
>  in ()
>   7 toShow.append(mol1)
>   8 toShow.append(mol2)
> > 9 lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>  10 lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
>  11 Draw.MolsToGridImage(toShow,molsPerRow=2,legends=lbls)
> 
> KeyError: 'Activ'
> 
> 
> If I change the code (remove the label) and print all properties of mol1, 
> they are displayed correctly.
> 
> toShow=[]
> lbls=[]
> for idx in pairs:
>     did=dindices[idx]
>     mol1=und[did[0]]
>     mol2=und[did[1]]
>     toShow.append(mol1)
>     toShow.append(mol2)
>     for prop in mol1.GetPropNames():
>         print prop + ": "  + mol1.GetProp(prop)
>     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> Draw.MolsToGridImage(toShow,molsPerRow=2)
> 
> This shows all the properties of mol1 plus draws the grid. No error.
> 
> However, directly accessing the property by name fails with a KeyError:
> 
> toShow=[]
> lbls=[]
> for idx in pairs:
>     did=dindices[idx]
>     mol1=und[did[0]]
>     mol2=und[did[1]]
>     toShow.append(mol1)
>     toShow.append(mol2)
>     print mol1.GetProp('Activ')
>     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> Draw.MolsToGridImage(toShow,molsPerRow=2)
> ---
> KeyError  Traceback (most recent call last)
>  in ()
>   7 toShow.append(mol1)
>   8 toShow.append(mol2)
> > 9 print mol1.GetProp('Activ')
>  10 #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>  11 #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> 
> KeyError: 'Activ'
> 
> This all works fine for mol2:
> 
> 
> toShow=[]
> lbls=[]
> for idx in pairs:
>     did=dindices[idx]
>     mol1=und[did[0]]
>     mol2=und[did[1]]
>     toShow.append(mol1)
>     toShow.append(mol2)
>     print mol2.GetProp('Activ')
>     #lbls.append('Active: %.2f'%mol1.GetProp('Activ'))
>     #lbls.append('Active: %.2f'%mol2.GetProp('Activ'))
> Draw.MolsToGridImage(toShow,molsPerRow=2)
> 2.5 
> 7.7 
> 10.93 
> 2.0434 
> 190.0 
> 25.0 
> ...
> What is going on here??? How can I resolve this?
> Best Regards,
> 
> Joos


Re: [Rdkit-discuss] Substructure subtraction in RDKit

2016-01-21 Thread Brian Kelley
Without a concrete example, this solution may not be appropriate, but I
believe the function you want is "ReplaceCore".

ReplaceCore(...)

ReplaceCore( (Mol)mol, (Mol)coreQuery [, (bool)replaceDummies=True [,
(bool)labelByIndex=False [, (bool)requireDummyMatch=False]]]) -> Mol :

Removes the core of a molecule and labels the sidechains with dummy
atoms.


I only have Python available at the moment, so this may not be directly
applicable, but here goes:

>>> from rdkit import Chem
>>> from rdkit.Chem import MCS
>>> m1 = Chem.MolFromSmiles("Cc1ccccc1N")
>>> m2 = Chem.MolFromSmiles("c1ccccc1")
>>> mcs = MCS.FindMCS([m1, m2])
>>> frag = Chem.ReplaceCore(m1, Chem.MolFromSmarts(mcs.smarts))
>>> print "SideChains:", Chem.MolToSmiles(frag)
SideChains: [*]C.[*]N

I hope this helps (at least the steps).
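
To turn that result into a list of fragments, the side-chain SMILES can be split on the "." that separates disconnected components. This is a plain-string sketch rather than an RDKit call, and it assumes no ring-closure digits are shared across the dots, which holds for the output shown above.

```python
# SMILES printed by the session above; "." separates disconnected components.
side_chains = "[*]C.[*]N"

# Assumption: no ring-closure digits are shared across the dots, so a plain
# string split yields one SMILES per fragment.
fragments = side_chains.split(".")
print(fragments)  # ['[*]C', '[*]N']
```

Within RDKit itself, Chem.GetMolFrags(frag, asMols=True) should give the same pieces as molecule objects.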

Now if you are just trying to extract side chains from the results of
reactions, we have recently added helper functions to solve that (They
should be exposed in the next release).


ReduceProductToSideChains(...)

ReduceProductToSideChains( (Mol)product [, (bool)addDummyAtoms=True])
-> Mol :

reduce the product of a reaction to the side chains added by the
reaction.

 The output is a molecule with attached wildcards indicating where the
product was attached.  The isotope of the dummy atom is the reaction map
number of the product's atom (if available).

If this would be useful, let us know, I would be happy to have a tester
prior to release.

Brian Kelley

On Thu, Jan 21, 2016 at 9:41 AM, James Wallace <chp11...@sheffield.ac.uk>
wrote:

> Hi,
> I'm using the KNIME implementation to write my own nodes, and I'm
> running into an issue. For the process I'm trying to do I'm trying to
> subtract the MCS between two molecules away from the larger molecule, to
> leave a list of fragments. I'm aware of the substructure matching, but
> I'm not sure how to subtract the matching atoms from a molecule graph
> within RDKit. As I say, I'm working with the Java version, but any
> pointers towards the functions needed would be useful. At the moment
> I've got (in pseudo code)
>
>  RWMol mol1a = RWMol.MolFromSmiles(reactant_string, 0, true);
>  RWMol mol2a = RWMol.MolFromSmiles(product_string, 0, true);
>
>  frag_bonds = mol2a.GetSubstructMatches(mol1a);
>
> But I'm unsure as to what to do with the array of matches to achieve
> what I want. Can I strip out the dummy atoms automatically, or is this
> something that is best achieved by processing the SMILES string?
>
>