Re: [Rdkit-discuss] Download the file of zdd.sdf.gz

2019-11-06 Thread Greg Landrum
Hi,

That's a really old example, and I'm no longer even sure what was in that
file.
Fortunately, you can try the example with any sdf or sdf.gz file that
you have; you don't need that exact file.
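
For anyone following the Cookbook clustering recipe with their own data, here
is a minimal sketch of reading a gzipped SD file and running Butina clustering
on it. The helper names load_sdf_gz and cluster_mols, the Morgan radius of 2,
the 1024-bit length and the 0.2 distance cutoff are assumptions chosen for
illustration, not values recovered from the original zdd.sdf.gz example.

```python
import gzip

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina


def load_sdf_gz(path):
    """Return every molecule RDKit can parse from a gzip-compressed SD file."""
    with gzip.open(path, "rb") as handle:
        supplier = Chem.ForwardSDMolSupplier(handle)
        # The supplier yields None for records it cannot parse; skip those.
        return [mol for mol in supplier if mol is not None]


def cluster_mols(mols, cutoff=0.2):
    """Butina-cluster molecules on Tanimoto distance between Morgan fingerprints."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, 1024) for m in mols]
    dists = []
    # Build the lower-triangle distance list that Butina.ClusterData expects.
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    return Butina.ClusterData(dists, len(fps), cutoff, isDistData=True)
```

Calling cluster_mols(load_sdf_gz("your_file.sdf.gz")) returns a tuple of
clusters; each cluster is a tuple of molecule indices with the centroid first.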

-greg




On Wed, Nov 6, 2019 at 10:49 PM Peng Yu  wrote:

> Hi,
>
> https://www.rdkit.org/docs/Cookbook.html#clustering-molecules
>
> I'd like to try the above example. Where can the file zdd.sdf.gz be
> downloaded? Thanks.
>
> --
> Regards,
> Peng
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] Molecules not rendered in Dataframe

2019-11-06 Thread Greg Landrum
Yes, I can confirm that this is a problem caused by changes in Pandas
v0.25.x

Now we just need to figure out what those changes are and how to work
around them.

Here's a github issue to track the problem:
https://github.com/rdkit/rdkit/issues/2673
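
Until the cause is pinned down, one option that does not depend on how
PandasTools hooks into the DataFrame HTML representation is to render each
molecule to SVG yourself and hand pandas a column formatter. This is only a
sketch under assumptions: mol_to_svg and frame_to_html are hypothetical helper
names, and passing None for display.max_colwidth assumes pandas 1.0 or later.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D


def mol_to_svg(mol, width=150, height=100):
    """Render one molecule to an SVG string."""
    drawer = rdMolDraw2D.MolDraw2DSVG(width, height)
    drawer.DrawMolecule(mol)
    drawer.FinishDrawing()
    return drawer.GetDrawingText()


def frame_to_html(df, mol_col="mol"):
    """Build an HTML table with the molecule column drawn as inline SVG."""
    # Lift the column-width limit so pandas does not truncate the SVG text.
    with pd.option_context("display.max_colwidth", None):
        # escape=False keeps the SVG markup intact instead of HTML-escaping it.
        return df.to_html(formatters={mol_col: mol_to_svg}, escape=False)


df = pd.DataFrame({"smiles": ["CCO", "c1ccccc1"]})
df["mol"] = [Chem.MolFromSmiles(smi) for smi in df["smiles"]]
html = frame_to_html(df)
```

In a notebook the result can be shown with HTML(html), using
from IPython.display import HTML as in the workaround quoted below.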

-greg


On Tue, Nov 5, 2019 at 12:21 PM Jan Halborg Jensen wrote:

> Hi again,
>
> Since I thought it might be a Colab problem I also posted the question on
> Stackoverflow, and got an answer from Oliver Scott
>
> https://stackoverflow.com/questions/58656572/problem-using-addmoleculecolumntoframe-on-google-colab/58690736#58690736
>
> "Seems like this is a problem with all pandas versions above 0.25.0, So I
> guess for now the easiest fix is to downgrade pandas. Or you can use this
> method which seemed to work for me:
>
> from IPython.display import HTML
> HTML(df.to_html())
>
> I haven’t managed to downgrade pandas on Colab (almost certainly a Colab
> issue) but the other workaround works fine.
>
> Best regards, Jan
>
> On 4 Nov 2019, at 19.47, Markus Heller  wrote:
>
> Hi,
>
> In a Jupyter notebook, the following code does not show renderings of the
> molecules in a Pandas dataframe:
>
> 
> import pandas as pd
> from rdkit import Chem
> from rdkit.Chem import PandasTools
> from rdkit.Chem.Draw import MolsToGridImage
> from rdkit.Chem.Draw import IPythonConsole
> from rdkit.Chem import rdDepictor
>
> rdDepictor.SetPreferCoordGen(True)
> IPythonConsole.ipython_useSVG = True
>
> test_df = pd.read_csv('test.smi', delim_whitespace=True, header=None,
>                       names=['smiles', 'id'])
>
> PandasTools.RenderImagesInAllDataFrames(images=True)
>
> PandasTools.AddMoleculeColumnToFrame(test_df, 'smiles', 'mol',
>                                      includeFingerprints=False)
>
> test_df
> 
>
> Instead, string representations are shown (I think), i.e. every field in
> the mol column starts with
>
> 
> As far as I understand the documentation,
> PandasTools.RenderImagesInAllDataFrames(images=True) should show the
> rendered molecules.  What am I doing wrong?
>
> I’m using RDkit version 2019.03.4.0 via Anaconda.
>
> Thanks
> Markus
>
> --
> *Markus Heller, PhD*
> Senior Scientist
> Direct: 604.827.1122   Main: 604.827.1147
>
>  
> 2405 Wesbrook Mall, 4th Floor, Vancouver, BC V6T 1Z3
>
> This email and any attachments thereto may contain confidential material
> for the sole use of the intended recipient. Any review, copying, or
> distribution of this email (or any attachments thereto) by others is
> strictly prohibited. If you are not the intended recipient, please contact
> the sender immediately and permanently delete the original and any copies
> of this email and any attachments thereto.
>


[Rdkit-discuss] Download the file of zdd.sdf.gz

2019-11-06 Thread Peng Yu
Hi,

https://www.rdkit.org/docs/Cookbook.html#clustering-molecules

I'd like to try the above example. Where can the file zdd.sdf.gz be
downloaded? Thanks.

-- 
Regards,
Peng




[Rdkit-discuss] What is the difference between rdkit and openbabel?

2019-11-06 Thread Peng Yu
Hi,

It seems that rdkit and openbabel share some common functionality.
Could anybody help me understand the difference between them? What is
the best use of each tool? Thanks.

-- 
Regards,
Peng




Re: [Rdkit-discuss] Explicit H in substructure searches

2019-11-06 Thread Markus Heller
Greg,

You hit the nail on the head!  I did attempt to use it for a substructure 
search with the intention of restricting valence states, but I screwed up and 
forgot the Chem.MergeQueryHs() command!

I did end up taking a different approach as described in an older RDkit blog 
post [1] using dummy atoms, since this seems to be an easier way to define 
substitutions on carbon atoms.

Either way, I’ve just started to work on my Python skills and have to say that 
I’m beginning to appreciate what a powerful tool RDkit is, so thanks for that!!

Markus

[1] http://rdkit.blogspot.com/2016/07/tuning-substructure-queries-ii.html

From: Greg Landrum 
Sent: Tuesday, November 5, 2019 11:26 PM
To: Markus Heller 
Cc: rdkit-discuss (rdkit-discuss@lists.sourceforge.net) 

Subject: Re: [Rdkit-discuss] Explicit H in substructure searches

Paolo's answer was completely correct, but there's an additional point that's 
worth mentioning here.
Hs are often included in query molecules with the intent of restricting 
possible valence states of atoms, not because the user is actually interested 
in matching Hs. In this case you can use the function Chem.MergeQueryHs() to 
remove the H atoms in your query molecule and add/adjust H count queries on the 
heavy atoms they are connected to.

Here's how that works in your example:
In [6]: params = Chem.SmilesParserParams()
   ...: params.removeHs=False
   ...: query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1', params)

In [7]: m1 = Chem.MolFromSmiles('c1cn[nH]c1N')
   ...: m2 = Chem.MolFromSmiles('CNc1ccn[nH]1')
   ...: m3 = Chem.MolFromSmiles('Nc1ccnn(C)1')
In [8]: m1.HasSubstructMatch(query)
Out[8]: False

In [15]: q2 = Chem.MergeQueryHs(query)

In [16]: m1.HasSubstructMatch(q2)
Out[16]: True

In [17]: m2.HasSubstructMatch(q2)
Out[17]: False

In [18]: m3.HasSubstructMatch(q2)
Out[18]: True

You can see what has happened by calling MolToSmarts:
In [19]: Chem.MolToSmarts(q2)
Out[19]: '[#6]1:[#6]:[#7]:[#7H]:[#6]:1-[#7&!H0&!H1]'

Notice that the N atom now has query features attached to it.
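
The same steps collected into one script, as a sketch: build_queries is a
hypothetical helper name, and the expected results in the comments are the
ones reported in the session above.

```python
from rdkit import Chem


def build_queries(smiles):
    """Parse a SMILES keeping explicit Hs; return (raw, H-merged) queries."""
    params = Chem.SmilesParserParams()
    params.removeHs = False
    raw = Chem.MolFromSmiles(smiles, params)
    # MergeQueryHs drops the explicit H atoms and turns them into
    # H-count constraints on the heavy atoms they were attached to.
    return raw, Chem.MergeQueryHs(raw)


raw_query, merged_query = build_queries('c1cn[nH]c(N([H])([H]))1')

targets = {
    'm1': 'c1cn[nH]c1N',    # primary amine: expected match
    'm2': 'CNc1ccn[nH]1',   # secondary amine: expected no match
    'm3': 'Nc1ccnn(C)1',    # primary amine, N-methyl ring N: expected match
}
results = {
    name: Chem.MolFromSmiles(smi).HasSubstructMatch(merged_query)
    for name, smi in targets.items()
}
print(results)

# The unmerged query still contains real H atoms, which the targets
# (with implicit Hs only) cannot match.
raw_matches_m1 = Chem.MolFromSmiles(targets['m1']).HasSubstructMatch(raw_query)
print(raw_matches_m1)
```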

I hope this helps,
-greg


On Tue, Nov 5, 2019 at 7:53 PM Markus Heller wrote:
Hi,

I’m trying to understand how to properly use explicit hydrogens in substructure 
searches.  Below is an example.  I would like to find all molecules that 
contain my query with hydrogens at the nitrogens, and I thought I was on the 
right track …  Why does the first query with the explicit H not match m1?

Thanks
Markus


from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor

rdDepictor.SetPreferCoordGen(True)
IPythonConsole.ipython_useSVG = True

m1 = Chem.MolFromSmiles('c1cn[nH]c1N')
m2 = Chem.MolFromSmiles('CNc1ccn[nH]1')
m3 = Chem.MolFromSmiles('Nc1ccnn(C)1')

# do not remove explicit H
params = Chem.SmilesParserParams()
params.removeHs=False

query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1', params)

# first should be True, but all are False
m1.HasSubstructMatch(query)
m2.HasSubstructMatch(query)
m3.HasSubstructMatch(query)

# rebuild query with explicit H removed, not what I want
query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1')

m1.HasSubstructMatch(query)
m2.HasSubstructMatch(query)
m3.HasSubstructMatch(query)




Re: [Rdkit-discuss] Explicit H in substructure searches

2019-11-06 Thread Markus Heller
Hi Paolo,

Thank you very much for this!  This clarified issues I hadn’t even thought
about yet.

Cheers
Markus

From: Paolo Tosco 
Sent: Tuesday, November 5, 2019 2:49 PM
To: Markus Heller ; rdkit-discuss 
(rdkit-discuss@lists.sourceforge.net) 
Subject: Re: [Rdkit-discuss] Explicit H in substructure searches


Hi Markus,

I tried to put together a comprehensible explanation in this gist:

https://gist.github.com/ptosco/1088937ce332bd66c999a2a5fbc855b3

Please also refer to the following threads on the mailing list:

https://sourceforge.net/p/rdkit/mailman/message/29679834/
https://sourceforge.net/p/rdkit/mailman/message/36696340/

and to this blog post by Roger Sayle:

https://nextmovesoftware.com/blog/2013/02/27/explicit-and-implicit-hydrogens-taking-liberties-with-valence/

for further clarifications.

Cheers,
p.
On 05/11/2019 19:52, Markus Heller wrote:
Hi,

I’m trying to understand how to properly use explicit hydrogens in substructure 
searches.  Below is an example.  I would like to find all molecules that 
contain my query with hydrogens at the nitrogens, and I thought I was on the 
right track …  Why does the first query with the explicit H not match m1?

Thanks
Markus


from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor

rdDepictor.SetPreferCoordGen(True)
IPythonConsole.ipython_useSVG = True

m1 = Chem.MolFromSmiles('c1cn[nH]c1N')
m2 = Chem.MolFromSmiles('CNc1ccn[nH]1')
m3 = Chem.MolFromSmiles('Nc1ccnn(C)1')

# do not remove explicit H
params = Chem.SmilesParserParams()
params.removeHs=False

query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1', params)

# first should be True, but all are False
m1.HasSubstructMatch(query)
m2.HasSubstructMatch(query)
m3.HasSubstructMatch(query)

# rebuild query with explicit H removed, not what I want
query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1')

m1.HasSubstructMatch(query)
m2.HasSubstructMatch(query)
m3.HasSubstructMatch(query)




Re: [Rdkit-discuss] rdkit.ML.Scoring?

2019-11-06 Thread Markus Metz
Awesome, thanks very much!
Markus

On Wed, Nov 6, 2019 at 7:49 AM Greg Landrum  wrote:

> Hi Markus,
>
> try: from rdkit.ML.Scoring import Scoring
>
> We could make this a bit easier to discover...
>
> -greg
>
>
> On Wed, Nov 6, 2019 at 4:18 PM Markus Metz  wrote:
>
>> Hello all:
>> I just came across the description of this package on rdkit.org.
>> However, it seems there is not much to import.
>> When I type
>> from rdkit.ML.Scoring import 'tab'  in a jupyter notebook I am getting
>> Scoring
>> __doc__
>> __file__
>> __name__
>> __package__
>> Can you please clarify if and maybe when this function is available?
>> I am using rdkit.2019.09.1.
>> Many thanks in advance,
>> Markus


Re: [Rdkit-discuss] rdkit.ML.Scoring?

2019-11-06 Thread Greg Landrum
Hi Markus,

try: from rdkit.ML.Scoring import Scoring

We could make this a bit easier to discover...
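
Once the module is imported, its functions take a list of rows sorted
best-first plus the index of the column holding the active/inactive flag.
A small sketch with made-up screening data (the scores and labels below are
hypothetical, chosen only to show the call):

```python
from rdkit.ML.Scoring import Scoring

# Hypothetical screening results as (score, is_active) rows.
rows = [(0.40, 0), (0.95, 1), (0.05, 0), (0.90, 1), (0.35, 1), (0.10, 0)]

# The scoring functions expect the list ordered from best to worst score.
ranked = sorted(rows, key=lambda row: row[0], reverse=True)

# The second argument is the column index of the activity flag.
auc = Scoring.CalcAUC(ranked, 1)
print(auc)
```

With three actives and three inactives, eight of the nine active/inactive
pairs are ranked correctly here, so the AUC comes out at 8/9.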

-greg


On Wed, Nov 6, 2019 at 4:18 PM Markus Metz  wrote:

> Hello all:
> I just came across the description of this package on rdkit.org.
> However, it seems there is not much to import.
> When I type
> from rdkit.ML.Scoring import 'tab'  in a jupyter notebook I am getting
> Scoring
> __doc__
> __file__
> __name__
> __package__
> Can you please clarify if and maybe when this function is available?
> I am using rdkit.2019.09.1.
> Many thanks in advance,
> Markus


[Rdkit-discuss] Hydrogens involved in "stereochemistry" are not removed by RemoveHs()

2019-11-06 Thread Ivan Tubert-Brohman
Hi,

For reasons too complicated to get into here, I ended up with a molecule
containing a =CH2 in which one of the hydrogens was explicit and had E/Z
stereo info. For example, consider [H]/C=C/F.

I was surprised that RemoveHs() refused to remove the hydrogen, although
later I found that that's the documented behavior, and generally it makes
sense as a way to prevent the loss of stereochemical information.

For example, compare these two:

In [7]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]/C=C/F')))
Out[7]: '[H]/C=C/F'

In [8]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]C=C/F')))
Out[8]: 'C=CF'

A chemist would say that these two are obviously the same molecule, and
arguably the second representation is better, because a double bond ending
in =CH2 can't have geometric isomers. Maybe it's unreasonable to expect
RDKit to make that kind of inference, but still I wonder, what would be a
good automated way to get from [H]/C=C/F to C=CF?

One idea is to add a "=CH2 cleanup" step, perhaps implemented by applying
this reaction:

[H][C:1]=[*:2]>>[CH2:1]=[*:2]

but perhaps there's a better way?
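
One possible alternative to the reaction, sketched here under assumptions:
before calling RemoveHs(), clear the bond direction and double-bond stereo
around any explicit H whose heavy-atom neighbour carries two or more
hydrogens in total, since such an end cannot be stereogenic. The function
name drop_methylene_stereo_hs is hypothetical, and the final
AssignStereochemistry() call assumes that the remaining stereo can be
re-derived from the bond directions still stored on the molecule, which
holds for molecules parsed from SMILES.

```python
from rdkit import Chem


def drop_methylene_stereo_hs(mol):
    """Remove explicit Hs that only 'define' stereo on a =CH2 end."""
    rwmol = Chem.RWMol(mol)
    for atom in rwmol.GetAtoms():
        if atom.GetAtomicNum() != 1 or atom.GetDegree() != 1:
            continue
        h_bond = atom.GetBonds()[0]
        heavy = h_bond.GetOtherAtom(atom)
        # Two or more Hs (explicit neighbours plus implicit) on the heavy
        # atom mean a double bond on it cannot have E/Z isomers.
        if heavy.GetTotalNumHs(includeNeighbors=True) < 2:
            continue
        h_bond.SetBondDir(Chem.BondDir.NONE)
        for bond in heavy.GetBonds():
            if bond.GetBondType() == Chem.BondType.DOUBLE:
                bond.SetStereo(Chem.BondStereo.STEREONONE)
    cleaned = Chem.RemoveHs(rwmol)
    # Recompute stereo and drop bond directions that no longer mean anything.
    Chem.AssignStereochemistry(cleaned, cleanIt=True, force=True)
    return cleaned


params = Chem.SmilesParserParams()
params.removeHs = False  # keep the explicit H so the helper has work to do
mol = Chem.MolFromSmiles('[H]/C=C/F', params)
cleaned_smiles = Chem.MolToSmiles(drop_methylene_stereo_hs(mol))
print(cleaned_smiles)
```

Double bonds whose ends carry fewer than two hydrogens are left untouched,
so genuine E/Z information such as in C/C=C/F should survive.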

Best,
Ivan


[Rdkit-discuss] rdkit.ML.Scoring?

2019-11-06 Thread Markus Metz
Hello all:
I just came across the description of this package on rdkit.org.
However, it seems there is not much to import.
When I type
from rdkit.ML.Scoring import 'tab'  in a jupyter notebook I am getting
Scoring
__doc__
__file__
__name__
__package__
Can you please clarify if and maybe when this function is available?
I am using rdkit.2019.09.1.
Many thanks in advance,
Markus


Re: [Rdkit-discuss] Wiener Index Calculation

2019-11-06 Thread Greg Landrum
Hi Goutam,

The RDKit doesn't have an implementation of the Wiener index[1]

I don't seem to have copies of all those old papers anymore, but assuming
that the wikipedia page about the Wiener index has the definition correct (
https://en.wikipedia.org/wiki/Wiener_index), here's a crude implementation
that should give you an idea of how to do the calculation:

In [8]: def wiener_index(m):
   ...:     res = 0
   ...:     amat = Chem.GetDistanceMatrix(m)
   ...:     for i in range(m.GetNumAtoms()):
   ...:         for j in range(i+1,m.GetNumAtoms()):
   ...:             res += amat[i][j]
   ...:     return res
   ...:

In [18]: butane = Chem.MolFromSmiles('CCCC')

In [19]: wiener_index(butane)
Out[19]: 10.0

In [20]: isobutane = Chem.MolFromSmiles('CC(C)C')

In [21]: wiener_index(isobutane)
Out[21]: 9.0

I hope this helps
-greg
[1] which is kind of odd, because I think I remember implementing it years
and years ago, but it doesn't seem to be in the code now.
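
The same calculation can be written without the explicit double loop. This
is a sketch that assumes the Wikipedia definition referenced above (sum of
shortest-path lengths over all unordered pairs of heavy atoms) and uses
NumPy, which the distance matrix is already returned as:

```python
import numpy as np
from rdkit import Chem


def wiener_index(mol):
    """Sum of topological distances over all unordered atom pairs."""
    dmat = Chem.GetDistanceMatrix(mol)
    # The matrix is symmetric with a zero diagonal, so the strict upper
    # triangle counts each pair of atoms exactly once.
    return float(np.triu(dmat, k=1).sum())


butane = wiener_index(Chem.MolFromSmiles('CCCC'))
isobutane = wiener_index(Chem.MolFromSmiles('CC(C)C'))
print(butane, isobutane)
```

Hydrogens stay implicit when a molecule is parsed from SMILES, so only the
heavy atoms contribute to the sum.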


On Wed, Nov 6, 2019 at 3:17 PM Goutam Mukherjee  wrote:

> Dear Members,
>
> I want to calculate Wiener Index for my molecule.
> Could you help me calculate it with an RDKit command?
>
> Thanks and Best Regards,
> Goutam


[Rdkit-discuss] Wiener Index Calculation

2019-11-06 Thread Goutam Mukherjee
Dear Members,

I want to calculate Wiener Index for my molecule.
Could you help me calculate it with an RDKit command?

Thanks and Best Regards,
Goutam


Re: [Rdkit-discuss] Something wrong with MolDraw2DSVG

2019-11-06 Thread Paolo Tosco

Hi Zhang,

this looks like a bug triggered by molecules whose Y size is very small, 
such as all molecules which are constituted by a single, horizontal bond:


https://github.com/rdkit/rdkit/issues/2762

Cheers,
p.


On 11/04/19 07:14, Shengde wrote:

Hi,

I am trying to draw molecules in a grid using the following code.
Usually it works well. However, when I try to draw only single-atom
molecules such as "Cl", I get a *blank figure* with nothing on
it. As soon as I add a more complex molecule to smi_list, e.g.
["*Cl","*CC"], I get what I want again. Why can't I draw only
single-atom molecules in a figure? My RDKit version is *2019.03.2.*


from rdkit import Chem
from rdkit.Chem import Draw
import math
from IPython.display import SVG

smi_list = ["*Cl"]
mols = [Chem.MolFromSmiles(smi) for smi in smi_list]
sub_size = [250,250]
mols_num = len(mols)
columns_num = 5
# derive the row count from columns_num rather than repeating the literal 5
rows_num = math.ceil(mols_num/columns_num)
grid = [columns_num,rows_num]
# full canvas size first, then the size of each panel
d = Draw.rdMolDraw2D.MolDraw2DSVG(grid[0]*sub_size[0], grid[1]*sub_size[1],
                                  sub_size[0], sub_size[1])

opt = d.drawOptions()
opt.legendFontSize=20
d.SetFontSize(1.3*d.FontSize())
d.SetLineWidth(1)
d.DrawMolecules(mols,
                highlightAtoms=None,
                highlightBonds=None,
                highlightAtomColors=None,
                highlightBondColors=None,
                legends=None)
d.FinishDrawing()
SVG(d.GetDrawingText())


Thank you for your help!

Best regards,
Shengde, Zhang



