Re: [Rdkit-discuss] SDwriter

2016-12-16 Thread Milinda Samaraweera
This SD file is then used as input for another program, and that program is
having problems reading the sequence numbers.

Thanks,
MAK
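
One workaround that does not depend on any writer option is to post-process
the SD file before handing it to the other program. The sketch below is only
an illustration: strip_sequence_numbers is a hypothetical helper, not part of
RDKit, and it assumes that data header lines look like ">  <TAG>  (n)", as in
the output quoted in this thread.

```python
import re

# Matches an SD data header such as ">  <pubchem_ID>  (1) " and captures
# everything up to and including the closing ">" of the tag name.
_HEADER_WITH_SEQ = re.compile(r"^(>\s*<[^>]*>)\s*\(\d+\)\s*$")


def strip_sequence_numbers(sd_text):
    """Return sd_text with the trailing "(n)" removed from data header lines.

    Assumption: only data header lines start with ">" followed by a tag in
    angle brackets; all other lines are passed through unchanged.
    """
    cleaned = []
    for line in sd_text.splitlines():
        match = _HEADER_WITH_SEQ.match(line)
        cleaned.append(match.group(1) if match else line)
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    sample = ">  <pubchem_ID>  (1) \n118903148\n\n$$$$\n"
    print(strip_sequence_numbers(sample), end="")
```

Since the sequence number is part of the SDF spec, this is best treated as a
stopgap for a reader that cannot cope with it.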

On Fri, Dec 16, 2016 at 10:43 PM, Greg Landrum 
wrote:

> It's easy enough to make this an option, but given that it is part of the
> SDF spec (as Andrew has pointed out) the only reason I can think of to do
> so would be because it causes problems for some other piece of (likely
> commonly used) software.
>
> Are the sequence numbers causing a problem for you?
>
> -greg
>
>
>
>
>
>
> On Sat, Dec 17, 2016 at 1:46 AM +0100, "Milinda Samaraweera" <
> milindaatw...@gmail.com> wrote:
>
> Dear Users,
>>
>> I was using the SDWriter in RDKit to generate an SD file with
>> multiple entries generated from SMILES, and later assigned SD tag data (e.g.
>> pubchem_ID, IUPAC_name, etc.).
>>
>> However at the end of each tag header I noticed there is a number
>> (bolded):
>>
>> ...
>> >   * (1) *
>> N1-(2-ethylbutyl)hexane-1,3,6-triamine
>>
>> >*(1) *
>> 118903148
>>
>> ...
>> M  END
>> >   * (2)*
>> N1,N2-dimethyl-N2-[3-(methylamino)propyl]-N1-propylpropane-1,2-diamine
>>
>> >   * (2) *
>> 118883401
>>
>> What is this number, and how can I avoid printing it when SDWriter
>> is used? This number is not found in standard SD files.
>>
>> Thanks,
>> CodeMAK
>>
>>


-- 
Milinda Samaraweera, Ph.D.
Postdoctoral Fellow, Department of Pharmacy
University of Connecticut
69 North Eagleville road
Storrs, CT, 06269
milindaatw...@gmail.com
860-617-8594
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SDwriter

2016-12-16 Thread Greg Landrum
It's easy enough to make this an option, but given that it is part of the SDF 
spec (as Andrew has pointed out) the only reason I can think of to do so would 
be because it causes problems for some other piece of (likely commonly used) 
software.
Are the sequence numbers causing a problem for you?
-greg






On Sat, Dec 17, 2016 at 1:46 AM +0100, "Milinda Samaraweera" wrote:

Dear Users,

I was using the SDWriter in RDKit to generate an SD file with multiple 
entries generated from SMILES, and later assigned SD tag data (e.g. pubchem_ID, 
IUPAC_name, etc.).

However at the end of each tag header I noticed there is a number (bolded):

...
>    (1) 
N1-(2-ethylbutyl)hexane-1,3,6-triamine

>    (1) 
118903148

...
M  END
>    (2) 
N1,N2-dimethyl-N2-[3-(methylamino)propyl]-N1-propylpropane-1,2-diamine

>    (2) 
118883401

What is this number, and how can I avoid printing it when SDWriter is 
used? This number is not found in standard SD files.

Thanks,
CodeMAK









Re: [Rdkit-discuss] SetAtomAlias

2016-12-16 Thread Peter Gedeck
Hello,

SetAtomAlias is available in Python as a function in Chem and not as an Atom
method:

from rdkit import Chem
import sys

m = Chem.MolFromSmiles('CCC')
# Label the atoms C1, C2, C3.
for i, atom in enumerate(m.GetAtoms()):
    Chem.SetAtomAlias(atom, 'C' + str(i + 1))
w = Chem.SDWriter(sys.stdout)
w.write(m)
w.close()

Best,

Peter


On Fri, Dec 16, 2016 at 5:31 PM Paolo Tosco  wrote:

> Dear Jean-Marc,
>
> here:
>
>
> https://gist.github.com/ptosco/6e4468350f0fff183e4507ef24f092a1#file-pdb_atom_names-ipynb
>
>
> there's an example how to use the atom aliases in RDKit.
>
> Cheers,
> p.
>
>
> On 12/16/2016 10:26 PM, Jean-Marc Nuzillard wrote:
> > Hi all,
> >
> > I try add labels to atoms in a molecule, so that lines like
> >
> > A1
> > C12
> > A2
> > C3
> >
> > are written when the molecule is written in a SD file.
> >
> > Considering atom a and alias text txt,
> > I expected the function call SetAtomAlias(a, txt) to do the job.
> > I found this function in a documentation page about the rdchem module.
> > So, my script started with
> >
> > from rdkit import Chem
> > from rdkit.Chem import rdchem
> >
> > I got:
> >
> > NameError: name 'SetAtomAlias' is not defined.
> >
> > I guess the solution is trivial.
> > Forgive my ignorance.
> >
> > All the best,
> >
> > Jean-Marc
> >
>
>
>


Re: [Rdkit-discuss] SetAtomAlias

2016-12-16 Thread Paolo Tosco
Dear Jean-Marc,

There is an example of how to use atom aliases in RDKit here:

https://gist.github.com/ptosco/6e4468350f0fff183e4507ef24f092a1#file-pdb_atom_names-ipynb

Cheers,
p.


On 12/16/2016 10:26 PM, Jean-Marc Nuzillard wrote:
> Hi all,
>
> I try add labels to atoms in a molecule, so that lines like
>
> A1
> C12
> A2
> C3
>
> are written when the molecule is written in a SD file.
>
> Considering atom a and alias text txt,
> I expected the function call SetAtomAlias(a, txt) to do the job.
> I found this function in a documentation page about the rdchem module.
> So, my script started with
>
> from rdkit import Chem
> from rdkit.Chem import rdchem
>
> I got:
>
> NameError: name 'SetAtomAlias' is not defined.
>
> I guess the solution is trivial.
> Forgive my ignorance.
>
> All the best,
>
> Jean-Marc
>




[Rdkit-discuss] SetAtomAlias

2016-12-16 Thread Jean-Marc Nuzillard
Hi all,

I am trying to add labels to atoms in a molecule, so that lines like

A1
C12
A2
C3

are written when the molecule is written to an SD file.

Considering atom a and alias text txt,
I expected the function call SetAtomAlias(a, txt) to do the job.
I found this function in a documentation page about the rdchem module.
So, my script started with

from rdkit import Chem
from rdkit.Chem import rdchem

I got:

NameError: name 'SetAtomAlias' is not defined.

I guess the solution is trivial.
Forgive my ignorance.

All the best,

Jean-Marc

-- 

Jean-Marc Nuzillard
Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/ICMR

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/





Re: [Rdkit-discuss] Canonicalisation with reaction labels

2016-12-16 Thread Syed Asad Rahman
Interesting question. Not sure if it's relevant but in ECBlast we do provide 
Canonicalised reaction labels. I agree with Greg that AAM is important.

https://github.com/asad/ReactionDecoder

http://www.ebi.ac.uk/thornton-srv/software/rbl/

Regards,
Asad

Sent from my iPhone

> On 16 Dec 2016, at 14:42, Stephen Pickett  wrote:
> 
> Thanks Greg, that’s clear.
>  
> Stephen
>  
> From: Greg Landrum [mailto:greg.land...@gmail.com] 
> Sent: 16 December 2016 14:33
> To: Stephen Pickett
> Cc: rdkit-discuss@lists.sourceforge.net
> Subject: Re: [Rdkit-discuss] Canonicalisation with reaction labels
>  
> EXTERNAL
> 
> Hi Stephen,
>  
> The new canonicalization algorithm intentionally takes the atom-mapping 
> information into account. The logic is that the entire SMILES provided should 
> be canonical, so if the SMILES includes atom maps, those atom maps should be 
> considered while canonicalizing.
>  
> If you have a molecule with atom maps and you would like the canonical SMILES 
> without the maps, you can do this (with the most recent version of the code):
>  
> In [18]: mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
>  
> In [19]: nmol = Chem.Mol(mol)
>  
> In [20]: for at in nmol.GetAtoms(): at.SetAtomMapNum(0)
>  
> In [21]: Chem.MolToSmiles(mol,True)
> Out[21]: 'C1CC([*:1])CCN1'
>  
> In [22]: Chem.MolToSmiles(nmol,True)
> Out[22]: '[*]C1CCNCC1'
>  
> A somewhat less clear (IMO) way of doing this that works in all versions is:
>  
> In [27]: nmol = Chem.Mol(mol)
>  
> In [28]: for at in nmol.GetAtoms(): at.ClearProp('molAtomMapNumber')
>  
> In [29]: Chem.MolToSmiles(nmol,True)
> Out[29]: '[*]C1CCNCC1'
>  
>  
> I hope this helps,
> -greg
>  
>  
>  
> On Fri, Dec 16, 2016 at 1:55 PM, Stephen Pickett  
> wrote:
> Hi
>  
> With a 2013 RDkit install we get consistent canonicalization between reaction 
> labelled and unlabelled atoms.
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*]C1CCNCC1'
> >>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*:1]C1CCNCC1'
>  
> In 2015-09 we are seeing differences.
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*]C1CCNCC1'
> >>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
> >>> Chem.MolToSmiles(mol)
> 'C1CC([*:1])CCN1'
>  
> I can understand why canonicalization can be different between versions but 
> I’m not sure whether this change in behaviour is expected?
> I’m afraid that I don’t have ready access to a more recent install to test 
> this out.
>  
> Thanks
>  
> Stephen
>  
> 
> This e-mail was sent by GlaxoSmithKline Services Unlimited
> (registered in England and Wales No. 1047315), which is a
> member of the GlaxoSmithKline group of companies. The
> registered address of GlaxoSmithKline Services Unlimited
> is 980 Great West Road, Brentford, Middlesex TW8 9GS.
> GSK monitors email communications sent to and from GSK in order to protect 
> GSK, our employees, customers, suppliers and business partners, from cyber 
> threats and loss of GSK Information. GSK monitoring is conducted with 
> appropriate confidentiality controls and in accordance with local laws and 
> after appropriate consultation.

Re: [Rdkit-discuss] Canonicalisation with reaction labels

2016-12-16 Thread Greg Landrum
Hi Stephen,

The new canonicalization algorithm intentionally takes the atom-mapping
information into account. The logic is that the entire SMILES provided
should be canonical, so if the SMILES includes atom maps, those atom maps
should be considered while canonicalizing.

If you have a molecule with atom maps and you would like the canonical
SMILES without the maps, you can do this (with the most recent version of
the code):

In [18]: mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')

In [19]: nmol = Chem.Mol(mol)

In [20]: for at in nmol.GetAtoms(): at.SetAtomMapNum(0)

In [21]: Chem.MolToSmiles(mol,True)
Out[21]: 'C1CC([*:1])CCN1'

In [22]: Chem.MolToSmiles(nmol,True)
Out[22]: '[*]C1CCNCC1'


A somewhat less clear (IMO) way of doing this that works in all versions is:

In [27]: nmol = Chem.Mol(mol)

In [28]: for at in nmol.GetAtoms(): at.ClearProp('molAtomMapNumber')

In [29]: Chem.MolToSmiles(nmol,True)
Out[29]: '[*]C1CCNCC1'



I hope this helps,
-greg
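
The first recipe above can be wrapped in a small function. This is a minimal
sketch rather than anything provided by RDKit: canonical_smiles_without_maps
is a hypothetical name, and it assumes an RDKit version recent enough to have
SetAtomMapNum.

```python
from rdkit import Chem


def canonical_smiles_without_maps(smiles):
    """Return canonical SMILES for `smiles` with all atom-map numbers removed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % (smiles,))
    nmol = Chem.Mol(mol)  # work on a copy so the input molecule keeps its maps
    for atom in nmol.GetAtoms():
        atom.SetAtomMapNum(0)  # 0 means "no atom map"
    return Chem.MolToSmiles(nmol, True)


if __name__ == "__main__":
    labelled = canonical_smiles_without_maps('C1CC([*:1])CCN1')
    unlabelled = canonical_smiles_without_maps('C1CC([*])CCN1')
    # The two strings should agree, since the maps no longer take part in
    # the canonicalization.
    print(labelled, unlabelled, labelled == unlabelled)
```

The exact SMILES string may differ between RDKit versions, so comparing the
labelled and unlabelled results is a more robust check than comparing against
a fixed string.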



On Fri, Dec 16, 2016 at 1:55 PM, Stephen Pickett 
wrote:

> Hi
>
>
>
> With a 2013 RDkit install we get consistent canonicalization between
> reaction labelled and unlabelled atoms.
>
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
>
> >>> Chem.MolToSmiles(mol)
>
> '[*]C1CCNCC1'
>
> >>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
>
> >>> Chem.MolToSmiles(mol)
>
> '[*:1]C1CCNCC1'
>
>
>
> In 2015-09 we are seeing differences.
>
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
>
> >>> Chem.MolToSmiles(mol)
>
> '[*]C1CCNCC1'
>
> >>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
>
> >>> Chem.MolToSmiles(mol)
>
> 'C1CC([*:1])CCN1'
>
>
>
> I can understand why canonicalization can be different between versions
> but I’m not sure whether this change in behaviour is expected?
>
> I’m afraid that I don’t have ready access to a more recent install to test
> this out.
>
>
>
> Thanks
>
>
>
> *Stephen*
>


Re: [Rdkit-discuss] Canonicalisation with reaction labels

2016-12-16 Thread Andrew Dalke
On Dec 16, 2016, at 1:55 PM, Stephen Pickett wrote:
> With a 2013 RDkit install we get consistent canonicalization between reaction 
> labelled and unlabelled atoms.
> >>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*]C1CCNCC1'
> >>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
> >>> Chem.MolToSmiles(mol)
> '[*:1]C1CCNCC1'

2013 RDKit didn't preserve the atom order between labeled and unlabeled atoms.

It looked like it for many cases, but there were a few cases where the slight 
change to the initial atom invariants, caused by the atom label, ended up 
affecting the SMILES.

I no longer have an older version of RDKit installed. Going through my notes, 
here was one of the failure cases:

core      => Cc1cc2c3c(c1)C[N@]([*])CCN(C)CC[N@@]([*])Cc1cc(C)cc(c1OCCCO3)C[N@@](C)CCN(C)CC[N@](C)C2
syntax    => Cc1cc2c3c(c1)C[N@]([*:1])CCN(C)CC[N@@]([*:2])Cc1cc(C)cc(c1OCCCO3)C[N@](C)CCN(C)CC[N@@](C)C2
canonical => Cc1cc2c3c(c1)C[N@]([*:2])CCN(C)CC[N@@]([*:1])Cc1cc(C)cc(c1OCCCO3)C[N@@](C)CCN(C)CC[N@](C)C2

For my project I ended up canonicalizing with unlabeled atoms, then using the 
_smilesAtomOutputOrder property to figure out where the "*" atoms were located 
in the SMILES string and CanonicalRankAtoms() to figure out which were 
symmetrical, and finally coming up with my own canonical labeling on top of the 
canonical unlabeled SMILES.
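
As an illustration of the symmetry step only (this is not the code from my
project), the sketch below groups the "*" atoms by their canonical rank with
ties left unbroken. symmetric_dummy_groups is a hypothetical helper name; it
assumes that atoms sharing a rank from CanonicalRankAtoms(mol, breakTies=False)
can be treated as symmetry-equivalent once the labels have been cleared.

```python
from collections import defaultdict

from rdkit import Chem


def symmetric_dummy_groups(smiles):
    """Group the indices of "*" atoms that share a canonical rank."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % (smiles,))
    # Clear the labels first so they cannot influence the ranking.
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)
    # With breakTies=False, equivalent atoms keep the same rank.
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    groups = defaultdict(list)
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() == 0:  # "*" atoms have atomic number 0
            groups[ranks[atom.GetIdx()]].append(atom.GetIdx())
    return sorted(groups.values())


if __name__ == "__main__":
    # The two attachment points on a para-substituted benzene are equivalent.
    print(symmetric_dummy_groups('[*:1]c1ccc([*:2])cc1'))
```

Atoms in the same group can have their labels swapped without changing the
structure, so a canonical labeling needs to pick one assignment per group.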


> I can understand why canonicalization can be different between versions but 
> I’m not sure whether this change in behaviour is expected?

While it is possible to generate a canonical labeling which preserves the same 
atom order as the canonical unlabeled SMILES (as I did above), that's more 
complicated. It's easier to include the label as part of the atom invariant and 
use the regular canonicalization mechanism.

Cheers,


Andrew
da...@dalkescientific.com





[Rdkit-discuss] Canonicalisation with reaction labels

2016-12-16 Thread Stephen Pickett
Hi

With a 2013 RDKit install we get consistent canonicalization between 
reaction-labelled and unlabelled atoms.
>>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
>>> Chem.MolToSmiles(mol)
'[*]C1CCNCC1'
>>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
>>> Chem.MolToSmiles(mol)
'[*:1]C1CCNCC1'

In 2015-09 we are seeing differences.
>>> mol = Chem.MolFromSmiles('C1CC([*])CCN1')
>>> Chem.MolToSmiles(mol)
'[*]C1CCNCC1'
>>> mol = Chem.MolFromSmiles('C1CC([*:1])CCN1')
>>> Chem.MolToSmiles(mol)
'C1CC([*:1])CCN1'

I can understand why canonicalization can be different between versions but I'm 
not sure whether this change in behaviour is expected?
I'm afraid that I don't have ready access to a more recent install to test this 
out.

Thanks

Stephen


