Re: [Rdkit-discuss] MolToSmiles gives explicit H after ReplaceSubstructs

2021-11-05 Thread Ling Chan
Dear Paolo,

Thank you for all the tips. I was not aware of these.

In fact I did suspect that that [H] has something to do with the
stereochemistry, since the original F made the bond stereo. However, the
"[H]" did not go away even after I called

AllChem.SanitizeMol(m6)
AllChem.AssignStereochemistry(m6)

or

AllChem.SanitizeMol(m6)
AllChem.FindPotentialStereo(m6)

I thought these would trigger the re-perception of the double bond, which
is no longer stereo.

By the way, I wrote down in my notes that
rdkit.Chem.rdmolops.AssignStereochemistry is old, while
rdkit.Chem.rdmolops.FindPotentialStereo is new. So it may be better to use
the latter.

As for DeleteSubstructs, in fact I started out using this but ran into some
problem. Then I switched to ReplaceSubstructs. I am still analyzing that
problem. I ran it on a big data set and hence I need to make sense of the
issue first. If I can boil it down, I may make a separate forum post.

Ling



Paolo Tosco wrote on Fri, 5 Nov 2021, 5:54 AM:

> Hi Ling,
>
> By default hydrogens defining double bond stereochemistry are not removed.
> You may remove that residual hydrogen by either
>
> params = Chem.RemoveHsParameters()
> params.removeDefiningBondStereo = True
> m6 = Chem.RemoveHs(m6, params)
>
> or simply
>
> m6 = Chem.RemoveAllHs(m6)
>
> I think you may obtain the same result by just
>
> m6 = AllChem.DeleteSubstructs(m5, mf)
>
> Cheers,
> p.
>
>
> On Wed, Nov 3, 2021 at 9:29 PM Ling Chan  wrote:
>
>> Hello colleagues,
>>
>> I tried to change all F's into H's. It worked. But when I converted the
>> result into a smiles string, there is the occasional lingering explicit
>> hydrogen. It is there even after I do a RemoveHs().
>>
>> Just wonder what is this explicit H about, since it may have implications
>> on any further processing.
>>
>> Thank you!
>>
>> Ling
>>
>>
>>
>> mh = Chem.MolFromSmiles("[#1]")
>> mf = Chem.MolFromSmarts('F')
>> m5 = Chem.MolFromSmiles("F/C=C1/[C@H](F)[C@@H](F)O[C@@H]1F")
>> m6s = AllChem.ReplaceSubstructs(m5,mf,mh,replaceAll=True)
>> m6 = m6s[0]
>> print(Chem.MolToSmiles(Chem.RemoveHs(m6)))
>>
>> [H]C=C1CCOC1
>>
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>


Re: [Rdkit-discuss] MolToSmiles gives explicit H after ReplaceSubstructs

2021-11-05 Thread Paolo Tosco
Hi Ling,

By default hydrogens defining double bond stereochemistry are not removed.
You may remove that residual hydrogen by either

params = Chem.RemoveHsParameters()
params.removeDefiningBondStereo = True
m6 = Chem.RemoveHs(m6, params)

or simply

m6 = Chem.RemoveAllHs(m6)

I think you may obtain the same result by just

m6 = AllChem.DeleteSubstructs(m5, mf)

Cheers,
p.
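A minimal, runnable sketch of both options, reusing the molecules from Ling's example. It assumes an RDKit version recent enough to provide Chem.RemoveHsParameters and Chem.RemoveAllHs. Both RemoveHs and RemoveAllHs return a new molecule rather than modifying m6 in place, so the return value has to be kept. The exact SMILES printed may vary between RDKit versions, so no expected output is shown.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Molecules from Ling's example: every F is replaced by an explicit H atom
mh = Chem.MolFromSmiles("[#1]")
mf = Chem.MolFromSmarts("F")
m5 = Chem.MolFromSmiles("F/C=C1/[C@H](F)[C@@H](F)O[C@@H]1F")
m6 = AllChem.ReplaceSubstructs(m5, mf, mh, replaceAll=True)[0]

# Default parameters: an H that is a stereo atom of a double bond is kept
m6_default = Chem.RemoveHs(m6)

# Option 1: opt in to removing H atoms that define double-bond stereo
params = Chem.RemoveHsParameters()
params.removeDefiningBondStereo = True
m6_params = Chem.RemoveHs(m6, params)  # returns a new molecule

# Option 2: remove every H atom, regardless of its role
m6_all = Chem.RemoveAllHs(m6)

for label, m in [("default", m6_default), ("params", m6_params), ("all", m6_all)]:
    print(label, Chem.MolToSmiles(m))
```

Setting the single flag is the more targeted choice; RemoveAllHs also drops hydrogens that the default parameters protect for other reasons (isotopic, mapped, and so on).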


On Wed, Nov 3, 2021 at 9:29 PM Ling Chan  wrote:

> Hello colleagues,
>
> I tried to change all F's into H's. It worked. But when I converted the
> result into a smiles string, there is the occasional lingering explicit
> hydrogen. It is there even after I do a RemoveHs().
>
> Just wonder what is this explicit H about, since it may have implications
> on any further processing.
>
> Thank you!
>
> Ling
>
>
>
> mh = Chem.MolFromSmiles("[#1]")
> mf = Chem.MolFromSmarts('F')
> m5 = Chem.MolFromSmiles("F/C=C1/[C@H](F)[C@@H](F)O[C@@H]1F")
> m6s = AllChem.ReplaceSubstructs(m5,mf,mh,replaceAll=True)
> m6 = m6s[0]
> print(Chem.MolToSmiles(Chem.RemoveHs(m6)))
>
> [H]C=C1CCOC1
>
>
>


Re: [Rdkit-discuss] MolToSmiles atom ordering

2021-11-03 Thread Ling Chan
Cool, good to know this special property. Thank you Andrew!
Ling


Andrew Dalke wrote on Tue, 2 Nov 2021, 10:36 PM:

> Hi Ling,
>
>   If there are symmetries then a substructure search like that will only give
> you one mapping, and that might not be the canonical mapping.
>
> What you're looking for is the special property _smilesAtomOutputOrder
>
>
> >>> from rdkit import Chem
> >>> mol = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C")
> >>> Chem.MolToSmiles(mol)
> 'COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O'
> >>> mol.GetProp("_smilesAtomOutputOrder")
> '[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]'
>
> Here are the atom indices of the original SMILES:
>
>         ┌                 1 11  1111 1 1 1 2 2
>  atoms  │ 0 1 234 56 78 9 0 12  3456 7 8 9 0 1
>         └ | | ||| || || | | ||  |||| | | | | |
>  SMILES   O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C
>
>
> You can see the first atom of the output is a "C", which is mapped to
> position 8 in the _smilesAtomOutputOrder, which is the "...C)..." in the
> original SMILES, etc.
>
>
> Cheers,
>
>
> Andrew
> da...@dalkescientific.com
>
>
> > On Nov 3, 2021, at 00:18, Ling Chan  wrote:
> >
> > O.K. Problem solved. Sorry about the spam, folks.
> >
> > I can use GetSubstructMatch, as follows.
> >
> > # sinput is the input smiles
> > # scanon is the output smiles
> >
> > minput = Chem.MolFromSmiles(sinput)
> > scanon=Chem.MolToSmiles(minput)
> > mcanon=Chem.MolFromSmiles(scanon)
> > map_forward = minput.GetSubstructMatch(mcanon)
> > map_backward = mcanon.GetSubstructMatch(minput)
> >
> >
> >
> >
> > Ling Chan wrote on Tue, 2 Nov 2021, 3:55 PM:
> > Dear colleagues,
> >
> > Just wonder if I can obtain a mapping of the atom indices upon
> canonicalization by MolToSmiles ? I am aware that canonicalization (and
> hence atom reordering) can be suppressed in MolToSmiles, but I do want to
> canonicalize the output smiles.
> >
> > If you are interested, here is a bit more details of my problem. For
> each molecule, I want to delete one or two side chains, and obtain a smiles
> of what is left. Just that I want to know what are the atoms that bonded to
> the deleted side chains. I know, by suppressing canonicalization things
> will work. But I would like to canonicalize the smiles so that I can know
> if there are duplicates.
> >
> > I tried marking the atoms. But I believe that properties that got
> carried over to the output smiles, e.g. Isotope, affect the
> canonicalization, while properties that do not affect canonicalization,
> e.g, IntProp, are lost upon the conversion to smiles.
> >
> > Thank you for your insight.
> >
> > Ling
> >
>
>


Re: [Rdkit-discuss] MolToSmiles atom ordering

2021-11-02 Thread Andrew Dalke
Hi Ling,

  If there are symmetries then a substructure search like that will only give you
one mapping, and that might not be the canonical mapping.

What you're looking for is the special property _smilesAtomOutputOrder


>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C")
>>> Chem.MolToSmiles(mol)
'COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O'
>>> mol.GetProp("_smilesAtomOutputOrder")
'[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]'

Here are the atom indices of the original SMILES:

        ┌                 1 11  1111 1 1 1 2 2
 atoms  │ 0 1 234 56 78 9 0 12  3456 7 8 9 0 1
        └ | | ||| || || | | ||  |||| | | | | |
 SMILES   O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C


You can see the first atom of the output is a "C", which is mapped to position 
8 in the _smilesAtomOutputOrder, which is the "...C)..." in the original 
SMILES, etc.
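The property comes back as a string such as '[8,7,6,...,]'. A small helper (hypothetical, not part of RDKit) can turn it into a list of ints in which entry i is the index, in the original molecule, of the i-th atom written to the SMILES. Because parsing a SMILES numbers atoms in order of appearance, atom i of the re-parsed canonical molecule should then correspond to atom order[i] of the original. The sketch assumes the string format shown above; _smilesAtomOutputOrder is a private property, so its representation could change between RDKit versions.

```python
from rdkit import Chem

def smiles_atom_output_order(mol):
    """Parse the private _smilesAtomOutputOrder property into a list of ints.

    Entry i is the index, in `mol`, of the i-th atom written to the SMILES.
    Only meaningful after Chem.MolToSmiles(mol) has been called.
    """
    raw = mol.GetProp("_smilesAtomOutputOrder")  # e.g. '[8,7,6,...,]'
    return [int(tok) for tok in raw.strip("[]").split(",") if tok.strip()]

# Capsaicin, chosen for illustration
mol = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C")
canonical = Chem.MolToSmiles(mol)
order = smiles_atom_output_order(mol)

# Parsing a SMILES numbers atoms in order of appearance, so atom i of
# `mcanon` should correspond to atom order[i] of `mol`.
mcanon = Chem.MolFromSmiles(canonical)
for new_idx, old_idx in enumerate(order):
    print(new_idx, old_idx, mcanon.GetAtomWithIdx(new_idx).GetSymbol())
```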


Cheers,


Andrew
da...@dalkescientific.com


> On Nov 3, 2021, at 00:18, Ling Chan  wrote:
> 
> O.K. Problem solved. Sorry about the spam, folks.
> 
> I can use GetSubstructMatch, as follows.
> 
> # sinput is the input smiles
> # scanon is the output smiles
> 
> minput = Chem.MolFromSmiles(sinput)
> scanon=Chem.MolToSmiles(minput)
> mcanon=Chem.MolFromSmiles(scanon)
> map_forward = minput.GetSubstructMatch(mcanon)
> map_backward = mcanon.GetSubstructMatch(minput)
> 
> 
> 
> 
> Ling Chan wrote on Tue, 2 Nov 2021, 3:55 PM:
> Dear colleagues,
> 
> Just wonder if I can obtain a mapping of the atom indices upon 
> canonicalization by MolToSmiles ? I am aware that canonicalization (and hence 
> atom reordering) can be suppressed in MolToSmiles, but I do want to 
> canonicalize the output smiles.
> 
> If you are interested, here is a bit more details of my problem. For each 
> molecule, I want to delete one or two side chains, and obtain a smiles of 
> what is left. Just that I want to know what are the atoms that bonded to the 
> deleted side chains. I know, by suppressing canonicalization things will 
> work. But I would like to canonicalize the smiles so that I can know if there 
> are duplicates.
> 
> I tried marking the atoms. But I believe that properties that got carried 
> over to the output smiles, e.g. Isotope, affect the canonicalization, while 
> properties that do not affect canonicalization, e.g, IntProp, are lost upon 
> the conversion to smiles.
> 
> Thank you for your insight.
> 
> Ling
> 





Re: [Rdkit-discuss] MolToSmiles atom ordering

2021-11-02 Thread Ling Chan
O.K. Problem solved. Sorry about the spam, folks.

I can use GetSubstructMatch, as follows.

# sinput is the input smiles
# scanon is the output smiles

minput = Chem.MolFromSmiles(sinput)
scanon=Chem.MolToSmiles(minput)
mcanon=Chem.MolFromSmiles(scanon)
map_forward = minput.GetSubstructMatch(mcanon)
map_backward = mcanon.GetSubstructMatch(minput)
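A runnable version of this approach that also illustrates the symmetry caveat raised in the reply to it. Isopropanol is an assumption chosen only because its two methyl carbons are equivalent. GetSubstructMatch returns one of several equally valid matches, while _smilesAtomOutputOrder records the mapping MolToSmiles actually used; both appear among the matches from GetSubstructMatches(..., uniquify=False), but nothing guarantees they are the same one.

```python
from rdkit import Chem

def output_order(mol):
    """_smilesAtomOutputOrder from the last MolToSmiles call, as a tuple of ints."""
    raw = mol.GetProp("_smilesAtomOutputOrder")
    return tuple(int(t) for t in raw.strip("[]").split(",") if t.strip())

sinput = "CC(C)O"  # isopropanol: the two methyl carbons are symmetry-equivalent
minput = Chem.MolFromSmiles(sinput)
scanon = Chem.MolToSmiles(minput)  # also sets _smilesAtomOutputOrder on minput
mcanon = Chem.MolFromSmiles(scanon)

# Entry i of each tuple is the minput atom matched to mcanon atom i
map_forward = minput.GetSubstructMatch(mcanon)                     # one arbitrary match
all_matches = minput.GetSubstructMatches(mcanon, uniquify=False)  # every match
smiles_order = output_order(minput)                                # what MolToSmiles used

print(map_forward, smiles_order, len(all_matches))
```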




Ling Chan wrote on Tue, 2 Nov 2021, 3:55 PM:

> Dear colleagues,
>
> Just wonder if I can obtain a mapping of the atom indices upon
> canonicalization by MolToSmiles ? I am aware that canonicalization (and
> hence atom reordering) can be suppressed in MolToSmiles, but I do want to
> canonicalize the output smiles.
>
> If you are interested, here is a bit more details of my problem. For each
> molecule, I want to delete one or two side chains, and obtain a smiles of
> what is left. Just that I want to know what are the atoms that bonded to
> the deleted side chains. I know, by suppressing canonicalization things
> will work. But I would like to canonicalize the smiles so that I can know
> if there are duplicates.
>
> I tried marking the atoms. But I believe that properties that got carried
> over to the output smiles, e.g. Isotope, affect the canonicalization, while
> properties that do not affect canonicalization, e.g, IntProp, are lost upon
> the conversion to smiles.
>
> Thank you for your insight.
>
> Ling
>
>
>
>


Re: [Rdkit-discuss] MolToSmiles

2021-10-21 Thread Andrew Dalke



> On Oct 21, 2021, at 04:50, Ling Chan  wrote:
> 
> I got the attached sdf. When I did a MolToSmiles, it gives me the following.
> 
> >>> for m in Chem.SDMolSupplier("pdb_structures/1q6k_ligand.sdf"):
> ...   print (Chem.MolToSmiles(m))
> ... 
> [CH3:0][C:0]([CH3:0])([CH3:0])[O:0][C:0](=[O:0])[NH:0][CH:0]([CH:0]=[O:0])[CH:0]1[CH2:0][CH2:0][CH2:0][CH2:0][CH2:0]1
> 
> Just wonder why does it not give something like
> O=C(OC(C)(C)C)NC(C=O)C1CCCCC1

The terms after the atom symbol in your atom block lines are center-justified 
(or left-justified, in the 2-digit mass difference term 'dd') instead of 
right-justified.

Here's a comparison of your first atom line, compared with the ctfile spec, and 
then compared with the round-trip through RDKit:

   74.0060   -9.5770  134.8660 N  0  0  0  0  0  0  0  0  0  0  0  0    <-- yours
xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee   <-- spec
   74.0060   -9.5770  134.8660 N   0  0  0  0  0  0  0  0  0  0  0  0   <-- RDKit

Add a space after the atom symbol field ("aaa") and everything works.

What happened?

The ":0" in the SMILES string derives from the atom-atom mapping number, "mmm", 
in the SDF.

The relevant code from 
Code/GraphMol/FileParsers/MolFileParser.cpp::ParseMolFileAtomLine() is:

  if (text.size() >= 63 && text.substr(60, 3) != "  0") {
    int atomMapNumber = 0;
    try {
      atomMapNumber = FileParserUtils::toInt(text.substr(60, 3), true);
    } catch (boost::bad_lexical_cast &) {
      std::ostringstream errout;
      errout << "Cannot convert '" << text.substr(60, 3) << "' to int on line "
             << line;
      delete res;
      throw FileParseException(errout.str());
    }
    res->setProp(common_properties::molAtomMapNumber, atomMapNumber);
  }

This says that if the field isn't exactly "  0" then parse it as an integer and 
store it in the atom's molAtomMapNumber.

Since your " 0 " field isn't exactly "  0", it gets converted into the atom map 
value of 0.

I don't see an explicit statement in the spec about alignment in fields. It's 
clear the spec comes from a Fortran background, so these should be interpreted 
as "I2" and "I3", and right-justified.
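A minimal pure-Python sketch (no RDKit needed) for spotting this problem in other files. It slices the integer fields of a V2000 atom line at the fixed columns of the ctfile atom block and flags any field that is present but not right-justified. The helper names are illustrative, not from any library; the column layout is the one in the spec line above, and "mmm" is the same 0-based slice 60:63 that RDKit's parser checks. It also confirms that inserting one space after the "aaa" field turns the problem line into exactly the line RDKit writes.

```python
# 0-based, end-exclusive column ranges of the integer fields after the atom
# symbol in a V2000 atom line:
#   xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee
INT_FIELDS = [
    ("dd", 34, 36), ("ccc", 36, 39), ("sss", 39, 42), ("hhh", 42, 45),
    ("bbb", 45, 48), ("vvv", 48, 51), ("HHH", 51, 54), ("rrr", 54, 57),
    ("iii", 57, 60), ("mmm", 60, 63), ("nnn", 63, 66), ("eee", 66, 69),
]

def misaligned_fields(line):
    """Names of integer fields that are present but not right-justified."""
    bad = []
    for name, start, end in INT_FIELDS:
        field = line[start:end]
        if not field.strip():
            continue  # field absent (line ended) or blank
        # Right-justified means full width and ending in a non-blank character
        if len(field) < end - start or field.endswith(" "):
            bad.append(name)
    return bad

def fix_atom_line(line):
    """Rewrite the integer fields right-justified; coordinates and symbol are kept."""
    return line[:34] + "".join(line[s:e].strip().rjust(e - s) for _, s, e in INT_FIELDS)

coords = "   74.0060   -9.5770  134.8660"
ling_line = coords + " N  " + "0  " * 11 + "0"    # 68 characters, as in the SD file
rdkit_line = coords + " N  " + " 0" + "  0" * 11  # 69 characters, as RDKit writes it

print(misaligned_fields(ling_line))
print(misaligned_fields(rdkit_line))
print(fix_atom_line(ling_line) == rdkit_line)
```

The sketch only checks the fields after the atom symbol; it does not validate the coordinates or the symbol field itself.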


By the way, if you pass your file through CDK you get:

org.openscience.cdk.io.MDLV2000Reader ERROR: Error while parsing line 5:
74.0060   -9.5770  134.8660 N  0  0  0  0  0  0  0  0  0  0  0  0   -> invalid line length, 68:74.0060   -9.5770  134.8660 N  0  0  0  0  0  0  0  0  0  0  0  0
org.openscience.cdk.io.iterator.IteratingSDFReader ERROR: Error while reading next molecule: invalid line length, 68:74.0060   -9.5770  134.8660 N  0  0  0  0  0  0  0  0  0  0  0  0

CDK's storage/ctab/src/main/java/org/openscience/cdk/io/MDLV2000Reader.java::readAtomFast()
requires that either all characters of a field be present, or the end of line.
Your line is 68 characters long because your last field is " 0" instead of the
" 0 " needed to fill the three-character exact change flag field "eee".

Best regards,


Andrew
da...@dalkescientific.com






Re: [Rdkit-discuss] MolToSmiles preserve atom order

2019-11-18 Thread Andrew Dalke
On Nov 18, 2019, at 17:40, David Cosgrove  wrote:
> 
> Point taken. I don’t think you’d be able to get RDKit to spit such SMILES 
> strings out unless you tortured it pretty hard, however. 

Did someone mention one of my favorite things to do? :)  See:

  
http://dalkescientific.com/writings/diary/archive/2010/12/28/reordering_smiles.html

Note that that code does not preserve stereochemistry.

It's for Python 2, so change the:

  available_closures = range(100)

to

  available_closures = list(range(100))

to make it work under Python 3.

Here's what it looks like:

>>> from x import reordered_smiles
>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("OCCl")
>>> atoms = list(mol.GetAtoms())
>>> reordered_smiles(mol, [atoms[1], atoms[0], atoms[2]])
'[CH2]12.[OH1]1.[Cl]2'
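For reference, a simplified, self-contained re-implementation of the same idea, written for illustration (it is not the code from the blog post). Every atom is written as its own bracket atom in the requested order, and each bond becomes a ring-closure number shared by two dot-separated components, the '.' plus ring-closure trick discussed in this thread. It kekulizes a copy first so that only single, double and triple bonds need symbols, writes the bond symbol only where a closure is opened, and, like Andrew's code, does not preserve stereochemistry (isotopes and atom maps are dropped too). The function names are hypothetical.

```python
import heapq
from rdkit import Chem

# Bond symbols needed after kekulization (aromatic bonds are gone by then)
BOND_SYMBOL = {
    Chem.BondType.SINGLE: "",
    Chem.BondType.DOUBLE: "=",
    Chem.BondType.TRIPLE: "#",
}

def closure_label(n):
    """Ring-closure numbers above 9 must be written as %nn."""
    return str(n) if n < 10 else f"%{n}"

def bracket_atom(atom):
    """Bracket atom with element symbol, total H count and formal charge."""
    h = atom.GetTotalNumHs()
    hs = "" if h == 0 else "H" if h == 1 else f"H{h}"
    q = atom.GetFormalCharge()
    charge = "" if q == 0 else "+" if q == 1 else "-" if q == -1 else f"{q:+d}"
    return f"[{atom.GetSymbol()}{hs}{charge}]"

def reordered_smiles(mol, order):
    """Write `mol` as SMILES with its atoms in exactly the given index order."""
    if sorted(order) != list(range(mol.GetNumAtoms())):
        raise ValueError("order must be a permutation of all atom indices")
    kek = Chem.RWMol(mol)
    Chem.Kekulize(kek, clearAromaticFlags=True)
    free = list(range(1, 100))  # unused closure numbers; a sorted list is a valid min-heap
    open_bonds = {}             # bond index -> closure number currently in use
    parts = []
    for idx in order:
        atom = kek.GetAtomWithIdx(idx)
        text = bracket_atom(atom)
        released = []
        for bond in sorted(atom.GetBonds(), key=lambda b: b.GetIdx()):
            b_idx = bond.GetIdx()
            if b_idx in open_bonds:
                # Second endpoint: close the bond (its symbol was written on opening)
                num = open_bonds.pop(b_idx)
                released.append(num)
                text += closure_label(num)
            else:
                # First endpoint: open the bond with the smallest free number
                if not free:
                    raise ValueError("more than 99 bonds open at once")
                num = heapq.heappop(free)
                open_bonds[b_idx] = num
                text += BOND_SYMBOL[bond.GetBondType()] + closure_label(num)
        # Recycle numbers only after this atom, so one atom never reuses a number
        for num in released:
            heapq.heappush(free, num)
        parts.append(text)
    return ".".join(parts)

mol = Chem.MolFromSmiles("OCCl")
print(reordered_smiles(mol, [1, 0, 2]))
```

For the OCCl example this sketch writes [OH] where Andrew's code writes [OH1]; both describe the same atom.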



Andrew
da...@dalkescientific.com






Re: [Rdkit-discuss] MolToSmiles preserve atom order

2019-11-18 Thread Rafal Roszak
On Mon, 18 Nov 2019 16:40:28 +
David Cosgrove  wrote:

> Point taken. I don’t think you’d be able to get RDKit to spit such SMILES
> strings out unless you tortured it pretty hard, however.

Exporting SMILES with an arbitrary given atom order is a different problem.
Normally, when working with a mol object, you don't remove any bonds; rather,
you change atom properties (such as isotope, AtomMapNum, explicitHs
and so on). I wanted to show a simple example, but in simple cases
MolToSmiles with rootedAtAtom=0, canonical=False preserves the atom order.
I found one example where it didn't work as I expected (the atom order was
altered), but it seems I lost that SMILES.
Anyway, in code like this:

mol = Chem.MolFromSmiles(someSmilesString)
change_properties_of_some_atoms_in_mol(mol)  # this function changes isotopes of selected atoms
smiles2 = Chem.MolToSmiles(mol, rootedAtAtom=0, canonical=False)
mol_from_smiles2 = Chem.MolFromSmiles(smiles2)

should the atom order (i.e. the atom indices returned by GetIdx()) be the same
in mol and mol_from_smiles2, or can it be different?


best,

Rafal
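Rafal's question can be checked per molecule rather than answered in general: MolToSmiles records the order in which it wrote the atoms in _smilesAtomOutputOrder, and if that order is 0, 1, 2, ... then re-parsing the SMILES gives back the same atom indices. A sketch with hypothetical helper names; the second test molecule uses the O, Cl, C atom ordering from David's reply in this thread, built with the '.' plus ring-closure notation from Rocco's.

```python
from rdkit import Chem

def smiles_and_output_order(mol, **kwargs):
    """Call MolToSmiles and return (smiles, output order as a list of ints)."""
    smiles = Chem.MolToSmiles(mol, **kwargs)
    raw = mol.GetProp("_smilesAtomOutputOrder")
    return smiles, [int(t) for t in raw.strip("[]").split(",") if t.strip()]

def atom_order_preserved(mol):
    """True if non-canonical SMILES rooted at atom 0 writes atoms in index order,
    i.e. re-parsing that SMILES gives back the same atom indices."""
    _, order = smiles_and_output_order(mol, rootedAtAtom=0, canonical=False)
    return order == list(range(mol.GetNumAtoms()))

chain = Chem.MolFromSmiles("OCCl")           # atom indices: O=0, C=1, Cl=2
shuffled = Chem.MolFromSmiles("O1.Cl2.C12")  # same molecule, indices: O=0, Cl=1, C=2

print(atom_order_preserved(chain), atom_order_preserved(shuffled))
```

When the check fails, the returned order still tells you how to translate indices between mol and the re-parsed molecule.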




Re: [Rdkit-discuss] MolToSmiles preserve atom order

2019-11-18 Thread David Cosgrove
Hi Rocco,
Point taken. I don’t think you’d be able to get RDKit to spit such SMILES
strings out unless you tortured it pretty hard, however.
Dave


On Mon, 18 Nov 2019 at 16:36, Rocco Moretti  wrote:

> Actually, it is possible to get arbitrary orders, if you (ab)use the '.'
> component ("zero order bond") directive and the numeric bonding ("ring
> closure") directives:
>
> >>> Chem.MolToSmiles( Chem.MolFromSmiles("O1.Cl2.C12" ) )
> 'OCCl'
>
> Whether you want to do things that way is another question.
>
> On Mon, Nov 18, 2019 at 10:24 AM David Cosgrove <
> davidacosgrov...@gmail.com> wrote:
>
>> Hi Rafal,
>> It is not always possible to preserve the atom ordering in the SMILES
>> string because there is an implied bond between contiguous symbols in the
>> SMILES. I think, for example, that the molecule with the SMILES OCCl
>> couldn’t have the order in the molecule object O first, Cl second, C third,
>> with bonds between 1 and 3 and 2 and 3 and get the SMILES in that order.
>>
>> I hope that made sense. Please ask again if not.
>>
>> Best regards,
>> Dave
>>
>>
>> On Mon, 18 Nov 2019 at 12:33, Rafal Roszak  wrote:
>>
>>> Hi all,
>>>
>>> Is there any way to preserve atom order from Mol object during
>>> exporting to smiles? I tried MolToSmiles with rootedAtAtom=0 and
>>> canonical=False options but it does not always preserve the original order.
>>> I know I can use _smilesAtomOutputOrder to map old indices to new one
>>> in canonical smiles but maybe we have something more handy?
>>>
>>> Best,
>>>
>>> Rafał
>>>
>>>
>>>
>> --
>> David Cosgrove
>> Freelance computational chemistry and chemoinformatics developer
>> http://cozchemix.co.uk
>>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk


Re: [Rdkit-discuss] MolToSmiles preserve atom order

2019-11-18 Thread Rocco Moretti
Actually, it is possible to get arbitrary orders, if you (ab)use the '.'
component ("zero order bond") directive and the numeric bonding ("ring
closure") directives:

>>> Chem.MolToSmiles( Chem.MolFromSmiles("O1.Cl2.C12" ) )
'OCCl'

Whether you want to do things that way is another question.

On Mon, Nov 18, 2019 at 10:24 AM David Cosgrove 
wrote:

> Hi Rafal,
> It is not always possible to preserve the atom ordering in the SMILES
> string because there is an implied bond between contiguous symbols in the
> SMILES. I think, for example, that the molecule with the SMILES OCCl
> couldn’t have the order in the molecule object O first, Cl second, C third,
> with bonds between 1 and 3 and 2 and 3 and get the SMILES in that order.
>
> I hope that made sense. Please ask again if not.
>
> Best regards,
> Dave
>
>
> On Mon, 18 Nov 2019 at 12:33, Rafal Roszak  wrote:
>
>> Hi all,
>>
>> Is there any way to preserve atom order from Mol object during
>> exporting to smiles? I tried MolToSmiles with rootedAtAtom=0 and
>> canonical=False options but it does not always preserve the original order.
>> I know I can use _smilesAtomOutputOrder to map old indices to new one
>> in canonical smiles but maybe we have something more handy?
>>
>> Best,
>>
>> Rafał
>>
>>
>>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>


Re: [Rdkit-discuss] MolToSmiles preserve atom order

2019-11-18 Thread David Cosgrove
Hi Rafal,
It is not always possible to preserve the atom ordering in the SMILES
string because there is an implied bond between contiguous symbols in the
SMILES. I think, for example, that the molecule with the SMILES OCCl
couldn’t have the order in the molecule object O first, Cl second, C third,
with bonds between 1 and 3 and 2 and 3 and get the SMILES in that order.

I hope that made sense. Please ask again if not.

Best regards,
Dave


On Mon, 18 Nov 2019 at 12:33, Rafal Roszak  wrote:

> Hi all,
>
> Is there any way to preserve atom order from Mol object during
> exporting to smiles? I tried MolToSmiles with rootedAtAtom=0 and
> canonical=False options but it does not always preserve the original order.
> I know I can use _smilesAtomOutputOrder to map old indices to new one
> in canonical smiles but maybe we have something more handy?
>
> Best,
>
> Rafał
>
>
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk


Re: [Rdkit-discuss] MolToSmiles(), atom indexes

2019-02-01 Thread Jean-Marc Nuzillard

Dear Jose Manuel,

Many thanks for your quick answer and for your script.

All  the best,

Jean-Marc



On 01/02/2019 at 13:20, Jose Manuel Gally wrote:


Dear Jean-Marc,

I believe this can be achieved by using the Mol property 
"_smilesAtomOutputOrder", which is set only after using the function 
Chem.MolToSmiles.


Please find attached a very simple example of how it can be extracted.

Cheers,
Jose Manuel

On 01.02.19 13:03, Jean-Marc Nuzillard wrote:

Dear all,

I am looking for a way to relate atom indexes of a Mol object
and the order of appearance of the atoms along the corresponding SMILES
chain, as produced by Chem.MolToSmiles().
Thanks in advance,

Jean-Marc

--
Dr. Jean-Marc Nuzillard
Institute of Molecular Chemistry, CNRS UMR 7312
Faculté des Sciences Exactes et Naturelles, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 33 3 26 91 82 10
Fax : 33 3 26 91 31 66
http://www.univ-reims.fr/ICMR
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/





--
Jean-Marc Nuzillard
Research Director at CNRS

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/ICMR
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/



Re: [Rdkit-discuss] MolToSmiles(), atom indexes

2019-02-01 Thread Jose Manuel Gally

Dear Jean-Marc,

I believe this can be achieved by using the Mol property 
"_smilesAtomOutputOrder", which is set only after using the function 
Chem.MolToSmiles.


Please find attached a very simple example of how it can be extracted.
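The attached notebook is not preserved in this archive. As a hedged sketch of the direction Jean-Marc asked about (from a Mol atom index to that atom's order of appearance in the SMILES), the _smilesAtomOutputOrder list can simply be inverted. The helper name is illustrative, and "position" counts atoms written, not characters of the string.

```python
from rdkit import Chem

def smiles_positions(mol, **kwargs):
    """Return (smiles, position), where position[atom_idx] is the 0-based rank of
    that atom among the atoms written to the SMILES string."""
    smiles = Chem.MolToSmiles(mol, **kwargs)
    raw = mol.GetProp("_smilesAtomOutputOrder")
    order = [int(t) for t in raw.strip("[]").split(",") if t.strip()]
    position = [0] * mol.GetNumAtoms()
    for pos, atom_idx in enumerate(order):
        position[atom_idx] = pos  # invert: output slot -> atom becomes atom -> slot
    return smiles, position

mol = Chem.MolFromSmiles("NC(=O)c1ccccc1O")
smi, position = smiles_positions(mol)
print(smi)
for atom in mol.GetAtoms():
    print(atom.GetIdx(), atom.GetSymbol(), position[atom.GetIdx()])
```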

Cheers,
Jose Manuel

On 01.02.19 13:03, Jean-Marc Nuzillard wrote:

Dear all,

I am looking for a way to relate atom indexes of a Mol object
and the order of appearance of the atoms along the corresponding SMILES
chain, as produced by Chem.MolToSmiles().
Thanks in advance,

Jean-Marc

--
Dr. Jean-Marc Nuzillard
Institute of Molecular Chemistry, CNRS UMR 7312
Faculté des Sciences Exactes et Naturelles, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 33 3 26 91 82 10
Fax : 33 3 26 91 31 66
http://www.univ-reims.fr/ICMR
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/




rdkit_example_smiles_atom_order.ipynb
Description: application/ipynb


Re: [Rdkit-discuss] MolToSmiles

2016-12-19 Thread Greg Landrum
I agree with Andrew's suggestion. The optional list argument defaulting to None is
exactly how I would solve it and fits (I think) with at least most of the RDKit.
-greg





On Mon, Dec 19, 2016 at 9:14 PM +0100, "Brian Kelley"  
wrote:










I'm happy to do that as long as there is a consensus.  We could also expose the 
properties in non-string form, but that is a bit harder to do.

  GetPropsAsDict does this, but has the overhead that it does a conversion for 
everything, not just the thing you want.  It does handle the underlying type 
correctly though which is convenient.


Brian Kelley

> On Dec 19, 2016, at 2:59 PM, Andrew Dalke  wrote:
> 
>> On Dec 19, 2016, at 6:22 PM, Brian Kelley wrote:
>> I had thought about making a CanonicalAtomOrder function that does this as 
>> well, or perhaps making a MolToSmiles variant.
> 
> I learned about this function from Noel's blog post at 
> https://nextmovesoftware.com/blog/2013/07/01/accessing-smiles-atom-order/ , 
> which uses the C++ API.
> 
> I would like a variant more along those lines, like:
> 
>  MolToSmiles(mol, isomericSmiles=None,  allHsExplicit=False, 
> atomOrder=None)
> 
> where if I pass in:
> 
>  atomOrder = []
>  MolToSmiles(mol, atomOrder=atomOrder)
> 
> then I get the list of indices in atomOrder, rather than a per-molecule 
> property.
> 
> atomOrder=None can do the existing behavior.
> 
> 
> Cheers,
> 
> 
>Andrew
>da...@dalkescientific.com
> 
> 


Re: [Rdkit-discuss] MolToSmiles

2016-12-19 Thread Brian Kelley
I'm happy to do that as long as there is a consensus.  We could also expose the 
properties in non-string form, but that is a bit harder to do.

  GetPropsAsDict does this, but has the overhead that it does a conversion for 
everything, not just the thing you want.  It does handle the underlying type 
correctly though which is convenient.


Brian Kelley

> On Dec 19, 2016, at 2:59 PM, Andrew Dalke  wrote:
> 
>> On Dec 19, 2016, at 6:22 PM, Brian Kelley wrote:
>> I had thought about making a CanonicalAtomOrder function that does this as 
>> well, or perhaps making a MolToSmiles variant.
> 
> I learned about this function from Noel's blog post at 
> https://nextmovesoftware.com/blog/2013/07/01/accessing-smiles-atom-order/ , 
> which uses the C++ API.
> 
> I would like a variant more along those lines, like:
> 
>  MolToSmiles(mol, isomericSmiles=None,  allHsExplicit=False, 
> atomOrder=None)
> 
> where if I pass in:
> 
>  atomOrder = []
>  MolToSmiles(mol, atomOrder=atomOrder)
> 
> then I get the list of indices in atomOrder, rather than a per-molecule 
> property.
> 
> atomOrder=None can do the existing behavior.
> 
> 
> Cheers,
> 
> 
>Andrew
>da...@dalkescientific.com
> 
> 


Re: [Rdkit-discuss] MolToSmiles

2016-12-19 Thread Andrew Dalke
On Dec 19, 2016, at 6:22 PM, Brian Kelley wrote:
> I had thought about making a CanonicalAtomOrder function that does this as 
> well, or perhaps making a MolToSmiles variant.

I learned about this function from Noel's blog post at 
https://nextmovesoftware.com/blog/2013/07/01/accessing-smiles-atom-order/ , 
which uses the C++ API.

I would like a variant more along those lines, like:

  MolToSmiles(mol, isomericSmiles=None,  allHsExplicit=False, 
atomOrder=None)

where if I pass in:

  atomOrder = []
  MolToSmiles(mol, atomOrder=atomOrder)

then I get the list of indices in atomOrder, rather than a per-molecule 
property.

atomOrder=None can do the existing behavior.


Cheers,


Andrew
da...@dalkescientific.com





Re: [Rdkit-discuss] MolToSmiles

2016-12-19 Thread Brian Kelley
I would vote for making a more obvious way to get at these values. I have
needed this when working with external depictors (e.g. mol -> SMILES ->
depiction with atom highlighting is one use case). I just couldn't think of
a clean API for doing this. Attaching these values to the molecule doesn't
seem like the right solution, considering there are two forms of canonical
ordering depending on whether isomerism is considered. I had thought about
making a CanonicalAtomOrder function that does this as well, or perhaps
making a MolToSmiles variant.

Any other ideas?

On Mon, Dec 19, 2016 at 3:58 AM, Greg Landrum 
wrote:

>
> On Mon, Dec 19, 2016 at 9:43 AM, Maciek Wójcikowski  > wrote:
>
>>
>> There is also CanonicalRankAtoms
>> [http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms]
>> which seems to be forgotten.
>>
>
> One thing to be aware of here is that this provides the canonical ranking
> of atoms that is used for the SMILES generation, but the values are not
> equal to the actual output order of the atoms.
> Here's an example of that:
> In [3]: m = Chem.MolFromSmiles('CC(O)CCN')
>
> In [4]: list(Chem.CanonicalRankAtoms(m))
> Out[4]: [0, 5, 2, 4, 3, 1]
>
> In [5]: Chem.MolToSmiles(m)
> Out[5]: 'CC(O)CCN'
>
> In [7]: m.GetProp('_smilesAtomOutputOrder')
> Out[7]: '[0,1,2,3,4,5,]'
>
> so though atom 1 is ranked in position 5, it ends up being the second atom
> output since it is connected to atom 0, which happens to have rank 0.
>
> -greg
>
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/intel___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] MolToSmiles

2016-12-19 Thread Greg Landrum
On Mon, Dec 19, 2016 at 9:43 AM, Maciek Wójcikowski 
wrote:

>
> There is also CanonicalRankAtoms
> [http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms],
> which seems to have been forgotten.
>

One thing to be aware of here is that this provides the canonical ranking
of atoms that is used for the SMILES generation, but the values are not
equal to the actual output order of the atoms.
Here's an example of that:
In [3]: m = Chem.MolFromSmiles('CC(O)CCN')

In [4]: list(Chem.CanonicalRankAtoms(m))
Out[4]: [0, 5, 2, 4, 3, 1]

In [5]: Chem.MolToSmiles(m)
Out[5]: 'CC(O)CCN'

In [7]: m.GetProp('_smilesAtomOutputOrder')
Out[7]: '[0,1,2,3,4,5,]'

So although atom 1 has rank 5, it is the second atom output, because it is
the only neighbor of atom 0, which has rank 0 and is written first.

-greg
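The same point with plain Python lists, as a small sketch: the two lists below are copied from the session above rather than recomputed with RDKit. Sorting the atom indices by canonical rank gives a different sequence from the order in which the atoms were actually written.

```python
# Values copied from the CC(O)CCN session above (not recomputed here).
ranks = [0, 5, 2, 4, 3, 1]         # CanonicalRankAtoms: canonical rank of each atom index
output_order = [0, 1, 2, 3, 4, 5]  # _smilesAtomOutputOrder: atom indices in written order

# Atom indices sorted by canonical rank, i.e. the order the ranks alone would suggest.
by_rank = sorted(range(len(ranks)), key=lambda idx: ranks[idx])

print(by_rank)                  # [0, 5, 2, 4, 3, 1]
print(by_rank == output_order)  # False
```

Roughly speaking, the ranks decide where the traversal starts and which neighbor is visited first at a branch point, but the written order still has to follow the bonds.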


Re: [Rdkit-discuss] MolToSmiles

2016-12-19 Thread Maciek Wójcikowski
Hi Jean-Marc and others,

There is also CanonicalRankAtoms
[http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms],
which seems to have been forgotten.


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl



Re: [Rdkit-discuss] MolToSmiles

2016-12-18 Thread Jean-Marc Nuzillard
Thank you Andrew, Brian and David for your answers.

mol.GetProp("_smilesAtomOutputOrder") does the job.
I had also expected that a.GetProp("molAtomMapNumber") would do it for each atom a.

All the best,

Jean-Marc
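On the molAtomMapNumber idea: MolToSmiles does not set that property, but if map numbers are assigned to the atoms before writing (for example, each atom's index plus one), they appear inside the bracket atoms of the output and can be read back from the string. Below is a minimal sketch of that read-back step on a hand-written mapped SMILES; the helper name is hypothetical. Keep in mind that adding map numbers can change the canonical SMILES itself, so this order need not match the one for the unmapped molecule.

```python
import re

# A map number is written as ":<digits>" just before the closing "]".
MAP_NUM = re.compile(r":(\d+)\]")

def indices_from_map_numbers(mapped_smiles):
    """Hypothetical helper: atom indices in SMILES order, assuming map number = index + 1.

    Assumes every atom carries a map number; unmapped atoms are silently skipped.
    """
    return [int(num) - 1 for num in MAP_NUM.findall(mapped_smiles)]

# Hand-written example string, not output captured from RDKit.
print(indices_from_map_numbers("[CH3:4][CH2:3][CH2:2][NH2:1]"))  # [3, 2, 1, 0]
```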



-- 
Jean-Marc Nuzillard
Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/ICMR

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/




Re: [Rdkit-discuss] MolToSmiles

2016-12-18 Thread Andrew Dalke
On Dec 18, 2016, at 6:32 PM, Brian Kelley wrote:
> >>> m.GetProp("_smilesAtomOutputOrder")
> '[3,2,1,0,]'
> 
> Note that this returns the list as a string which is sub-optimal.  
> GetPropsAsDict will convert these to proper python objects, however, this is 
> considered a private member so you need to return these as well:
> 
> >>> list(m.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"])
> [3, 2, 1, 0]

For fun, here are a few timing numbers:

# Common setup
from rdkit import Chem
mol = Chem.MolFromSmiles("c1c1Oc1c1")
Chem.MolToSmiles(mol)
import json
import ujson # third-party JSON decoder
import re
integer_pat = re.compile("[0-9]+")


# Get the string (gives a lower bound)
mol.GetProp("_smilesAtomOutputOrder")
1 loops, best of 3: 31.3 usec per loop


Here are variations for how to get that information as a list of integers:

# Using Python's "eval()" to decode the list (this is generally UNSAFE!)
eval(mol.GetProp("_smilesAtomOutputOrder"))
1 loops, best of 3: 157 usec per loop

# Use the built-in json module (need to remove the terminal ",")
json.loads(mol.GetProp("_smilesAtomOutputOrder")[:-2]+"]")
1 loops, best of 3: 66.5 usec per loop

# Use the third-party "ujson" package, which is faster than json.
ujson.loads(mol.GetProp("_smilesAtomOutputOrder")[:-2]+"]")
1 loops, best of 3: 41.2 usec per loop

("cjson" takes 49.7 usec per loop)

# Use the properties dictionary
mol.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"]
1000 loops, best of 3: 462 usec per loop

# Parse it more directly
map(int, integer_pat.findall(mol.GetProp("_smilesAtomOutputOrder")))
1 loops, best of 3: 89 usec per loop


Andrew
da...@dalkescientific.com
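A self-contained sketch for comparing the decoding strategies without RDKit, using a plain string in the same format as the property value. ast.literal_eval is swapped in here as a safe replacement for eval(): a Python list literal may end with a trailing comma, so it accepts the string as-is. ujson and cjson are left out to keep it standard-library only.

```python
import ast
import json
import re
import timeit

raw = "[3,2,1,0,]"  # same format as the _smilesAtomOutputOrder value
integer_pat = re.compile("[0-9]+")

parsers = {
    # Safe alternative to eval(): only Python literals are evaluated,
    # and a trailing comma inside a list literal is legal Python.
    "ast.literal_eval": lambda s: list(ast.literal_eval(s)),
    # json rejects the trailing comma, so drop ",]" and re-close the list
    # (assumes at least one atom, as in the original timings).
    "json": lambda s: json.loads(s[:-2] + "]"),
    # Pull out the runs of digits directly.
    "regex": lambda s: [int(tok) for tok in integer_pat.findall(s)],
    # Plain string methods; the empty token after the last comma is skipped.
    "split": lambda s: [int(tok) for tok in s.strip("[]").split(",") if tok],
}

N = 10_000
for name, parse in parsers.items():
    assert parse(raw) == [3, 2, 1, 0], name  # every strategy must agree
    seconds = timeit.timeit(lambda: parse(raw), number=N)
    print(f"{name:>16}: {seconds / N * 1e6:.2f} usec per call")
```

Absolute numbers will differ by machine and Python version; only the relative cost is of interest.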





Re: [Rdkit-discuss] MolToSmiles

2016-12-18 Thread David Cosgrove
Hi Jean-Marc,

There is a property, set on the molecule when the SMILES is generated, that
contains this information. I forget what it is called, but if you call the
molecule's GetPropNames(True, True) (that is, including private and computed
properties) you should see something obvious in the names returned. You can
then call GetProp with that property name to get a string containing the
atom output order. Note that the string is a text representation of a Python
list, with '[' at the start, ']' at the end, commas in between, and a
trailing comma before the ']'. You'll need to manipulate it a bit to extract
the integers you need.

Cheers,
Dave


On Sun, Dec 18, 2016 at 5:19 PM, Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

> Hi all,
>
> maybe my question has already been answered:
> when converting from Mol to a canonical SMILES string,
> is there a way to obtain the mapping between the atom indexes in the
> Mol object and the atom indexes in the SMILES string?
>
> All the best,
>
> Jean-Marc
>
> --
>
> Dr. Jean-Marc Nuzillard
> Institute of Molecular Chemistry
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 33 3 26 91 82 10
> Fax : 33 3 26 91 31 66
> http://www.univ-reims.fr/ICMR
>
> http://eos.univ-reims.fr/LSD/
> http://eos.univ-reims.fr/LSD/JmnSoft/
>
>


Re: [Rdkit-discuss] MolToSmiles

2016-12-18 Thread Brian Kelley
Jean-Marc,
  This is very non-obvious, but here is how you can do it from Python:

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles("NCCC")
>>> Chem.MolToSmiles(m)
'CCCN'
>>> m.GetProp("_smilesAtomOutputOrder")
'[3,2,1,0,]'

Note that this returns the list as a string, which is sub-optimal.
GetPropsAsDict will convert these to proper Python objects; however, this
is considered a private property, so you need to ask for private and
computed properties as well (the two True arguments):

>>> list(m.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"])
[3, 2, 1, 0]


I'm converting to a list here to show the output; this is really a wrapped
vector, but it can be used as a sequence. Hope this helps. Note that you
can dump out the whole property dictionary of any object that supports SetProp:

>>> m.GetPropsAsDict(True,True)
{'_smilesAtomOutputOrder': ,
'numArom': 0, '_StereochemDone': 1, '__computedProps':
}

That also shows some of how the sausage is made inside.

Cheers,
 Brian
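Jean-Marc's question was about the mapping itself, so one more step may help. The output order lists, for each position in the SMILES, which Mol atom was written there; inverting it gives, for each Mol atom index, its position in the SMILES. A minimal pure-Python sketch, assuming every atom appears exactly once in the output order; the helper name is hypothetical and the second input is made up for illustration.

```python
def smiles_positions(output_order):
    """Hypothetical helper: invert an output-order list.

    output_order[k] is the atom index written k-th; the result maps each
    atom index to its position among the atoms in the SMILES (not a
    character offset). Assumes every atom appears exactly once.
    """
    positions = [None] * len(output_order)
    for position, atom_idx in enumerate(output_order):
        positions[atom_idx] = position
    return positions

print(smiles_positions([3, 2, 1, 0]))  # [3, 2, 1, 0]  (the NCCC order is its own inverse)
print(smiles_positions([2, 0, 1]))     # [1, 2, 0]     (made-up order for illustration)
```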
