Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread Paul Emsley


On 27/08/2020 20:15, Jason Biggs wrote:
> Everything I know about C++ I learned just so that I can write a link
> between an interpreted language and the rdkit, so there are definitely
> some gaps in my knowledge.
>
> What I'm trying to understand right now is the expected lifetime of an
> Atom pointer returned by a molecule, for instance by the
> getAtomWithIdx method.  Based on the documentation, since this method
> doesn't say the user is responsible for deleting the returned pointer
> I know I'm not supposed to delete it. But when exactly does it get
> deleted? If I dereference it after deleting the molecule, what is it?
>
> auto mol = RDKit::SmilesToMol("");
> auto atom = mol->getAtomWithIdx(0);
> auto m2 = atom->getOwningMol();
> std::cout << "Z=" << atom->getAtomicNum() << std::endl;  // prints Z=6
> delete mol;
> std::cout << "Z=" << atom->getIdx() << std::endl; // prints Z=0
> std::cout << "N=" << m2.getNumAtoms() << std::endl;// prints N=4
> delete atom; // seg fault
>
> I would have thought the first time dereferencing the atom pointer
> after deleting mol would have crashed, but it does not.  I would also
> have expected bad things when calling the getNumAtoms method on m2
> after calling delete on mol, but this also works just fine.  What am I
> missing?




Isn't this the sort of undefined behaviour that one would expect when 
accessing deleted memory? Try adding some code between the deletion of 
mol and the access of atom that allocates and deallocates some memory for a 
second or so.


Anyway, I wouldn't try to "out-clever" the RDKit by deleting molecules 
"by hand."



Paul.




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread Jason Biggs
On Thu, Aug 27, 2020 at 4:33 PM David Cosgrove 
wrote:

> Hi Jason,
> The answer is that when you delete the molecule, the memory it uses is
> flagged as available for re-use,  but nothing else happens to it. If you
> then de-reference pointers to it, such as the atoms that are buried in the
> block of memory allocated to the molecule, you may get away with it and you
> may not. It will depend on whether something else has written over the
> memory or not. In your example, the memory was still in its original state,
> so the de-referencing of the atom pointers succeeded. This is not
> guaranteed, however, and this sort of bug is generally very nasty to find-
> sometimes the code will run, sometimes it will crash. Worse still is if you
> accidentally write to de-allocated memory that something else is now using-
> you can then get failures 5 minutes later in a completely different part of
> the program.
>

Thank you David, this really helps.  Thank you also to Dan, Nils, and
Dima.

I knew that accessing that atom after deleting the molecule was bad juju,
and was confused why it worked. I will steer clear of undefined behavior.

In my application, I have a wrapper class that I expose to top level users,
which holds a unique pointer to an ROMol. I know then that when my wrapper
class member goes away so does the ROMol.  What I don't have is a similar
wrapper class for an Atom, precisely because of these ownership questions -
any atom properties or modifications go through the ROMol wrapper class.
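
The design described here, where every atom access goes through the molecule
wrapper, can be sketched in a language-neutral way. The Python below is only
an illustration under assumptions: Molecule and AtomHandle are hypothetical
stand-ins, not RDKit classes. The handle stores an index plus a weak reference
to its owner, so using it after the owner has gone raises an error instead of
reading freed memory.

```python
import weakref


class Molecule:
    """Hypothetical stand-in for a wrapper that owns its atoms."""

    def __init__(self, atomic_numbers):
        self._atomic_numbers = list(atomic_numbers)

    def num_atoms(self):
        return len(self._atomic_numbers)

    def atomic_num(self, idx):
        return self._atomic_numbers[idx]

    def atom(self, idx):
        # Hand out a lightweight handle, never the atom storage itself.
        if not 0 <= idx < self.num_atoms():
            raise IndexError(f"atom index {idx} out of range")
        return AtomHandle(self, idx)


class AtomHandle:
    """Non-owning view of one atom: an index plus a weak reference to the owner."""

    def __init__(self, mol, idx):
        self._mol_ref = weakref.ref(mol)  # does not keep the molecule alive
        self.idx = idx

    def _owner(self):
        mol = self._mol_ref()
        if mol is None:
            raise ReferenceError("the owning molecule has been destroyed")
        return mol

    @property
    def atomic_num(self):
        # Every property lookup is routed through the owning molecule.
        return self._owner().atomic_num(self.idx)
```

With this layout, taking handle = mol.atom(0) and then dropping the last
reference to mol makes handle.atomic_num raise ReferenceError rather than
return stale data. In a C++ wrapper the same idea would need some way for the
handle to learn that its owner is gone, which is exactly the ownership
question raised above.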

I'm not very familiar with how the python interface works, is there a
similar issue with the python wrappers?  Does the wrapper class for the
Atom clean up after itself differently if the atom is marked as having an
owner?  Hope that question isn't too vague

Jason




Re: [Rdkit-discuss] 2D coord for hydrogens

2020-08-27 Thread Paolo Tosco
Hi Mark,

further to Fio's reply, I think the confusion stems from the fact that when
you call AddHs() after MolFromSmiles() no coordinates are actually
generated; these are only generated on the fly for visualization, and hence
are correct before and after AddHs(), as the layout is always generated
on-the-fly from scratch, rather than incrementally.
Instead, when you explicitly call rdCoordGen.AddCoords() (or
rdDepictor.Compute2DCoords()) a conformation is actually added, thus
allowing incremental addition of coordinates when you call AddHs().

I hope this gist clarifies the above:
https://gist.github.com/ptosco/1c7c83017cd1a3d8aa64fe336ae3c1f8

Cheers,
p.



Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread Dan Nealschneider
In RDKit, atoms are owned by the molecule. When you ask for:

auto atom = mol->getAtomWithIdx(0);


You are asking for a pointer to memory internally owned by the
RDKit::ROMol. However:

auto mol = RDKit::SmilesToMol("");


creates a new molecule in memory, so it's your job to delete it. This is
documented here:
https://github.com/rdkit/rdkit/blob/e86e2c1d5d375c75cbd7e00871ecc1e0a29b3548/Code/GraphMol/SmilesParse/SmilesParse.h#L47.
I think that, in general, RDKit tends to document when you *do* need to
clean up after yourself.

I'd recommend one of these idioms:

ROMOL_SPTR mol1(RDKit::SmilesToMol("")); // this is a boost::shared_ptr, requires #include <boost/shared_ptr.hpp>
std::unique_ptr<RDKit::ROMol> mol2(RDKit::SmilesToMol("")); // requires #include <memory>



*dan nealschneider* | lead developer


Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread David Cosgrove
Hi Jason,
The answer is that when you delete the molecule, the memory it uses is
flagged as available for re-use,  but nothing else happens to it. If you
then de-reference pointers to it, such as the atoms that are buried in the
block of memory allocated to the molecule, you may get away with it and you
may not. It will depend on whether something else has written over the
memory or not. In your example, the memory was still in its original state,
so the de-referencing of the atom pointers succeeded. This is not
guaranteed, however, and this sort of bug is generally very nasty to find-
sometimes the code will run, sometimes it will crash. Worse still is if you
accidentally write to de-allocated memory that something else is now using-
you can then get failures 5 minutes later in a completely different part of
the program.

Deleting the atoms is also an error, because they will be deleted by the
molecule’s destructor, so you’ll be de-allocating the memory twice, another
exciting source of undefined behaviour. Valgrind is excellent for tracking
down these sorts of error, and many more besides.  If you’re developing on
Linux, it’s good practice to use it on any code before you use that program
in earnest.

Cheers,
Dave


David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk


Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread dmaziuk via Rdkit-discuss

On 8/27/2020 3:06 PM, Nils Weskamp wrote:
> To add to this: you are looking at the wonderful concept of an
> "undefined behavior" in C/C++. There is no guarantee that your example
> program will always show the same behaviour.
>
> In more recent versions of C++, you have access to "smart pointers" like
> std::shared_ptr, which basically implement reference counting. Not sure
> if this would help here.


It's worse: with all the boost junk they pulled in the really recent 
versions, good luck figuring out which calls pass "smart" pointers and 
which don't.


There are reasons why everyone's into Rust, and the efforts of C++ 
Standards Committee are behind many of them.


Dima




Re: [Rdkit-discuss] How to set rdMolStandardize.CleanupParameters.maxTautomer for tautomer canonicalization

2020-08-27 Thread Fiorella Ruggiu
Hi Paolo!

Very helpful to know - I missed this in my google search. Thank you very
much for your prompt answer and for implementing these changes :).

Best,
Fio



Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread Nils Weskamp
To add to this: you are looking at the wonderful concept of an
"undefined behavior" in C/C++. There is no guarantee that your example
program will always show the same behaviour.

In more recent versions of C++, you have access to "smart pointers" like
std::shared_ptr, which basically implement reference counting. Not sure
if this would help here.

Best regards,
Nils



Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread dmaziuk via Rdkit-discuss

On 8/27/2020 2:15 PM, Jason Biggs wrote:
> What I'm trying to understand right now is the expected lifetime of an Atom
> pointer returned by a molecule, for instance by the getAtomWithIdx method.
> Based on the documentation, since this method doesn't say the user is
> responsible for deleting the returned pointer I know I'm not supposed to
> delete it. But when exactly does it get deleted?  If I dereference it after
> deleting the molecule, what is it?


The more general answer is:

a) When the program terminates, all its resources are returned to the 
OS. It was a common CGI technique to not bother and just let it run 
to the end. (It was also one of the "mobile Java" things with cellphone 
vendors: they wanted garbage collection off.)


b) Unlike "garbage-collected" languages, C++ has guaranteed object 
destruction. If there are any resources you want to explicitly relinquish, 
the destructor is the place to do it.


If your program is not an "up forever" server, you could just let it be: 
it'll all get cleaned up on exit.


HTH,
Dima




Re: [Rdkit-discuss] How to set rdMolStandardize.CleanupParameters.maxTautomer for tautomer canonicalization

2020-08-27 Thread Paolo Tosco
Hi Fio,

there is an open PR that addresses this and other issues with the
TautomerEnumerator:
https://github.com/rdkit/rdkit/pull/3327

As soon as it is merged into the main trunk, this functionality will be
available.

Hope that helps, cheers
p.



[Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread Jason Biggs
Everything I know about C++ I learned just so that I can write a link
between an interpreted language and the rdkit, so there are definitely some
gaps in my knowledge.

What I'm trying to understand right now is the expected lifetime of an Atom
pointer returned by a molecule, for instance by the getAtomWithIdx method.
Based on the documentation, since this method doesn't say the user is
responsible for deleting the returned pointer I know I'm not supposed to
delete it. But when exactly does it get deleted?  If I dereference it after
deleting the molecule, what is it?

auto mol = RDKit::SmilesToMol("");
auto atom = mol->getAtomWithIdx(0);
auto m2 = atom->getOwningMol();
std::cout << "Z=" << atom->getAtomicNum() << std::endl;  // prints Z=6
delete mol;
std::cout << "Z=" << atom->getIdx() << std::endl; // prints Z=0
std::cout << "N=" << m2.getNumAtoms() << std::endl;// prints N=4
delete atom; // seg fault

I would have thought the first time dereferencing the atom pointer after
deleting mol would have crashed, but it does not.  I would also have
expected bad things when calling the getNumAtoms method on m2 after calling
delete on mol, but this also works just fine.  What am I missing?

Thanks
Jason


Re: [Rdkit-discuss] 2D coord for hydrogens

2020-08-27 Thread Fiorella Ruggiu
Hi Mark,

it's best to add the hydrogens first and then set your coordinates:

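from rdkit import Chem
from rdkit.Chem import rdCoordGen, rdmolops

m = Chem.MolFromSmiles('Nc1nnc2n1CCS2')
m2 = rdmolops.AddHs(m, False, True)  # explicitOnly=False, addCoords=True: add the hydrogens first
rdCoordGen.AddCoords(m2)  # then generate 2D coordinates for heavy atoms and hydrogens together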

Best,
Fio



[Rdkit-discuss] How to set rdMolStandardize.CleanupParameters.maxTautomer for tautomer canonicalization

2020-08-27 Thread Fiorella Ruggiu
Hi everyone,

I am cleaning up molecules to import into our database and canonicalizing
the tautomer using RDKit in Python. Some cases result in a time-out that is
not caught by my except block. I would like to try setting
maxTautomers to a value lower than 1000. I found no direct option to set
this in either rdkit.Chem.MolStandardize.rdMolStandardize.TautomerEnumerator
or the Canonicalize() function; it seems that
rdkit.Chem.MolStandardize.rdMolStandardize.CleanupParameters.maxTautomers needs
to be set.

How can I set the max number of tautomers to generate in the canonicalize
function?

Thank you for your help!
Best,
Fio


[Rdkit-discuss] 2D coord for hydrogens

2020-08-27 Thread Mark Mackey
Hi all,

I'm trying to generate a 2D layout with sensible locations for hydrogens, and 
am struggling a bit. If I start from SMILES:

m=Chem.MolFromSmiles('Nc1nnc2n1CCS2')
m2=rdmolops.AddHs(m, False, True);

Then m2 has perfectly sensible 2D coords for the hydrogens. If I use 
addCoords=False here then the hydrogens are all at the origin, which is 
more-or-less expected.

If, however, I generate new 2D coords for the heavy atoms with CoordGen:

m=Chem.MolFromSmiles('Nc1nnc2n1CCS2')
Chem.rdCoordGen.AddCoords(m)
m2=rdmolops.AddHs(m, False, True);

then the 2D coords for the hydrogens are poor on the 5-membered ring, and the 
two hydrogens on the NH2 are superimposed.

(In practice I'm starting from a 3D SDF and trying to generate 2D coords for 
it, hence the need for CoordGen).

What am I doing wrong?

Regards,
Mark

--
Mark Mackey
Chief Scientific Officer
Cresset
New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8 0SS, UK
tel: +44 (0)1223 858890
mobile: +44 (0)7595 099165
fax: +44 (0)1223 853667
email: m...@cresset-group.com
web: www.cresset-group.com
skype: mark_cresset



[Rdkit-discuss] EmbedMolecule and chirality

2020-08-27 Thread Tim Dudgeon
I've encountered a strange problem when doing constrained embedding that
seems to be related to chirality.
The example is here:
https://gist.github.com/tdudgeon/c4604f3ee9124eeec60668b5eefe465e

The molecule being embedded has a single chiral carbon, and while that atom
is tethered, only one of its attached atoms is also tethered, so if I
understand correctly, there should be no reason why the chiral atom cannot
be embedded correctly.

But when I use the enforceChirality=True option the embedding usually
fails, and takes several seconds to do so. Very occasionally it succeeds
and the chirality is preserved as expected.

If however I use the enforceChirality=False option the embedding
succeeds and is very quick. In all cases I've run so far the
stereochemistry is preserved.

Can someone explain this behaviour?

Tim


Re: [Rdkit-discuss] Valence coding in atom block of SDF files written by RDKit

2020-08-27 Thread Jean-Marc Nuzillard

Dear Paolo,

many thanks for this regex-based answer to my question.

Best,

Jean-Marc


Le 26/08/2020 à 23:34, Paolo Tosco a écrit :

Hi Jean-Marc,

You can strip the valence field from the MolBlock with a regex:

import re
from rdkit import Chem

regex = re.compile(r"^(\s*\d+\.\d{4}\s*\d+\.\d{4}\s*\d+\.\d{4} ... \d  \d  \d  \d  \d  )(\d)(.*)$")
print("\n".join(regex.sub(r"\g<1>0\g<3>", line) for line in
                Chem.MolToMolBlock(Chem.MolFromSmiles("[NH4+]")).split("\n")))

     RDKit          2D

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
M  CHG  1   1   1
M  END

HTH, cheers
p.
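
The same edit can also be made by column position instead of by regex. This is
only a sketch under assumptions: it relies on the fixed-width V2000 layout, in
which the valence field of an atom line is the three characters at offsets
48-50 and the atom count is the first three characters of the counts line. The
helper name zero_valence_field and the hand-built example block are
hypothetical, not part of RDKit. Because it never parses the coordinates, it
should also cope with negative values, which the \d+ pattern above would
probably not match.

```python
def zero_valence_field(mol_block):
    """Return a copy of a V2000 mol block with every atom valence field set to 0."""
    lines = mol_block.split("\n")
    # The fourth line is the counts line; its first three characters hold the atom count.
    num_atoms = int(lines[3][0:3])
    for i in range(4, 4 + num_atoms):
        line = lines[i]
        # The valence field occupies offsets 48-50 of a V2000 atom line.
        if len(line) >= 51:
            lines[i] = line[:48] + "  0" + line[51:]
    return "\n".join(lines)


# Hand-built single-atom block, formatted with the V2000 field widths.
ATOM_LINE = "%10.4f%10.4f%10.4f %-3s%2d" + "%3d" * 11

example = "\n".join([
    "",
    "     RDKit          2D",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    ATOM_LINE % (0.0, 0.0, 0.0, "N", 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0),
    "M  CHG  1   1   1",
    "M  END",
])

cleaned = zero_valence_field(example)
print(cleaned)
```

Only the atom lines are touched; the "M  CHG" line that carries the charge is
left as it is.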

On Wed, Aug 26, 2020 at 5:00 PM Jean-Marc Nuzillard wrote:


Dear all,

the atom block of the connection table produced by RDKit
(the Chem.MolToMolBlock() function)
from the '[NH4+]' SMILES string is
    0.0000    0.0000    0.0000 N   0  0  0  0  0  4  0  0  0  0  0  0

in which the '4' in column 10 indicates that the number of bonds
of the N atom (implicit Hs included) is 4.
This makes sense but may not be necessary, because the electric charge
information is carried by the
"M  CHG  1   1   1" line.
My ctfile.pdf file (by Accelrys, dated 2011; is there a more recent version
around?) shows the example of alanine
in zwitterionic form, and the valence column in the atom block only
contains '0' values.
This '4', which appears for any organic ammonium ion, is misinterpreted by the
ACDLabs software
I use for NMR chemical shift prediction. Replacing the '4' with a '0'
solves the problem.

Apart from editing the atom block myself (by hand or by script), is
there a way
to keep the value of the valence field at 0 for electrically charged
atoms when writing SDF files?

Best,

Jean-Marc


--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/icmr
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/
