[Rdkit-discuss] How to perform a "mild" 2D Clean

2018-11-02 Thread Good Eats
In many chemical drawing programs, there exists a "2D Clean" function. This
function usually has two tiers of cleaning. The first tier clean is mild:
standardizing bond lengths and bond angles, but leaving the general
conformation intact. The second tier is stronger: completely recomputing
the 2D coordinates from scratch.

In RDKit, Compute2DCoords() does an admirable job at performing the strong
version of the above. However, I haven't found anything that can perform
the mild 2D Clean. What I've examined so far:
1. TransformMol(): this can scale the structure to get bond lengths close
to the desired length, but doesn't help for molecules with bonds that have
different lengths -- the asymmetry is preserved after the scaling.
2. Iterating through all bonds and calling SetBondLength() to a constant
value: this fails for bonds that are in rings.
3. Iterating through atoms and calling SetAngleDeg() to standardize bond
angles: I haven't tried this yet -- it is next on my list. However, even if
this works, it won't fix the bond length part of the clean.
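
To illustrate point 1, here is a minimal, RDKit-free sketch of what a uniform
scaling does to a drawing with unequal bond lengths. The coordinates, the
helper names, and the target length of 1.5 are all made up for illustration;
this is not what TransformMol() does internally, only the same kind of uniform
transform.

```python
import math


def bond_length(coords, bond):
    """Euclidean length of a bond given as a pair of atom indices."""
    (x1, y1), (x2, y2) = coords[bond[0]], coords[bond[1]]
    return math.hypot(x2 - x1, y2 - y1)


def scale_to_mean_bond_length(coords, bonds, target=1.5):
    """Uniformly scale 2D coordinates so the mean bond length equals target."""
    mean = sum(bond_length(coords, b) for b in bonds) / len(bonds)
    factor = target / mean
    return [(x * factor, y * factor) for x, y in coords]


# Hypothetical three-atom chain with one short bond and one long bond.
coords = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
bonds = [(0, 1), (1, 2)]

scaled = scale_to_mean_bond_length(coords, bonds)
lengths = [bond_length(scaled, b) for b in bonds]

# The mean is now 1.5, but the long bond is still three times the short one.
print(lengths)
```

Because every coordinate is multiplied by the same factor, the ratio between
any two bond lengths is unchanged, which is why scaling alone cannot serve as
a mild clean.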

Any other suggestions?
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Plotting values next to atoms

2018-11-02 Thread Greg Landrum
Hi Eric,

On Fri, Nov 2, 2018 at 2:00 PM Eric Jonas wrote:

> Hello! I'm trying to figure out if there's any known or sane way to
> automatically plot numerical values adjacent to atoms using the rdkit
> drawing machinery. Ideally I'd like to annotate certain atoms
> programmatically with values. I think the conventional way this is done for
> publication is post-hoc editing in illustrator but it would be great if
> there was an automatic or supported mechanism.
>

Doing this correctly is on the list of high-priority things to do, and I
really hope to have something done for the 2019.03 release, but there's no
way I can guarantee that (it's a hard problem).

In the meantime, there's a way to at least do something that is, hopefully,
better than nothing:
https://gist.github.com/greglandrum/8cf8ecc3253abf0a5139a776a5095163
displayed here:
https://nbviewer.jupyter.org/gist/greglandrum/8cf8ecc3253abf0a5139a776a5095163


Re: [Rdkit-discuss] svg: next question

2018-11-02 Thread Dimitri Maziuk via Rdkit-discuss
On 11/02/2018 12:19 AM, Greg Landrum wrote:
> On Fri, Nov 2, 2018 at 12:32 AM Dimitri Maziuk via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
> 
>> Does anyone know where TH does
>>
>> 
>>
>> come from? --
> 
> 
> assuming you're using the RDKit's MolDraw2DSVG class, that comes from here:
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/MolDraw2D/MolDraw2DSVG.cpp#L53

Should it be changed to utf-8? I suspect any system where RDKit builds
at this point is using that, and I believe technically  element
can contain unicode.

E.g. you should be able to render your amino-acids with atoms labeled w/
Greek alphas, betas, etc. as per IUPAC.

>> I have two SVGs generated by the same container running on
>> the same linux host and one has the above, the other has
>>
>> 
>>
> 
> No idea where that might have come from, but it's not MolDraw2DSVG

Weird. I'll see if I get any more of those...

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Plotting values next to atoms

2018-11-02 Thread Dimitri Maziuk via Rdkit-discuss
On 11/02/2018 07:59 AM, Eric Jonas wrote:
> Hello! I'm trying to figure out if there's any known or sane way to
> automatically plot numerical values adjacent to atoms using the rdkit
> drawing machinery. Ideally I'd like to annotate certain atoms
> programmatically with values.

This draws atom labels:

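# dr: a MolDraw2D drawer instance; self._mol: the molecule being drawn
op = dr.drawOptions()
for i in range(self._mol.GetNumAtoms()):
    # Label each atom with its element symbol and 1-based index, e.g. "C1"
    op.atomLabels[i] = self._mol.GetAtomWithIdx(i).GetSymbol() + str(i + 1)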

HTH,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Fail one case adding functional groups to certain atomic index of core

2018-11-02 Thread Noki Lee
Hi Greg,

Thanks for the tip! I will check it against other cases. SetNoImplicit and
SetNoExplicit are very confusing to me. What is it about them that solves
the problem?

Best,
Noki.

On Fri, Nov 2, 2018 at 1:01 PM Greg Landrum wrote:

> You're going to reach the limit of what this simple approach can do
> very quickly. :-)
> When you express the sulfur atom as [SH] in the SMILES, you tell the RDKit
> that it must have an H attached. That information is preserved when you
> connect the S to the other molecule, so you end up with a valence that's
> too high.
>
> I added a function combine2 to the gist (
> https://gist.github.com/greglandrum/fd488309268cb085be218f26178e13b8)
> that can handle this case.
>
> On Thu, Nov 1, 2018 at 9:33 AM Noki Lee wrote:
>
>> Hi, Greg
>>
>> Recently, I got the code producing a combined molecule from one core and
>> several functional groups:
>> https://gist.github.com/greglandrum/fd488309268cb085be218f26178e13b8
>>
>> Here is the exception case that I encountered.
>>
>> core = Chem.MolFromSmiles('N(=N/c1c1)\c2c2')
>> pieces = [Chem.MolFromSmiles(x) for x in ('C(=O)O','O=[SH](=O)O')]
>> connections = ((7,0),(4,1))
>> newMol = combine(core,pieces,connections)
>> Draw.MolToImage(core).show()
>> Draw.MolToImage(pieces[0]).show()
>> Draw.MolToImage(pieces[1]).show()
>> Draw.MolToImage(newMol).show()
>> print(Chem.MolToSmiles(newMol))
>>
>> I tried using all canonicalized SMILES. It worked for 'O=S(=O)O', which is
>> not the canonicalized one.
>> If I put 'O=[SH](=O)O', it works. But it's not what I wanted. The tail of
>> H (hydrogen) tags along with the S atom.
>> I tried [S-] instead of [SH], but that is also not what I expected. S in the
>> result of the 'combine' function has a negative charge. Can you suggest
>> something?


[Rdkit-discuss] Plotting values next to atoms

2018-11-02 Thread Eric Jonas
Hello! I'm trying to figure out if there's any known or sane way to
automatically plot numerical values adjacent to atoms using the rdkit
drawing machinery. Ideally I'd like to annotate certain atoms
programmatically with values. I think the conventional way this is done for
publication is post-hoc editing in illustrator but it would be great if
there was an automatic or supported mechanism.

...Eric


Re: [Rdkit-discuss] Sometimes one sanitization is not enough?

2018-11-02 Thread Francis Atkinson
Interesting timing! I have just come across this exact same issue when 
experimenting with new SMARTS-based standardisations (so, yes, treating 
unnatural molecules in unwholesome ways).


I 'fixed' it by calling SanitizeMol twice.
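
For reference, a minimal sketch of that double-sanitization check. The helper
name sanitizes_twice is hypothetical (it is not an RDKit function), and the
sketch assumes that sanitization failures surface as ValueError, as in the
traceback quoted further down in this thread.

```python
from rdkit import Chem


def sanitizes_twice(mol):
    """Return True only if two consecutive SanitizeMol calls both succeed.

    The molecule is sanitized in place; no copy is made.
    """
    for _ in range(2):
        try:
            Chem.SanitizeMol(mol)
        except ValueError:
            # Either pass failing means the molecule should be rejected.
            return False
    return True


# Benzene as a well-behaved control, plus the odd SMILES from this thread.
for smi in ('c1ccccc1', 'C1=n(C)-c=Cn1'):
    mol = Chem.MolFromSmiles(smi, sanitize=False)
    print(smi, sanitizes_twice(mol))
```

Products that return False can be skipped in the same way as products that
fail a single sanitization.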

On 31/10/2018 14:48, Ivan Tubert-Brohman wrote:

Hi Greg,

Thanks for the detailed explanation. You are right that this is not a 
real molecule; it came from applying a user-supplied reaction SMARTS. 
(The reaction SMARTS was not the best-written perhaps, but that's 
tangential...). I normally sanitize the products and skip those that 
fail the sanitization, but in this case I was surprised when the 
sanitized molecule caused issues later while trying to compute 
descriptors.


I look forward to a fix, but in the meantime maybe I'll consider 
running SanitizeMol twice. :-)


Best,
Ivan


On Wed, Oct 31, 2018 at 2:41 AM Greg Landrum wrote:


Hi Ivan,

Short answer: I would not normally expect a second sanitization to
fail if the first succeeds, but your input SMILES is very odd and
triggers a bug.

This is an interesting edge case for the sanitization code because
it includes a weird mix of aromatic and aliphatic atoms and bonds.
I do hope this came out of some computational process and isn't a
"real" molecule. You almost couldn't have picked a better example
to highlight the situation that's causing the problem here. Some
form of congratulations is in order. :-)

Here's an explanation of what's going on with your molecule
C1=n(C)-c=Cn1
The fundamental problem is that atom 1 (the first nitrogen) has a
valence of 4 and is neutral...
If you wrote the SMILES as C1=N(C)C=CN1, which is what the
sanitization process produces, I don't think you'd be surprised
that the RDKit sanitization fails (and your second call to
sanitize does fail).

To understand why it passes the first time, you need to understand
the flow of the sanitization process, described here:
https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization
Step 3, updatePropertyCache(), is the part that reports valency
errors. There's a special case in this code for aromatic atoms
that allows atoms like the N in Cn11 to pass sanitization even
though they are formally four-valent (2x1.5 for the aromatic
bonds + 1 for the C). Your molecule is triggering that special case
because atom 1 is aromatic in the input SMILES. Incorrect aromatic
rings that get through this step normally end up getting caught
later when the molecule is kekulized (step 5). In your case there
are no aromatic bonds to kekulize, so no error is thrown. The
aromaticity perception (step 6) does not consider the ring to be
aromatic, so the final molecule is the equivalent of C1=N(C)C=CN1.

It ought to be possible to clear this in the sanitization code
relatively easily; I just need to think about it a bit and do a
bunch of testing.

-greg

On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman wrote:

Hi,

I was surprised to see that a (dubious) structure that goes
through SanitizeMol OK can fail a subsequent sanitization call:

from rdkit import Chem

print("Start")
mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
print("Before first sanitization")
Chem.SanitizeMol(mol)
print("Before second sanitization")
Chem.SanitizeMol(mol)
print("Done")


The output is:

Start
Before first sanitization
Before second sanitization
[16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
Traceback (most recent call last):
  File "./san.py", line 9, in <module>
    Chem.SanitizeMol(mol)
ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is greater than permitted


Is this an unavoidable aspect of the way SanitizeMol works,
since it does several operations (Kekulize, check valencies,
set aromaticity, conjugation and hybridization) in a certain
order, or should this be considered a bug?

Best,
Ivan


--
Dr Francis L Atkinson

Chemogenomics Group
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

(01223) 494473


[Rdkit-discuss] duplicate checks for organometallics

2018-11-02 Thread Malgorzata Werner
Hi there,

I was looking for a way to standardize structures of organometallics so I can 
match them across different databases.



One example is cisplatin which has different Smiles representations in 
different databases, e.g.:

  *   Drugbank (represented as covalent bonds): N[Pt](N)(Cl)Cl
  *   PubChem (represented as both ionic and covalent bonds): N.N.Cl[Pt]Cl

If I just calculate the Inchikey based on those Smiles strings, obviously they 
are different.

To standardize the structures, I came up with this solution:

  1.  Convert the rdkit mol to an Inchi string (disconnects metal covalent
      bonds)
  2.  Convert the Inchi string back to a molecule. For the above molecules, I
      get:

      *   Drugbank: [Cl-].[Cl-].[NH2-].[NH2-].[Pt+4]
      *   PubChem: N.N.[Cl-].[Cl-].[Pt+2]

  3.  Set all formal charges to zero and calculate the Inchikey, which is then
      identical.
Unfortunately, the last step is a bit brute force, so all charges in the 
molecule are lost. Could anyone think of a better solution?
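
The three steps above can be sketched roughly as follows. This is only an
approximation of the workflow described, assuming an RDKit build with InChI
support; the function name neutralized_inchikey is hypothetical, and the call
to UpdatePropertyCache() after changing the charges is an added assumption.
Whether the two keys agree may depend on the RDKit and InChI versions, so no
output is shown.

```python
from rdkit import Chem


def neutralized_inchikey(smiles):
    """InChIKey after an InChI round trip with all formal charges zeroed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f'could not parse SMILES: {smiles}')
    # Steps 1 and 2: convert to InChI and back again.
    mol = Chem.MolFromInchi(Chem.MolToInchi(mol))
    if mol is None:
        raise ValueError(f'InChI round trip failed for: {smiles}')
    # Step 3: brute-force removal of every formal charge.
    for atom in mol.GetAtoms():
        atom.SetFormalCharge(0)
    # Refresh cached valence information after editing the charges.
    mol.UpdatePropertyCache(strict=False)
    return Chem.InchiToInchiKey(Chem.MolToInchi(mol))


for source, smi in (('Drugbank', 'N[Pt](N)(Cl)Cl'),
                    ('PubChem', 'N.N.Cl[Pt]Cl')):
    print(source, neutralized_inchikey(smi))
```

Zeroing charges only on the metal and the atoms that were bonded to it, rather
than on every atom, would be one way to keep unrelated charges, but that
requires tracking which bonds the InChI conversion removed.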

Thanks,
Malgorzata