Re: [Rdkit-discuss] SVG depiction with fonts?

2021-09-30 Thread Greg Landrum
Hi Geoff,

you need to disable the use of FreeType when you create the MolDraw2DSVG
object. Unfortunately there are no keyword arguments for this (something for
me to fix ASAP), but you can do it as follows:

from rdkit.Chem import Draw
d = Draw.MolDraw2DSVG(350, 300, -1, -1, True)

That last argument (noFreetype) turns off FreeType when it is True, so the
drawing uses normal SVG text.
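
A slightly fuller sketch that covers both of Geoff's tweaks, assuming a recent RDKit with rdMolDraw2D. The SMILES and the "M" label are illustrative, and relabelling through the "atomLabel" atom property (instead of editing "*" in the SVG string) is an assumption about what suits this use case. The bold replacement also assumes that the non-FreeType output writes the literal style string font-weight:normal, which may differ between versions.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

# Illustrative input: a dummy atom (*) standing in for a metal centre.
mol = Chem.MolFromSmiles("*c1ccncc1")

# Show the dummy atom as "M" instead of "*": the drawing code uses the
# "atomLabel" atom property, when present, as the displayed label.
mol.GetAtomWithIdx(0).SetProp("atomLabel", "M")

# Positional arguments: width, height, panelWidth, panelHeight, noFreetype.
# noFreetype=True makes the drawer write SVG <text> elements, not paths.
drawer = rdMolDraw2D.MolDraw2DSVG(350, 300, -1, -1, True)
drawer.DrawMolecule(mol)
drawer.FinishDrawing()
svg = drawer.GetDrawingText()

# Assumption: each <text> style contains "font-weight:normal", so the
# old post-processing trick applies again.
bold_svg = svg.replace("font-weight:normal", "font-weight:bold")
```

Setting the label on the molecule keeps the "*" to "M" change out of the SVG string, so it does not depend on how the label text happens to be serialized.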

I hope this helps,
-greg


On Tue, Sep 28, 2021 at 7:10 PM Geoffrey Hutchison <geoff.hutchi...@gmail.com> wrote:

> Hi all,
>
> I recently upgraded to RDKit 2021.3 from the March 2020 version. With last
> year's release, I was able to tweak the generated SVG depictions to replace
> characters (e.g., where we used "*" in a SMILES but really wanted "M" for a
> metal center) or change the font-weight and font-size.
>
> svg.replace("font-weight:normal", "font-weight:bold")
>
> Now it seems as if the characters are turned into strokes. Is there an
> option to turn this off and go back to SVG characters with font-weight
> attributes?
>
> Thanks,
> -Geoff
>
> ---
> Prof. Geoffrey Hutchison
> Department of Chemistry
> University of Pittsburgh
> tel: (412) 648-0492
> email: geo...@pitt.edu
> twitter: @ghutchis
> web: https://hutchison.chem.pitt.edu/
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] What is the most efficient way to check for exact match with RDKit?

2021-09-30 Thread Giovanni Tricarico
Hello Theo,

in my experience, something like approach 3 is quite safe (I'm not sure how 
computationally efficient it is in RDKit, but I suppose you'd rather have an 
accurate, slow method than a fast, often wrong one, right?).

In short:

In the vast majority of cases, a literal string match of InChIKeys means the 
two molecules are 'the same'.

When the two InChIKeys do NOT match, it does not necessarily mean the molecules 
are 'not the same'; it depends on how the InChIKey is calculated (see further down).

For some time we used the InChIKey (not the InChI string) + chirality flag as an 
(almost) unique identifier of a molecule. If you start from a SMILES rather than 
a CTAB (molfile), you don't need to worry about the chirality flag.
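
A minimal sketch of the InChIKey-as-identifier idea in RDKit, assuming a build with InChI support; the helper name is hypothetical and the two SMILES are an illustrative pair of alanine enantiomers. Because the stereo information comes from the SMILES itself, no chirality flag is involved: it changes the full key, while the first 14-character block (the connectivity hash) stays the same.

```python
from rdkit import Chem


def inchikey_from_smiles(smiles: str) -> str:
    """Return the standard InChIKey of a SMILES string (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return Chem.MolToInchiKey(mol)


# Illustrative pair: the two enantiomers of alanine.
key_a = inchikey_from_smiles("C[C@@H](C(=O)O)N")
key_b = inchikey_from_smiles("C[C@H](C(=O)O)N")

# A standard InChIKey has 27 characters: a 14-char connectivity block, a
# hyphen, 10 chars (hash of the other layers plus flags), a hyphen, and a
# protonation character.
print(key_a == key_b)            # False: stereo is hashed into the second block
print(key_a[:14] == key_b[:14])  # True: same connectivity block
```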

Someone calculated how likely it is that two different molecules give the same 
InChIKey, and it seems to be extremely rare.

There could be a problem if you started to look at huge combinatorial sets of 
billions of molecules, where even that very rare occurrence might materialise.

Comparing SMILES, even canonical SMILES, can often fail when the two inputs are 
different tautomeric forms.
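
A minimal RDKit sketch of this failure mode, using 2-pyridone and 2-hydroxypyridine as an illustrative tautomer pair (the helper name is hypothetical): each SMILES is canonicalized exactly as written, so the two forms get different canonical SMILES.

```python
from rdkit import Chem


def canonical_smiles(smiles: str) -> str:
    """Canonical SMILES of the structure exactly as written (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return Chem.MolToSmiles(mol)


pyridone = canonical_smiles("O=C1C=CC=CN1")      # 2-pyridone
hydroxypyridine = canonical_smiles("Oc1ccccn1")  # 2-hydroxypyridine

# The H sits on N in one form and on O in the other, so the molecular
# graphs differ and so do their canonical SMILES.
print(pyridone == hydroxypyridine)  # False
```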

And indeed, the important part is how the InChIKey is calculated.

The software we use (Biovia) 'knows' about the most important tautomers, such 
as pyrazoles and 2- and 4-pyridones/hydroxypyridines, so for instance:

[Image not preserved in the archive: tautomer example]

Obviously, if the InChIKey calculation in RDKit missed that, two different 
representations of 'the same' molecule would give different InChIKeys, just as 
they would give different canonical SMILES.

I have not yet looked at RDKit's InChIKey calculator; perhaps you already know 
about these aspects.

But this is a very subtle point.

Tautomers are not like resonance structures, which interconvert by the movement 
of electrons alone; they are formally distinct molecules, which only interconvert 
by moving atoms around (in the above example, a hydrogen atom).

So in a way, the SMILES is 'correct' in saying that these are two different 
molecules.

It is only because we know from chemistry that the interconversion between them 
is fast, and that an equilibrium is reached in the media we are usually 
interested in, that we consider them 'the same'.
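
If you do want such tautomers to count as 'the same', one option is to map each input to a canonical tautomer before comparing. The sketch below assumes a recent RDKit that includes rdMolStandardize.TautomerEnumerator; it is a swapped-in technique (RDKit's rule-based tautomer canonicalizer), not the Biovia behaviour described above, and its default rules may not cover every tautomer class you care about. The helper name is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# RDKit's rule-based tautomer enumerator/canonicalizer (default rule set).
enumerator = rdMolStandardize.TautomerEnumerator()


def canonical_tautomer_smiles(smiles: str) -> str:
    """Canonical SMILES of RDKit's canonical tautomer (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return Chem.MolToSmiles(enumerator.Canonicalize(mol))


key_pyridone = canonical_tautomer_smiles("O=C1C=CC=CN1")      # 2-pyridone
key_hydroxypyridine = canonical_tautomer_smiles("Oc1ccccn1")  # 2-hydroxypyridine

# Both inputs are mapped to the same representative tautomer, so the
# canonical SMILES now agree.
print(key_pyridone == key_hydroxypyridine)
```

This adds the cost of enumerating tautomers to every comparison, and it deliberately erases the distinction made above: two formally distinct molecules end up with one key.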



I hope this helps.

Giovanni

-----Original Message-----
From: theozh
Sent: 30 September 2021 08:44
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] What is the most efficient way to check for exact match with RDKit?

Dear all,

it looks like a simple/stupid question... but I haven't found (or have 
overlooked) an example in the RDKit cookbook.

What is the intended (and most efficient) way in RDKit to search for identity, 
i.e. exact match?

I asked this question already here: 
https://stackoverflow.com/questions/60211666/rdkit-how-to-check-molecules-for-exact-match
and got some answers, but maybe from the RDKit mailing-list audience there 
might be other (more efficient) solutions?

Assumption: two SMILES strings, A and B.

Approach 1: If A is a substructure of B and B is a substructure of A, then the 
structures are identical.

Approach 2: Create canonical SMILES of A and B and do a string comparison.

Approach 3 (not sure whether this will work): create InChIs of A and B. 
Would a simple string comparison work here as well?

So, given a list of structures, I could generate a canonical SMILES list (or 
maybe an InChI list?) once and then do simple string comparisons.
Would this be the most efficient way to check whether a certain structure is in 
the list?

Thank you for any comments, hints, suggestions.
Theo.


[Rdkit-discuss] What is the most efficient way to check for exact match with RDKit?

2021-09-30 Thread theozh

Dear all,

it looks like a simple/stupid question... but I haven't found (or have 
overlooked) an example in the RDKit cookbook.

What is the intended (and most efficient) way in RDKit to search for identity, 
i.e. exact match?

I asked this question already here: 
https://stackoverflow.com/questions/60211666/rdkit-how-to-check-molecules-for-exact-match
and got some answers, but maybe from the RDKit mailing-list audience there 
might be other (more efficient) solutions?

Assumption: two SMILES strings, A and B.

Approach 1: If A is a substructure of B and B is a substructure of A, then the 
structures are identical.
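
Approach 1 can be sketched with HasSubstructMatch in both directions; the helper name is hypothetical. One caveat worth knowing: by default the substructure match ignores chirality (useChirality=False), so enantiomers pass the test unless that flag is set, as the last two calls show.

```python
from rdkit import Chem


def same_by_substructure(smiles_a: str, smiles_b: str,
                         use_chirality: bool = False) -> bool:
    """Mutual substructure test (hypothetical helper)."""
    a = Chem.MolFromSmiles(smiles_a)
    b = Chem.MolFromSmiles(smiles_b)
    if a is None or b is None:
        raise ValueError("could not parse one of the SMILES")
    # Cheap pre-check: identical structures have the same heavy-atom count.
    if a.GetNumAtoms() != b.GetNumAtoms():
        return False
    return (a.HasSubstructMatch(b, useChirality=use_chirality)
            and b.HasSubstructMatch(a, useChirality=use_chirality))


print(same_by_substructure("OCC", "CCO"))  # True
print(same_by_substructure("CCO", "CCN"))  # False
print(same_by_substructure("C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O"))  # True
print(same_by_substructure("C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O",
                           use_chirality=True))  # False
```

It works, but each pair costs two substructure searches, which makes it the slowest of the three approaches for large lists.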

Approach 2: Create Canonical SMILES of A and B and do a string comparison.
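
Approach 2 as a minimal sketch; the helper names are hypothetical. Chem.MolToSmiles writes isomeric (stereo-aware) canonical SMILES by default in current RDKit versions, and differently written inputs for the same molecule, including Kekulé versus aromatic benzene, map to the same string.

```python
from rdkit import Chem


def canonical_smiles(smiles: str) -> str:
    """Canonical (isomeric) SMILES for one input (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return Chem.MolToSmiles(mol)


def same_by_canonical_smiles(smiles_a: str, smiles_b: str) -> bool:
    return canonical_smiles(smiles_a) == canonical_smiles(smiles_b)


print(same_by_canonical_smiles("OCC", "CCO"))               # True
print(same_by_canonical_smiles("C1=CC=CC=C1", "c1ccccc1"))  # True
print(same_by_canonical_smiles("C[C@H](N)C(=O)O",
                               "C[C@@H](N)C(=O)O"))         # False: stereo kept
```

Canonical SMILES can change between RDKit releases when the canonicalization algorithm is revised, so it is safest to generate all the strings you compare with the same version.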

Approach 3 (not sure whether this will work): create InChIs of A and B. 
Would a simple string comparison work here as well?
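
Approach 3 works the same way with Chem.MolToInchi; a minimal sketch, assuming an RDKit build with InChI support (the helper name is hypothetical). Standard InChI strings can be compared directly, although which tautomers end up with the same string depends on the InChI rules rather than on RDKit.

```python
from rdkit import Chem


def inchi_from_smiles(smiles: str) -> str:
    """Standard InChI string for one input (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return Chem.MolToInchi(mol)


print(inchi_from_smiles("CCO"))  # InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
print(inchi_from_smiles("OCC") == inchi_from_smiles("CCO"))  # True
```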

So, given a list of structures, I could generate a canonical SMILES list (or 
maybe an InChI list?) once and then do simple string comparisons.
Would this be the most efficient way to check whether a certain structure is in 
the list?
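
For the "is this structure in my list?" case, a minimal sketch of the precompute-once idea: build a set of canonical SMILES for the list, so each query costs one canonicalization plus an average O(1) set lookup instead of a comparison against every entry. The helper names and the example list are illustrative; the same pattern works with InChI or InChIKey strings as keys.

```python
from rdkit import Chem


def canonical_smiles(smiles: str) -> str:
    """Canonical SMILES for one input (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return Chem.MolToSmiles(mol)


# Illustrative reference list, canonicalized once up front.
library = ["CCO", "c1ccccc1", "CC(=O)O"]
library_keys = {canonical_smiles(s) for s in library}


def in_library(smiles: str) -> bool:
    """Membership test against the precomputed set (hypothetical helper)."""
    return canonical_smiles(smiles) in library_keys


print(in_library("OCC"))          # True
print(in_library("C1=CC=CC=C1"))  # True
print(in_library("CCN"))          # False
```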

Thank you for any comments, hints, suggestions.
Theo.

