Re: [Rdkit-discuss] Is there a Smiles library for common amino acids and ligands that can be used for AssignBondOrdersFromTemplate

2023-10-27 Thread Rocco Moretti
I'll note that the official definitions for all the chemical entities in
the PDB can be found in the wwPDB's Chemical Component Dictionary:
https://www.wwpdb.org/data/ccd

That's in mmCIF format, but there are various SMILES and InChI definitions
for the residues included in the file. (Your mileage may vary for the
quality of those representations, though, especially for the rarer ones,
but it should be no worse than the SDFs.)

You should be able to use an mmCIF parser to extract them.

e.g.
# py-mmcif from the RCSB: `pip install mmcif`
from mmcif.core.mmciflib import ParseCifSimple

# "logfile.txt" is an arbitrary name
ccd = ParseCifSimple("components.cif", True, 0, 255, "?", "logfile.txt")

ALA = ccd.GetBlock("ALA")
desc = ALA.GetTable("pdbx_chem_comp_descriptor")
print( desc.GetColumnNames() )
for ii in range(desc.GetNumRows()):
    print( desc.GetRow(ii) )

['comp_id', 'type', 'program', 'program_version', 'descriptor']
['ALA', 'SMILES', 'ACDLabs', '10.04', 'O=C(O)C(N)C']
['ALA', 'SMILES_CANONICAL', 'CACTVS', '3.341', 'C[C@H](N)C(O)=O']
['ALA', 'SMILES', 'CACTVS', '3.341', 'C[CH](N)C(O)=O']
['ALA', 'SMILES_CANONICAL', 'OpenEye OEToolkits', '1.5.0', 'C[C@@H](C(=O)O)N']
['ALA', 'SMILES', 'OpenEye OEToolkits', '1.5.0', 'CC(C(=O)O)N']
['ALA', 'InChI', 'InChI', '1.03', 'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1']
['ALA', 'InChIKey', 'InChI', '1.03', 'QNAYBMKLOCPYGJ-REOHCLBHSA-N']

The components file is rather large, so parsing time might be a little long
at times.
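
Since each component carries several SMILES flavours, you still have to pick one per residue. A minimal sketch of that selection step, in plain Python so it does not depend on the parser: the rows are the ALA rows printed above, while `pick_smiles` and the `PREFERRED` ordering are hypothetical names and a personal choice, not anything the wwPDB or py-mmcif prescribes.

```python
# Rows copied from the ALA output above:
# [comp_id, type, program, program_version, descriptor]
ALA_ROWS = [
    ["ALA", "SMILES", "ACDLabs", "10.04", "O=C(O)C(N)C"],
    ["ALA", "SMILES_CANONICAL", "CACTVS", "3.341", "C[C@H](N)C(O)=O"],
    ["ALA", "SMILES", "CACTVS", "3.341", "C[CH](N)C(O)=O"],
    ["ALA", "SMILES_CANONICAL", "OpenEye OEToolkits", "1.5.0", "C[C@@H](C(=O)O)N"],
    ["ALA", "SMILES", "OpenEye OEToolkits", "1.5.0", "CC(C(=O)O)N"],
]

# Hypothetical preference order (an assumption, not a wwPDB recommendation).
PREFERRED = (
    ("SMILES_CANONICAL", "OpenEye OEToolkits"),
    ("SMILES_CANONICAL", "CACTVS"),
    ("SMILES", "OpenEye OEToolkits"),
)


def pick_smiles(rows, preferred=PREFERRED):
    """Return the descriptor for the first preferred (type, program) pair, or None."""
    by_key = {(row[1], row[2]): row[4] for row in rows}
    for key in preferred:
        if key in by_key:
            return by_key[key]
    return None


print(pick_smiles(ALA_ROWS))
```

Looping this over the blocks of interest would give a residue-name to SMILES dictionary like the `ref_smi` one in the question below.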

On Fri, Oct 27, 2023 at 10:55 AM He, Amy 
wrote:

> Dear RDKit experts,
>
>
>
> I need your advice on finding a reference Smiles library from which to
> build the template molecule for AssignBondOrdersFromTemplate.
>
>
>
> I am using AssignBondOrdersFromTemplate to perceive bonds in a
> residue-wise manner from an input PDB, using a reference Smiles library
> like this:
>
>
>
> ref_smi = {
>
>
>
> "ALA": "NC(C)C(=O)",
>
> "GLY": "NCC(=O)",
>
> "ILE": "NC(C(C)CC)C(=O)",
>
>
>
> }
>
>
> I wonder whether there is an open reference library for the common amino
> acids and ligands that are present in PDB files. A previous post on
> rdkit-discuss (
> https://rdkit-discuss.narkive.com/JM2IGLQz/pdb-reader-and-bond-perception)
> points me to this website:
>
> ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/pdb.tar.gz
>
> and useful links from
>
> http://www.ebi.ac.uk/pdbe-srv/pdbechem/
>
>
>
> But I am no longer able to access the contents.
>
>
>
> I guess we could always generate Smiles from the standardized SDF files..
> Still I am wondering if there is an existing Smiles library (like a
> reference datafile), where we can retrieve the Smiles string using the
> residue names of common amino acids and maybe also ligands.
>
>
>
> Any comments or suggestions would be greatly appreciated. Thank you for
> your time and kind support in advance!
>
>
>
>
>
> Bests,
>
>
>
>
>
> --
>
> Amy He
>
> Chemistry Graduate Teaching Assistant
>
> Hadad Lab
>
> Ohio State University
>
> he.1...@osu.edu
>
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] Changes in morgan fingerprint code?

2023-01-13 Thread Rocco Moretti
>
> Just as an FYI: the best easy way, by far, to keep track of whether or not
> you've seen a particular molecule is to use the SMILES.
>

Though as a caveat with SMILES, be aware of issues about partial chirality
and E/Z isomerization specification. "CC=CC(C)(O)CF" is not the same SMILES
as "C/C=C/[C@](C)(O)CF", even though they might refer to the "same"
molecule for your purposes. RDKit canonical SMILES will faithfully render
the stereochemistry information if available, but depending on how you're
reading and/or processing things, you may or may not have that info
properly annotated for the SMILES outputter to use. (Something as simple as
generating 3D coordinates can potentially add that info in. But also, just
because your SDF file has 3D coordinates doesn't necessarily guarantee that
RDKit will completely annotate stereochemical info on the read-in Mol.)

Take a look at `Chem.AssignStereochemistryFrom3D()`,
`Chem.RemoveStereochemistry()` and
`Chem.EnumerateStereoisomers.EnumerateStereoisomers()` if this is
potentially going to be an issue for you.
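
As a rough illustration of the caveat, here is a sketch that compares the two SMILES from above with and without stereo information. `seen_key` is a hypothetical helper name; it relies on the `isomericSmiles` flag of `Chem.MolToSmiles()` rather than on `Chem.RemoveStereochemistry()`, which is one of several ways to get a stereo-free key.

```python
from rdkit import Chem


def seen_key(smiles, keep_stereo=True):
    """Canonical SMILES to use as a 'have I seen this molecule' key."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # isomericSmiles=False drops chirality and E/Z marks from the output
    return Chem.MolToSmiles(mol, isomericSmiles=keep_stereo)


flat = "CC=CC(C)(O)CF"
full = "C/C=C/[C@](C)(O)CF"

# With stereo kept, the two inputs give different keys.
print(seen_key(flat) == seen_key(full))
# With stereo dropped, they collapse onto the same key.
print(seen_key(flat, keep_stereo=False) == seen_key(full, keep_stereo=False))
```

Which of the two keys is the right one depends on whether stereoisomers count as the "same" molecule for your bookkeeping.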

On Fri, Jan 13, 2023 at 1:41 AM Greg Landrum  wrote:

> Hi Eric,
>
> That would be due to the fix for this bug:
> https://github.com/rdkit/rdkit/issues/5036
> If you were generating the fingerprints on "normal" (i.e.
> hydrogen-suppressed) graphs, you wouldn't notice this one, but the fact
> that you add the Hs before generating the fingerprint causes you to notice
> it.
>
> Just as an FYI: the best easy way, by far, to keep track of whether or not
> you've seen a particular molecule is to use the SMILES.
>
> -greg
>
>
> On Fri, Jan 13, 2023 at 6:27 AM Eric Jonas  wrote:
>
>> Hello! I use the crc of morgan fingerprints as a quick-and-dirty way to
>> keep track of different molecules, but now I realize it might have been too
>> quick and dirty! In particular, there appears to have been a change in the
>> morgan code sometime between 2021.09.02 and 2022.03.05. The following code
>> produces different output under these versions:
>>
>> import rdkit.Chem
>> import pickle
>> from rdkit import Chem
>>
>> import rdkit.Chem.rdMolDescriptors
>> import zlib
>>
>> def get_morgan4_crc32(m):
>> mf = Chem.rdMolDescriptors.GetHashedMorganFingerprint(m, 4)
>> morgan4_crc32 = zlib.crc32(mf.ToBinary())
>> return morgan4_crc32
>>
>> mol = Chem.AddHs(Chem.MolFromSmiles('Oc1cc(O)c(O)c(O)c1'))
>> print(get_morgan4_crc32(mol))
>>
>> 2021.09.2 : 1567135676
>> 2022.03.5 : 204854560
>>
>> I tried looking at the release notes but I didn't seem to see any
>> breaking changes (I might have missed them!) and I tried looking at "blame"
>> for the relevant source but didn't see any seemingly-substantive changes
>> within the relevant timeframe.
>>
>> So am I doing something crazy here, or did something change deliberately,
>> or is it possible this is a bug?
>>
>> ...E


Re: [Rdkit-discuss] use cases for weighted sampling of a compound library

2022-12-11 Thread Rocco Moretti
The use case for this sort of thing which immediately springs to mind would
be decoy selection. That is, you have a known set of "positives" and want
to find a set of "negatives"/"background" which match those compounds in
some set of properties. DUD-E is probably the
most well-known example of this, but it's an approach which has been tried
numerous times on both a formal and an ad hoc basis.

These days you'll mostly see people attempting it with machine
learning, to try to get around the "positive/unlabeled" nature of most
small molecule datasets. That is, often it's easy to get sets of "positive"
compounds from the literature/etc., but trying to get a set of known
negative compounds is sometimes difficult. People attempt to find synthetic
negatives by using matched-property decoy selection from an external
compound set. However, the literature on this is ... less than flattering.
It turns out that even if you're careful, your negative selection method
can still be biased and the ML method can pick up on this (see, e.g.
https://doi.org/10.1021/acs.jcim.8b00712) -- this is similar to the tales
of image recognition software which uses the presence of grass to tell if
something is a cow or not, or which fails because all the pictures of one
kind of tank were taken on a cloudy day. ML is very good at picking up such
small differences, even if you don't know what those actually are.
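
To make the matched-property idea concrete, here is a minimal sketch of decoy picking on a single numeric property (say, fraction of carbon atoms). Everything in it is an assumption for illustration: the function name, the fixed tolerance window, and the rule that a decoy is used only once. It is not how DUD-E or any published protocol does it.

```python
import random


def pick_matched_decoys(actives, pool, per_active=2, tolerance=0.1, seed=0):
    """For each active's property value, draw decoys from the pool within +/- tolerance.

    actives: list of floats, one property value per active compound
    pool:    dict mapping compound id -> property value for the candidate library
    Returns a dict mapping the index of each active to the decoy ids chosen for it.
    """
    rng = random.Random(seed)
    available = dict(pool)  # copy, so the caller's pool is left untouched
    chosen = {}
    for index, value in enumerate(actives):
        close = sorted(cid for cid, prop in available.items()
                       if abs(prop - value) <= tolerance)
        picks = rng.sample(close, min(per_active, len(close)))
        for cid in picks:
            del available[cid]  # each decoy is used at most once
        chosen[index] = picks
    return chosen


pool = {"a": 0.50, "b": 0.55, "c": 0.90, "d": 0.52}
print(pick_matched_decoys([0.5, 0.5], pool, per_active=2, tolerance=0.06))
```

Even a scheme this simple bakes in a bias (decoys cluster around the actives in the chosen property), which is exactly the kind of thing an ML model can latch onto.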

On Sun, Dec 11, 2022 at 11:25 AM Christopher Mayer-Bacon 
wrote:

> Hello all,
>
> I’m starting a project that explores the sampling of a large compound
> library.  My question is not so much about how to do something, but rather
> the specific use cases for weighted sampling from a compound library.
>
> Given a large compound library and a smaller, reference library, I want to
> take random samples from the large library such that the samples resemble
> the reference library in some way.  At the moment I’m focused on element
> composition (% of carbon atoms, % of oxygen atoms, etc.), but I’m open to
> using other features in the future.
>
> I have an idea of how to perform this sampling; my question for this
> community concerns a possible use case.  What would be the benefit of
> sampling from a compound library such that the samples resemble another
> library in some way?  I can think of a use case for my specific research
> niche (adaptive properties of the canonical amino acid alphabet), but I
> can’t think of another potential use case.  I know the RDKit community has
> a wide variety of backgrounds and expertise, hence why I wanted to pose
> this question to you all.
>
> -Chris
>
> --
> -Christopher Mayer-Bacon (*he/him/his*)
> PhD student
> Department of Biological Sciences
> University of Maryland, Baltimore County


Re: [Rdkit-discuss] Working with SDF from varying locales?

2022-09-30 Thread Rocco Moretti
Hi Greg,

> The RDKit doesn't normally convert data field values into floats unless
you explicitly ask it to

I did notice that mol.GetProp() will always return things by string, and
you would need to use mol.GetDoubleProp() if you explicitly wanted a
numeric value, but it looks like mol.GetPropsAsDict() will automatically
convert to integers/floating point as appropriate. I guess I was wondering
if there was a way to get GetPropsAsDict() to be more gregarious with the
locale (and/or make GetDoubleProp() more robust to not raising an
exception).

But if I need to handle the locale re-parsing on my own, I can probably
knock something together to do that.
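
For what it's worth, a sketch of what such a re-parser might look like, working on the string values that mol.GetProp() hands back. The function names are hypothetical, and the separator rule is an assumption spelled out in the docstring; a lone comma is inherently ambiguous, so check it against your own files.

```python
def parse_locale_float(text):
    """Parse a number written with '.' or ',' as the decimal separator.

    Assumption: if both separators appear, the right-most one is the decimal
    mark and the other is a thousands separator. A lone ',' is treated as a
    decimal mark, so "1,234" parses as 1.234, not 1234.
    """
    cleaned = text.strip().replace(" ", "")
    last_dot = cleaned.rfind(".")
    last_comma = cleaned.rfind(",")
    if last_dot >= 0 and last_comma >= 0:
        if last_comma > last_dot:
            # e.g. "1.234,56": dots are thousands separators
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            # e.g. "1,234.56": commas are thousands separators
            cleaned = cleaned.replace(",", "")
    elif last_comma >= 0:
        cleaned = cleaned.replace(",", ".")
    return float(cleaned)


def normalize_props(props):
    """Return a copy of a {name: string} mapping with numeric-looking values as floats."""
    out = {}
    for name, value in props.items():
        try:
            out[name] = parse_locale_float(value)
        except ValueError:
            out[name] = value  # leave non-numeric fields alone
    return out


print(normalize_props({"logP": "2,31", "MW": "1.234,56", "name": "aspirin"}))
```

Because the conversion is per value, it also copes with files that mix both conventions.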

Luckily the CTAB sections in my files all use the same C locale, so I don't
have to worry about that headache.

Thanks,
Rocco

On Fri, Sep 30, 2022 at 9:21 AM Greg Landrum  wrote:

> Hi Rocco,
>
> Paolo already replied about the options available for python when
> interpreting the data fields from an SDF. The RDKit doesn't normally
> convert data field values into floats unless you explicitly ask it to, so
> this would be fine to do from Python
>
> The CTAB part of the SDF, which includes the coordinates, always parses
> the coordinates using the C locale (regardless of what the current locale
> on the machine is)... this is more or less part of the CTAB spec from MDL.
>
> -greg
>
>
> On Thu, Sep 29, 2022 at 8:16 PM Rocco Moretti 
> wrote:
>
>> Hello,
>>
>> I have a number of SDFs of molecules with associated data blocks. (That
>> is, the `>` section that comes after `M  END` and before `$$$$`.)
>>
>> The problem I have is that these SDFs were generated in different
>> countries, and have different locales -- most notably, some of them use "."
>> as the decimal separator for real-valued properties and some use ",".  To
>> make things even more fun, some use a mix of both, depending on who
>> calculated which properties where.
>>
>> Is there any facility in RDKit for reading in such locale-varying SDF
>> files and normalizing them?
>>
>> Thanks,
>> Rocco


[Rdkit-discuss] Working with SDF from varying locales?

2022-09-29 Thread Rocco Moretti
Hello,

I have a number of SDFs of molecules with associated data blocks. (That is,
the `>` section that comes after `M  END` and before `$$$$`.)

The problem I have is that these SDFs were generated in different
countries, and have different locales -- most notably, some of them use "."
as the decimal separator for real-valued properties and some use ",".  To
make things even more fun, some use a mix of both, depending on who
calculated which properties where.

Is there any facility in RDKit for reading in such locale-varying SDF files
and normalizing them?

Thanks,
Rocco


Re: [Rdkit-discuss] molecule layout to optimise available space

2022-02-11 Thread Rocco Moretti
If the issue is a 2D coordinate issue (rather than a drawing layout one),
is there a way to do post-coordgen "bond rotation" in 2D coordinate space?
(That is, flip the cis/trans choice for certain bonds).

I could imagine an algorithm which samples the "2D rotatable" bonds on
Tim's structure, then figures out that by flipping the N-phenyl bond (and
potentially others) it makes the molecule suitably long and thin.
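
To sketch the geometric core of that idea without touching RDKit: in 2D, "rotating" a bond by 180 degrees amounts to mirroring the atoms on one side of it across the bond axis. The helper names below are hypothetical, coordinates are a plain dict, and deciding which atoms move (and avoiding ring bonds and clashes) is left out entirely.

```python
def reflect_across_bond(coords, begin, end, moving):
    """Mirror the atoms in `moving` across the line through atoms `begin` and `end`.

    coords: dict atom index -> (x, y). Returns a new dict; the input is not modified.
    """
    (x1, y1), (x2, y2) = coords[begin], coords[end]
    dx, dy = x2 - x1, y2 - y1
    norm2 = dx * dx + dy * dy
    if norm2 == 0:
        raise ValueError("bond atoms share the same coordinates")
    flipped = dict(coords)
    for idx in moving:
        px, py = coords[idx][0] - x1, coords[idx][1] - y1
        t = (px * dx + py * dy) / norm2      # projection onto the bond axis
        fx, fy = x1 + t * dx, y1 + t * dy    # foot of the perpendicular
        flipped[idx] = (2 * fx - coords[idx][0], 2 * fy - coords[idx][1])
    return flipped


def aspect_ratio(coords):
    """Width divided by height of the bounding box (inf for a perfectly flat layout)."""
    xs = [x for x, _ in coords.values()]
    ys = [y for _, y in coords.values()]
    height = max(ys) - min(ys)
    return float("inf") if height == 0 else (max(xs) - min(xs)) / height


coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.5, 1.0), 3: (-0.5, 1.0)}
flipped = reflect_across_bond(coords, 0, 1, [2])
print(aspect_ratio(coords), aspect_ratio(flipped))
```

A sampler would try such flips one bond at a time and keep those that move the aspect ratio toward the target box; in this toy case the flip makes the layout taller, so it would be rejected.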

On Fri, Feb 11, 2022 at 9:24 AM Tim Dudgeon  wrote:

> Greg,
> Thanks for the clarification.
>
> On Fri, Feb 11, 2022 at 3:19 PM Greg Landrum 
> wrote:
>
>> Oh, you want the layout itself changed, not just the orientation.
>>
>> No, there's nothing in place to do that and adding such a thing would be
>> extremely non-trivial.
>>
>> -greg
>>
>>
>> On Fri, Feb 11, 2022 at 3:49 PM Tim Dudgeon 
>> wrote:
>>
>>> Hi Greg,
>>> yes, but my situation is that the X dimension is much larger than the Y
>>> and most of the time things are aligned nicely. But not always. Here is an
>>> example.
>>> OC(C(=O)NC=1C=CC=CC1NS(=O)(=O)C=2C=CC(F)=CC2)C=3C=CC=NC3
>>> [image: image.png]
>>> Clearly there is potential to lay this out using more of the X dimension
>>> and less of the Y.
>>>
>>> Tim
>>>
>>> On Fri, Feb 11, 2022 at 1:57 PM Greg Landrum 
>>> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> That's a nice one.
>>>>
>>>> For people not familiar with the problem:
>>>> The RDKit coordinate generation prefers aligning molecules with the X
>>>> axis; this can lead to "sub-optimal" drawings if your drawing canvas is
>>>> taller than it is wide.
>>>>
>>>> One easy solution is to just generate coordinates as usual and then
>>>> rotate them to favor the Y axis if your canvas is larger along Y.
>>>> Here's a gist showing how to do that:
>>>> https://gist.github.com/greglandrum/12b793b240d27e3c0899c9c6c62d4f30
>>>>
>>>> -greg
>>>>
>>>>
>>>> On Fri, Feb 11, 2022 at 10:20 AM Tim Dudgeon
>>>> wrote:
>>>>
>>>>> At Dave Cosgrove's suggestion I raise this as a new topic, though it
>>>>> was touched on briefly recently.
>>>>>
>>>>> I'd like to know if it's possible to depict a molecule in a way that
>>>>> takes into account the dimensions of the box it will appear in. In my case
>>>>> I have a rectangle that is short and wide (aspect ratio of 1:3) and the
>>>>> molecules are typically compressed because of the lack of available
>>>>> height.
>>>>> So is it possible to make the layout engine aware of the bounds it has
>>>>> available (e.g. in my example short and wide)?
>>>>>
>>>>> Thanks
>>>>> Tim


Re: [Rdkit-discuss] explicit H atoms

2021-03-08 Thread Rocco Moretti
On Mon, Mar 8, 2021 at 11:17 AM Paul Emsley 
wrote:

> On 08/03/2021 13:55, Jean-Marc Nuzillard wrote:
> >
> > Is it always possible to represent an organic molecule in 2D with all
> necessary
> > configuration hints (bond wedges pointing to the front or to the back)
> > without introducing any explicit hydrogen atom?
>
> No. Testosterone.
>

Is that "not possible" or simply "against convention"? One could certainly
imagine someone attempting to put the dashed and wedged designations on the
ring bonds, and leaving the hydrogens implicit. (Flagrantly ignoring how
much it would mess with steroid chemists' brains.)


Re: [Rdkit-discuss] TIL: Mol objects having varying attributes depending on rdkit imports

2020-09-23 Thread Rocco Moretti
>
> Python translates object.method() to method(object).


Well, yes and no. "Yes" in the sense that instance methods are internally
implemented equivalently to a free method which takes an instance as the
first parameter. "No" in the sense that from a namespace and user
perspective there typically isn't a crossover:

>>> class TestClass:
        def __init__(self, name):
            self.name = name
        def say(self):
            print("I'm TestClass",self.name)

>>> def recite(test_class):
        test_class.say()

>>> t = TestClass("Bob")
>>> t.say()
I'm TestClass Bob
>>> recite(t)
I'm TestClass Bob
>>> t.recite()
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'TestClass' object has no attribute 'recite'

(As the `recite` function is in the local namespace, not in the TestClass
namespace, `t.recite()` can't find it.)

While, due to the equivalence between free functions and member functions,
you can certainly inject such a free function into the class:

>>> TestClass.recite = recite
>>> t.recite()
I'm TestClass Bob

such an injection isn't universally what one sees in typical Python
programs, as anyone who's tried to do a `mylist.len()` can attest. Doing
such an injection dependent on module imports is much rarer, and certainly
not the expected "standard" behavior in Python. (It's certainly not
automatic in Python, if that was what you were trying to imply.)
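
For completeness, the "yes" half and the "no" half can be put side by side in a script. This is a variant of the session above in which `say` returns the string instead of printing it (a change made only so the results can be compared), with `before_injection` as an illustrative variable name.

```python
class TestClass:
    def __init__(self, name):
        self.name = name

    def say(self):
        return "I'm TestClass " + self.name


def recite(test_class):
    return test_class.say()


t = TestClass("Bob")

# The "yes" part: a method call is a function call with the instance first.
print(t.say() == TestClass.say(t))

# The "no" part: a free function is not visible through the instance...
before_injection = hasattr(t, "recite")
print(before_injection)

# ...until someone explicitly injects it into the class namespace.
TestClass.recite = recite
print(t.recite())
```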

Regards,
-Rocco


Re: [Rdkit-discuss] TIL: Mol objects having varying attributes depending on rdkit imports

2020-09-23 Thread Rocco Moretti
Hi Norwid,

There's a subtle but significant difference between the two examples:

>>> AllChem.Compute2DCoords(m)
versus
>>> m.Compute2DCoords()

For the former, it's pretty standard Python behavior not to be able to see
a function from a module if you haven't loaded the module yet. That's
expected behavior, and something you'll learn early on when working with
Python modules.

For the latter, it's not standard Python behavior to have methods which
aren't visible until some other module is loaded. Generally, if you have an
object of a class, you have access to all the methods of that class. Just
having part of the class and then needing to import a separate module to
get the rest of the methods is certainly not something you typically expect
in Python.

Regards,
-Rocco


On Wed, Sep 23, 2020 at 5:35 AM Norwid Behrnd via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hi Thomas,
>
> could your report be already backed by the section titled «Working with
> 2D molecules: Generating Depictions» of the upper half of page
>
> https://www.rdkit.org/docs/GettingStartedInPython.html
>
> about the 2020.03.1 documentation with the following example?
>
>  8>< begin snippet --- 
> >>> m = Chem.MolFromSmiles('c1nccc2n1ccc2')
> >>> AllChem.Compute2DCoords(m)
> 0
>  8>< end snippet --- 
>
> Because this snippet is part of a show case, a minimal working example
> (at least for a bit old RDKit 2019.9.1) translates into
>
>  8>< begin snippet --- 
> from rdkit import Chem
> from rdkit.Chem import AllChem
> m = Chem.MolFromSmiles('c1ccccc1')
> AllChem.Compute2DCoords(m)
>  8>< end snippet --- 
>
> to yield "0" (zero).
>
> However, possibly contributing to your struggle, note an entry on page
>
> https://www.rdkit.org/docs/GettingStartedInPython.html#chem-vs-allchem
>
> with the snippet
>
>  8>< begin snippet --- 
> >>> from rdkit.Chem import AllChem as Chem
> >>> m = Chem.MolFromSmiles('CCC')
>  8>< end snippet --- 
>
> equivalent to a MWE of
>
>  8>< begin snippet --- 
> import rdkit
> from rdkit.Chem import AllChem as Chem
> m = Chem.MolFromSmiles('CCC')
> Chem.Compute2DCoords(m)
>  8>< end snippet --- 
>
> to equally yield "0" (zero).
>
> I only recall this part of the manual because one of my yesterday's
> problems caused me to revisit the beginner's page again.
>
> Norwid
>
>


Re: [Rdkit-discuss] MolToSmiles preserve atom order

2019-11-18 Thread Rocco Moretti
Actually, it is possible to get arbitrary orders, if you (ab)use the '.'
component ("zero order bond") directive and the numeric bonding ("ring
closure") directives:

>>> Chem.MolToSmiles( Chem.MolFromSmiles("O1.Cl2.C12" ) )
'OCCl'

Whether you want to do things that way is another question.
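
If you did want to do it that way, the string can be assembled mechanically. A sketch in plain Python, under loud assumptions: `ordered_smiles` is a hypothetical helper, every bond is written as a default (single or aromatic) bond, closure numbers are never reused, and hydrogens are left to the usual implicit-H rules.

```python
def ordered_smiles(symbols, bonds):
    """Write one dot-separated SMILES component per atom, in the given order.

    symbols: atom symbols as they should appear in SMILES, e.g. "C", "Cl", "[NH4+]"
    bonds:   (i, j) index pairs; each becomes a ring-closure bond between atoms i and j
    """
    if len(bonds) > 99:
        raise ValueError("this sketch never reuses closure numbers, so it stops at 99 bonds")
    closures = [[] for _ in symbols]
    for number, (i, j) in enumerate(bonds, start=1):
        # single digits are written bare, two-digit closures need a '%' prefix
        label = str(number) if number < 10 else "%" + str(number)
        closures[i].append(label)
        closures[j].append(label)
    return ".".join(sym + "".join(labels) for sym, labels in zip(symbols, closures))


# O first, Cl second, C third, with bonds O-C and Cl-C
print(ordered_smiles(["O", "Cl", "C"], [(0, 2), (1, 2)]))
```

This reproduces the "O1.Cl2.C12" string from the example above; the result is legal SMILES but not something most people would want to read.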

On Mon, Nov 18, 2019 at 10:24 AM David Cosgrove 
wrote:

> Hi Rafal,
> It is not always possible to preserve the atom ordering in the SMILES
> string because there is an implied bond between contiguous symbols in the
> SMILES. I think, for example, that the molecule with the SMILES OCCl
> couldn’t have the order in the molecule object O first, Cl second, C third,
> with bonds between 1 and 3 and 2 and 3 and get the SMILES in that order.
>
> I hope that made sense. Please ask again if not.
>
> Best regards,
> Dave
>
>
> On Mon, 18 Nov 2019 at 12:33, Rafal Roszak  wrote:
>
>> Hi all,
>>
>> Is there any way to preserve atom order from Mol object during
>> exporting to smiles? I tried MolToSmiles with rootedAtAtom=0 and
>> canonical=False options but they do not always preserve the original order.
>> I know I can use _smilesAtomOutputOrder to map old indices to new ones
>> in canonical smiles but maybe we have something more handy?
>>
>> Best,
>>
>> Rafał
>>
>>
>>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk


Re: [Rdkit-discuss] Xenon atoms have hydrogen added

2019-09-04 Thread Rocco Moretti
I think the issue is that you're making an explicit bond to a Xenon atom,
and Xenon's valence model in the RDKit says that it has either zero bonds or
two bonds. (Don't worry - it's not really something you should have
known a priori - valence models are funky.)


list( Chem.GetPeriodicTable().GetValenceList("Xe") ) # Returns [0, 2]

Since you have at least one bond to Xenon (to the carbon), you can't have
zero bonds, so you must have two bonds, so RDKit fills in the missing
valence with an implicit hydrogen:

atom.GetTotalValence() # returns 2
atom.GetNumImplicitHs() # returns 1

The hydrogen is implicit, so removing the hydrogens with Chem.RemoveHs()
won't do anything.

This then interacts with the Smiles code. The Smiles model says that if you
have an atom in brackets (which Xenon always is), you need to explicitly
record the hydrogens it has. (See here for more.)

Ways around it: The easiest would be if you could change your element to
something which takes a single valence, or something that doesn't have
valences for implicit hydrogen purposes. (Astatine is a decent choice for
the former; many of the actinides work well for the latter.) If you
really do want to use Xenon, you can always manually flag the atom to not
have any implicit hydrogens.

...
xe = Chem.Atom(54) # 54 is Xenon
xe.SetNoImplicit(True) # the added line
idx = mw.AddAtom(xe)
mw.AddBond(0,6,Chem.BondType.SINGLE)
Chem.SanitizeMol(mw)
atom = mw.GetAtomWithIdx(idx)
atom.GetExplicitValence() # returns 1
atom.GetTotalValence() # returns 1
atom.GetNumImplicitHs() # returns 0
Chem.MolToSmiles(mw) # returns '[Xe]c1ccccc1'

On Wed, Sep 4, 2019 at 9:35 AM Tim Dudgeon  wrote:

> I'm finding that if I add a Xenon atom to a molecule it seems to get an
> unwanted hydrogen added to it.
> Example notebook here:
> https://gist.github.com/tdudgeon/ba3497341d9de95b4d78f3e5ed9fc0f7
>
> Basic code is like this:
>
> from rdkit import Chem
> m = Chem.MolFromSmiles("c1ccccc1")
> mw = Chem.RWMol(m)
> xe = Chem.Atom(54) # 54 is Xenon
> idx = mw.AddAtom(xe)
> mw.AddBond(0,6,Chem.BondType.SINGLE)
> Chem.SanitizeMol(mw)
> atom = mw.GetAtomWithIdx(idx)
> atom.GetExplicitValence() # returns 1
> Chem.MolToSmiles(mw) # returns [XeH]c1ccccc1, expecting [Xe]c1ccccc1
>
> Even if I do a Chem.RemoveHs() the H remains.
>
> Any ideas why and how to fix?
>
>
>
>
>


Re: [Rdkit-discuss] reducing space in image rendered using Draw.MolsToGridImage

2018-11-26 Thread Rocco Moretti
I've recently had similar concerns.

The issue that I've had is that MolsToGridImage() sets a single scale, and
uses that scale for all the molecules (such that a phenyl ring is the same
size for all your molecules). This is great when you're displaying a set of
related compounds, where having everything on the same scale helps to
identify commonalities and differences between the different molecules.
It's less useful when you're looking at a wide variety of arbitrary
molecules, of potentially different scales, where the larger ones force the
smaller ones to be vanishingly small.

My solution was to write the following function, based on the internals of
MolsToGridImage()


from rdkit.Chem import AllChem
import rdkit.Chem.Draw
from rdkit.Chem.Draw import rdMolDraw2D
try:
    import Image
except ImportError:
    from PIL import Image
from io import BytesIO

def DrawMolsZoomed(mols, molsPerRow=3, subImgSize=(200, 200)):
    nRows = len(mols) // molsPerRow
    if len(mols) % molsPerRow: nRows += 1
    fullSize = (molsPerRow * subImgSize[0], nRows * subImgSize[1])
    full_image = Image.new('RGBA', fullSize )
    for ii, mol in enumerate(mols):
        if mol.GetNumConformers() == 0:
            AllChem.Compute2DCoords(mol)
        column = ii % molsPerRow
        row = ii // molsPerRow
        offset = ( column*subImgSize[0], row * subImgSize[1] )
        d2d = rdMolDraw2D.MolDraw2DCairo(subImgSize[0], subImgSize[1])
        d2d.DrawMolecule(mol)
        d2d.FinishDrawing()
        sub = Image.open(BytesIO(d2d.GetDrawingText()))
        full_image.paste(sub,box=offset)
    return full_image




On Mon, Nov 26, 2018 at 4:52 PM Sundar  wrote:

> How to reduce the space between each row of images while using
> Draw.MolsToGridImage to render an image?
> Currently the default space is unnecessarily big.
>
> --
> Thanks,
> Sundar


Re: [Rdkit-discuss] Draw.MolToImage upside down

2018-09-20 Thread Rocco Moretti
Instead of arbitrarily generating 2D coordinates (which can have any
orientation), you probably want to use
GenerateDepictionMatching2DStructure() to generate coordinates which match
a common orientation for a substructure.

See http://www.rdkit.org/docs/GettingStartedInPython.html#drawing-molecules
for an example of how to use it.

Note that this does require you to have a common core with which to align
the depictions, but some of that can be automated with FindMCS (see
http://www.rdkit.org/docs/GettingStartedInPython.html#maximum-common-substructure
).
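
A minimal sketch of the template approach, using a benzene ring as the shared core and two toy molecules rather than the resonance structures from the question. The molecules are my own picks for illustration; for unsanitized resonance forms you would likely need a core that matches both as a substructure.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Template: give the common core one fixed set of 2D coordinates.
core = Chem.MolFromSmiles("c1ccccc1")
AllChem.Compute2DCoords(core)

# Lay out each molecule so that its copy of the core follows the template.
mols = [Chem.MolFromSmiles(smi) for smi in ("Oc1ccccc1", "Cc1ccccc1C(=O)O")]
for mol in mols:
    AllChem.GenerateDepictionMatching2DStructure(mol, core)

for mol in mols:
    conf = mol.GetConformer()
    print(Chem.MolToSmiles(mol), conf.Is3D(), mol.GetNumConformers())
```

The molecules can then be handed to Draw.MolToImage() or Draw.MolsToGridImage() as usual; since they already carry coordinates, the drawing code should use them instead of generating fresh ones.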

On Thu, Sep 20, 2018 at 4:54 PM, Jason Klima  wrote:

> Hi,
> I would like to visualize the following resonance structures in the same
> orientation (i.e. the same atomic coordinates), however, the image on the
> right appears to be drawn upside down compared to the image on the left,
> while the element names (e.g. N, O, and F) in both images are right side
> up. Mirroring the image on the right about a horizontal axis can be
> accomplished with ax.set_ylim([0,300]) however the element names are then
> upside down. How can I mirror the drawing on the right about a horizontal
> axis without mirroring element names?
>
> from rdkit import Chem
> from rdkit.Chem import Draw
> import matplotlib.pyplot as plt
> def draw_structure(smiles_string, ax=None):
>     m = Chem.MolFromSmiles(smiles_string, sanitize=False)
>     m.UpdatePropertyCache()
>     Chem.SetHybridization(m)
>     # ax.set_ylim([0,300]) # Flips image but element names also flip
>     return ax.imshow(Draw.MolToImage(m), interpolation='bessel')
> fig = plt.figure(figsize=(10,5))
> ax1 = plt.subplot2grid((1, 2), (0, 0))
> ax2 = plt.subplot2grid((1, 2), (0, 1))
> draw_structure('CN(C2=O)C(C)=N\C2=C/C1=CC(F)=C([o-])C(F)=C1', ax=ax1)
> draw_structure('CN(C=2[o-])C(C)=N\C2C=C1/C=C(F)C(=O)C(F)=C1', ax=ax2)
> plt.tight_layout()
> plt.show()
>
> Output:
> [image: image.png]
>
> Thanks,
> Jason
>
>


Re: [Rdkit-discuss] MolFromInchi with Amides

2018-06-15 Thread Rocco Moretti
Is there an easy way from within RDKit to take an arbitrary amide tautomer
and convert it to the "correct" (according to chemists) one?

On Fri, Jun 15, 2018 at 12:26 AM, Markus Sitzmann  wrote:

> Hi Jeff,
>
> That is because InChI is a structure identifier, not a structure
> representation. The difference of both is, a structure identifier
> normalizes the structure to a form which it regards as the standard
> representation of the molecule in order to make the molecule identifiable
> regardless of the state the molecule is coming in from a input resource
> (and hence calculates the same identifier).
>
> For Standard InChI, the decision was made to make them insensitive to
> tautomers (within the limitations of the InChI algorithm). Kind of
> unluckily, this normalizes most amides to a form that chemists regard as
> the incorrect one. And the second unlucky thing is that you can convert the
> InChI back to a structure representation which then  is of course the
> normalized or standardized form of the molecule.
>
> So if you want to make sure to keep the original representation of a
> molecule don’t use InChI as your representation format (calculate InChI as
> an identifier field next to it). If your input resource only provides InChI
> or Standard InChI then you are of course out of luck.
>
> Best,
> Markus
>
> -
> |  Markus Sitzmann
> |  markus.sitzm...@gmail.com
>
> On 14. Jun 2018, at 23:33, Jeff van Santen 
> wrote:
>
> Hi all,
>
>
> I have some questions about how rdkit handles amides. For context, I am
> working with a large set of molecules, many of which contain peptides. I
> have been running into a problem with using rdkit, in that when I try to
> load a molecule from the InChI, the wrong tautomer is loaded. As a simple
> example consider acetamide:
>
>
> """
>
> FromInchi = Chem.MolFromInchi('InChI=1S/C2H5NO/c1-2(3)4/h1H3,(H2,3,4)')
>
> print(rdMolDescriptors.CalcNumAmideBonds(FromInchi))
>
>  > 0
>
> print(Chem.MolToSmiles(FromInchi))
>
> > CC(=N)O
>
>
> FromSmiles = Chem.MolFromSmiles('CC(=O)N')
>
> print(rdMolDescriptors.CalcNumAmideBonds(FromSmiles))
>
> > 1
>
> print(Chem.MolToSmiles(FromSmiles))
>
> > CC(N)=O
>
> """
>
>
> I realize that Standard InChi does not have a mechanism for distinguishing
> between the two tautomers, so I am wondering why rdkit considers the iminol
> to be a better representation? Also, is there any way to get the amide
> instead? (Without using MolVS)
>
>
> Thanks,
>
> Jeff
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] looking for feedback on new python API documentation format

2017-03-29 Thread Rocco Moretti
One potential issue with the new pdoc documentation is how it handles (or
rather doesn't) namespace transclusions.

For example, say you were looking for documentation on
rdkit.Chem.MolFromMolFile. If you go to the rdkit.Chem page for pdoc (
http://rdkit.org/docs_temp/Chem/index.html) MolFromMolFile is not listed
anywhere on that page.

In contrast, for the old version (
http://rdkit.org/docs/api/rdkit.Chem-module.html) a link to the
MolFromMolFile documentation is there in the "Imports*:*" section. It's not
great, stuck in a massive block like that, but it's definitely ctrl-F-able
in your browser.

I think this is a potentially serious issue given how RDKit is typically
used. As most usage of objects typically goes through Chem or AllChem,
rather than the specific module that they originate from, most of the time
there will be no real way for users to find that function starting from the
top page of the new documentation. They might get to the Chem or AllChem
page and then hit a dead end. And as Juuso points out, there also doesn't
seem to be a flat list or a search box, so what you're left with is going
to Google.
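
A small sketch of one way to track down where a re-exported name actually lives, which is the lookup the documentation pages make hard. It uses only the standard library; the helper name defining_module is hypothetical, and json.JSONDecodeError simply stands in for a name that a package re-exports from a submodule.

```python
import inspect
import json


def defining_module(obj):
    """Return the name of the module in which obj was defined, or None."""
    module = inspect.getmodule(obj)
    return module.__name__ if module is not None else None


# json re-exports JSONDecodeError from one of its submodules, much like
# rdkit.Chem re-exports functions that are defined elsewhere.
print(defining_module(json.JSONDecodeError))
```

Calling the same helper on rdkit.Chem.MolFromMolFile should point at the module whose documentation page to open, assuming the wrapped function exposes its module name, which I have not checked for every RDKit version.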

Regards,
-Rocco

On Tue, Mar 28, 2017 at 11:10 PM, Greg Landrum 
wrote:

> Dear all,
>
> TL;DR
> I'd like to switch to a new system for generating the RDKit Python API
> documentation and I'd like some feedback.
>
> Please take a look at this possible API documentation format:
> http://rdkit.org/docs_temp/
> and let me know whether it looks as useful as the old API doc
> format:
> http://rdkit.org/docs/api/index.html
>
>
> More context:
> The current documentation (http://rdkit.org/docs/api/index.html) is
> generated using epydoc. It's functional, though quite "old school" looking.
> The problem is that epydoc is no longer supported (and hasn't been for
> quite a while) and does not support python3 at all, so I would like to move
> off of it.
>
> In theory the API docs can be generated with Sphinx, which is what I use
> for the rest of the documentation, but I haven't been able to get it
> working correctly with the rdkit.[1]
>
> I've done a bit of looking around and it seems like the closest thing to a
> replacement for epydoc is pdoc (https://github.com/BurntSushi/pdoc). This
> was easy enough to figure out (despite the page hosting its own docs being
> down) and generates documentation for the RDKit API without too much
> trouble. The results (http://rdkit.org/docs_temp/) are certainly more
> modern looking than what epydoc generates and seem to be equally useful.
>
> If anyone has suggestions for other things that I should look at, I would
> be happy to hear them. Constraints there:
> - The system must support extension modules
> - It needs to discover the things to be documented automatically (i.e. I
> should only have to tell it to document the rdkit module and it figures out
> the rest).
> - Anything that requires changing the actual documentation itself is not a
> viable option.
> - It has to generate HTML
>
>
> Thanks,
> -greg
> [1] The specific problem there is that it seems that sphinx-apidoc does
> not pick up extension modules, which renders the RDKit API docs rather
> sparse and useless. I'd love to find out that this was user error though.


Re: [Rdkit-discuss] File Conversion?

2016-12-05 Thread Rocco Moretti
It's not something that RDKit can do - RDKit is focused more on small
organic molecules, rather than biomacromolecules.

For DNA, if all you want is an idealized B-form double helix, there are a
number of programs out there which can take in a sequence and make an ideal
(or almost-ideal) structure from it. From what I can tell, 3DNA (
http://x3dna.org/) is one of the more popular ones. There are also web servers
like 3D-DART (http://haddock.science.uu.nl/dna/DARTusage.html) which can do
similar things.

The big caveat here is that unless you explicitly specify how the DNA is
non-standard, these programs will spit out perfectly regular, rod-straight
DNA double helices. They're not going to be able to adequately model
sequence-specific subtle irregularities and curvature on their own. Frankly
speaking, unless you're looking for a starting model for subsequent
modeling steps (e.g. for input into an MD simulation) these models will
probably not be worth much. If you're just looking at it with e.g. PyMol,
they'll be utterly uninteresting.

Regards,
-Rocco

P.S. Converting protein fastas to structures is a whole 'nother ball of
wax. If you're really interested in that, take a look at the Critical
Assessment of protein Structure Prediction competition (CASP:
http://predictioncenter.org/) for the latest state-of-the-art.

On Sun, Dec 4, 2016 at 9:00 AM, Carl MacGentey 
wrote:

> Dear RDKit Discussion Group-
>
>
>
> Is it possible to convert fasta files (DNA nucleotide sequences) into PDB
> files? I am wanting to view strands of DNA and full length genes in three
> dimensions.


Re: [Rdkit-discuss] comparing two or more tables of molecules

2016-11-28 Thread Rocco Moretti
On Mon, Nov 28, 2016 at 11:31 AM, Christos Kannas 
wrote:

> I think it would be better to use a similarity metric based on fingerprints.
>

Hi Christos,

Fingerprints will only work if the fingerprint method you use captures all
of the salient information you're interested in. For example, most
fingerprint metrics in use have spotty or non-existent encoding of
chirality, so if you want to consider two enantiomers to be different,
fingerprint similarity will not work for you. (Unless you happen to pick a
fingerprint method which happens to encode the particular chirality
information you're interested in.)

E.g.

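>>> from rdkit import Chem
>>> from rdkit.DataStructs import FingerprintSimilarity
>>> from rdkit.Chem.Fingerprints.FingerprintMols import FingerprintMol
>>> m1 = Chem.MolFromSmiles("CC1=CC[C@](Cl)(CC1)C(=C)C")
>>> m2 = Chem.MolFromSmiles("CC1=CC[C@@](Cl)(CC1)C(=C)C")
>>> FingerprintSimilarity(FingerprintMol(m1),FingerprintMol(m2))
1.0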

Certain regioisomers can fool a fingerprint-based method, too:

>>> m1 = Chem.MolFromSmiles("N(CCC[Br])O")
>>> m2 = Chem.MolFromSmiles("N(CCCO)[Br]")
>>> FingerprintSimilarity(FingerprintMol(m1),FingerprintMol(m2))
1.0

(That's 7 versus 8 carbons on each aliphatic chain.)

I agree with Rajarshi that a SMILES based approach will probably work, if
you make sure you properly canonicalize the SMILES.

The default RDKit SMILES output should work for most molecules. RDKit will
canonicalize the SMILES by default (though keep in mind different programs
have different SMILES canonicalization routines, so only compare RDKit
canonical smiles with other RDKit canonical SMILES). Also, RDKit normally
removes hydrogens on structures it reads in, so passing the molecule
through RDKit will give you a SMILES without (non-critical) hydrogens. By
default it will also output things labeled aromatically, so you don't have
to worry about Kekulization differences.

If you care about stereo-isomer differences, the one thing you probably
will want to change from the defaults is to add "isomericSmiles=True" to
the calls to MolToSmiles(), otherwise you'll lose the chirality information
when you write out your SMILES.
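
A minimal sketch of the canonical-SMILES comparison described above, assuming each table can be reduced to a list of SMILES strings. The helper name canonical_key and the two toy tables are made up for illustration.

```python
from rdkit import Chem


def canonical_key(smiles):
    """Return RDKit's canonical isomeric SMILES, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol, isomericSmiles=True)


# Toy tables: ethanol written two ways, plus the two enantiomers of alanine.
table_a = ["OCC", "C[C@H](N)C(O)=O"]
table_b = ["CCO", "C[C@@H](N)C(O)=O"]

keys_a = {canonical_key(s) for s in table_a}
keys_b = {canonical_key(s) for s in table_b}

shared = keys_a & keys_b   # molecules present in both tables
only_a = keys_a - keys_b   # molecules found only in the first table
print(shared, only_a)
```

With isomericSmiles=True the two alanine entries stay distinct, so only ethanol ends up in the shared set. Tautomers and charge states are not reconciled by this approach.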

Tautomer and charged forms are going to be the big drawback here.
Especially with things like imidazole-like rings, RDKit can be particular
with hydrogen tautomerization, considering them to be different molecules.

>>> Chem.MolToSmiles(Chem.MolFromSmiles("c1nc(Cl)cn1"))
# Doesn't work: Sanitization error
>>> Chem.MolToSmiles(Chem.MolFromSmiles("c1[nH]c(Cl)cn1"))
'Clc1cnc[nH]1'
>>> Chem.MolToSmiles(Chem.MolFromSmiles("c1nc(Cl)c[nH]1"))
'Clc1c[nH]cn1'
>>> Chem.MolToSmiles(Chem.MolFromSmiles("c1[nH]c(Cl)c[nH+]1"))
'Clc1c[nH+]c[nH]1'
>>> Chem.MolToSmiles(Chem.MolFromSmiles("c1[nH+]c(Cl)c[nH]1"))
'Clc1c[nH]c[nH+]1'

That difference stays even after attempting to remove hydrogens from the
molecule.

Regards,
-Rocco


[Rdkit-discuss] ASCII art drawing

2016-08-30 Thread Rocco Moretti
Hello,

Has anyone played around with getting RDKit to draw molecules with ASCII
art? That is, produce a (small-ish) multi-line string which shows the
structure of the molecule? I'm thinking something along the lines of

>>> m = Chem.MolFromSmiles("[OH]c1ccccc1C(=O)[OH]")
>>> print Draw.MolToASCIIArt(m)
        OH
       /
  C --C       O
 //    \\   //
C       C---C
 \  __ /     \
  C --C       OH
>>>

I know that OpenBabel has something like this, but I was wondering if RDKit
could do something similar.

Thanks,
-Rocco

P.S. Strict ASCII isn't needed - Unicode could work, just as long as it
works in most "normal" terminals.


Re: [Rdkit-discuss] library name change?

2016-08-19 Thread Rocco Moretti
On Fri, Aug 19, 2016 at 6:15 AM, Paul Emsley 
wrote:

>
> It seems to me that several RDKit library names are too generic (and
> hence confusing) for such an environment: I have in mind libs such as
> Alignment, Catalog,
> FileParsers (and others).  I suggest that all RDKit libraries are prefixed
> with RD (like
> RDGeneral and RDInchi).
>

To clarify, would this only be for the shared object files?

This wouldn't involve any changes at the Python or C++ code level, right?
(Only changes would be at the build level.)


Re: [Rdkit-discuss] getting started in C++

2016-05-09 Thread Rocco Moretti
On Mon, May 9, 2016 at 6:20 AM, Rafal Roszak  wrote:

>
> Is there any how-to, tutorial or other such documentation for RDKit in C++?
>

In my experience, a decent way to approach RDKit on the C++ level is to
figure out how to do what you want on the Python level, and then transfer
that to the C++ calls. Most of the Python objects/functions are just
wrappers around the C++ level calls, so a simple case change (and some
searching for the appropriate header) is often all that's needed. Sometimes
the conversion is more complex, but in those cases the code to do the
Python functionality is typically in utility functions in the Wrap/
directories, and is normally straightforward to follow/copy.

Regards,
-Rocco


Re: [Rdkit-discuss] Problem with kekulization of molecule

2016-02-08 Thread Rocco Moretti
Another thing to be aware of is that if there are multiple valid
kekulizations that differ only in implicit hydrogen layout, RDKit will
refuse to pick one.

e.g. imidazole: "c1cncn1" will give you a "Can't kekulize mol" error,
whereas "c1c[nH]cn1" and "c1cnc[nH]1" are sanitized without a problem.
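
A short sketch of that behaviour. The RDLogger call only silences the expected error message; the variable names are illustrative.

```python
from rdkit import Chem
from rdkit import RDLogger

# Silence the expected "Can't kekulize mol" message for the first input.
RDLogger.DisableLog("rdApp.error")

# No hydrogen position given: sanitization fails and None is returned.
ambiguous = Chem.MolFromSmiles("c1cncn1")

# Hydrogen placed explicitly on either nitrogen: both parse fine.
explicit_a = Chem.MolFromSmiles("c1c[nH]cn1")
explicit_b = Chem.MolFromSmiles("c1cnc[nH]1")

print(ambiguous is None)
print(Chem.MolToSmiles(explicit_a) == Chem.MolToSmiles(explicit_b))
```

For unsubstituted imidazole the two explicit forms are the same molecule, so they canonicalize to the same SMILES; with substituents on the ring they would generally be distinct tautomers.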

On Mon, Feb 8, 2016 at 6:43 AM, Greg Landrum  wrote:

> Expanding a very small amount on Paolo's answer:
>
> The general rule of thumb is that you should be able to draw a valid
> conjugated Kekule structure for the molecule where the ring has 4N+2
> electrons.
> That works for C1=[S+]SC=N1, which produces the output SMILES c1nc[s+]s1.
>
> -greg
>
>
> On Mon, Feb 8, 2016 at 12:25 PM, Paolo Tosco  wrote:
>
>> Dear Guido,
>>
>> to be aromatic, that ring system will need a +1 formal charge on one of
>> the sulfur atoms:
>>
>> a = Chem.MolFromSmiles("c1ncs[s+]1",sanitize=True)
>>
>> or
>>
>> a = Chem.MolFromSmiles("c1nc[s+]s1",sanitize=True)
>>
>> Best regards,
>> Paolo
>>
>>
>> On 08/02/2016 11:19, Wolf-Guido Bolick wrote:
>>
>> Hi all,
>> I recently stumbled over a problem with the kekulization of this
>> structure c1ncss1 .
>>
>> a = Chem.MolFromSmiles("c1ncss1",sanitize=True)
>> [11:56:21] Can't kekulize mol
>>
>> Is it possible to create a sanitized mol-object of this structure?
>>
>> kind regards,
>> Guido


[Rdkit-discuss] Reactions and User-set properties

2016-01-04 Thread Rocco Moretti
Hello,

It doesn't look like user-set properties are copied through with explicitly
mapped atoms in chemical reactions, although implicitly copied atoms do
have their properties transferred. (See example session below.)

Is this the intended behavior? If I wanted all mappable atoms
(implicit & explicit) to have particular properties transferred across the
reaction, how best would I go about doing so (specifically on the C++
level)?

Thanks,
-Rocco

--

from rdkit import Chem
from rdkit.Chem import AllChem

def print_arb_prop(mollist):
    for m in range(len(mollist)):
        for a in range(mollist[m].GetNumAtoms()):
            atom = mollist[m].GetAtomWithIdx(a)
            if atom.HasProp("arb_prop"):
                print(m, a, atom.GetSymbol(), atom.GetIntProp("arb_prop"))

mol = Chem.MolFromSmiles("ClCNOCBr")
for a in range(mol.GetNumAtoms()):
    mol.GetAtomWithIdx(a).SetIntProp("arb_prop", a*-2 )

print_arb_prop( [mol] )
# 0 0 Cl 0
# 0 1 C -2
# 0 2 N -4
# 0 3 O -6
# 0 4 C -8
# 0 5 Br -10

rxn1 = AllChem.ReactionFromSmarts('[N:3][O:4]>>[N:3].[O:4]')
print_arb_prop( rxn1.RunReactants([mol])[0] )
# 0 1 C -2
# 0 2 Cl 0
# 1 1 C -8
# 1 2 Br -10

rxn2 = AllChem.ReactionFromSmarts('[Cl:1][C:2][N:3][O:4][C:5]>>[Cl:1][C:2][N:3].[O:4][C:5]')
print_arb_prop( rxn2.RunReactants([mol])[0] )
# 1 2 Br -10
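
One possible Python-side workaround, sketched under the assumption of a newer RDKit release than the one used in the session above, in which product atoms carry a "react_atom_idx" property pointing back at the reactant atom they came from. With a single reactant that index is enough to copy the property over by hand; this is not a statement about how the reaction code itself is meant to behave.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("ClCNOCBr")
for atom in mol.GetAtoms():
    atom.SetIntProp("arb_prop", atom.GetIdx() * -2)

rxn = AllChem.ReactionFromSmarts(
    "[Cl:1][C:2][N:3][O:4][C:5]>>[Cl:1][C:2][N:3].[O:4][C:5]"
)
products = rxn.RunReactants((mol,))[0]

copied = 0
for product in products:
    for atom in product.GetAtoms():
        # Assumption: "react_atom_idx" is set on atoms derived from a reactant.
        source_idx = atom.GetPropsAsDict().get("react_atom_idx")
        if source_idx is None:
            continue
        source = mol.GetAtomWithIdx(int(source_idx))
        atom.SetIntProp("arb_prop", source.GetIntProp("arb_prop"))
        copied += 1

print(copied)
```

With more than one reactant the atom index alone is ambiguous, so the reactant each atom came from would have to be tracked as well.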


Re: [Rdkit-discuss] Atom-Atom mapping between two molecules

2015-10-18 Thread Rocco Moretti
Oops. Turns out that the issue I had with the found substructure SMARTS not
matching the original molecules was a rather bone-headed bug in my code --
I was accidentally swapping the molecules with respect to the function
signature. Sorry for the false alarm.

I'm still curious if there is an easier way to match up atoms in two
structures, though.

Thanks,
-Rocco


On Sat, Oct 17, 2015 at 5:18 PM, Rocco Moretti <rmoretti...@gmail.com>
wrote:

> Hello,
>
> I'm trying to find a mapping between two molecules, specifically matching
> up the corresponding atoms. I've had success with SubstructMatch, if one of
> the molecules is a proper substructure of the other, but that doesn't
> generalize well.
>
> My initial attempts at a general solution have been with the findMCS
> function, using the resultant SMARTS string to generate a query molecule,
> then using SubstructMatch to find each molecule's mapping to the query
> molecule, and converting those results to a direct molecule-molecule
> mapping.
>
> Not only does it seem rather circuitous, I've run into situations where
> findMCS gives back a reasonable SMARTS string, but the SubstructMatch steps
> don't find matches to the original molecules. It's probably some subtle
> atom/bond property issue, as after bouncing the molecules through SMILES
> strings I can get matches. (Which is to say I don't have an isolatable case
> I could share.)
>
> Is there an easier way of mapping atoms between two molecules with RDKit?
> (I'm doing this on the C++ level, and am not averse to some light patching
> of the RDKit source code.)
>
> Thanks,
> -Rocco


[Rdkit-discuss] Atom-Atom mapping between two molecules

2015-10-17 Thread Rocco Moretti
Hello,

I'm trying to find a mapping between two molecules, specifically matching
up the corresponding atoms. I've had success with SubstructMatch, if one of
the molecules is a proper substructure of the other, but that doesn't
generalize well.

My initial attempts at a general solution have been with the findMCS
function, using the resultant SMARTS string to generate a query molecule,
then using SubstructMatch to find each molecule's mapping to the query
molecule, and converting those results to a direct molecule-molecule
mapping.

Not only does it seem rather circuitous, I've run into situations where
findMCS gives back a reasonable SMARTS string, but the SubstructMatch steps
don't find matches to the original molecules. It's probably some subtle
atom/bond property issue, as after bouncing the molecules through SMILES
strings I can get matches. (Which is to say I don't have an isolatable case
I could share.)
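
For reference, the MCS round trip described above can be sketched in Python roughly as follows. This assumes the rdFMCS module of current RDKit releases rather than the C++ findMCS call; the helper name map_atoms and the two test molecules are made up for illustration.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS


def map_atoms(mol_a, mol_b):
    """Pair up atom indices of mol_a and mol_b via their maximum common substructure."""
    mcs = rdFMCS.FindMCS([mol_a, mol_b])
    if mcs.numAtoms == 0:
        return []
    core = Chem.MolFromSmarts(mcs.smartsString)
    match_a = mol_a.GetSubstructMatch(core)
    match_b = mol_b.GetSubstructMatch(core)
    # Position i in both matches corresponds to atom i of the MCS query.
    return list(zip(match_a, match_b))


mol_a = Chem.MolFromSmiles("CCOc1ccccc1")
mol_b = Chem.MolFromSmiles("CCNc1ccccc1")
pairs = map_atoms(mol_a, mol_b)
for idx_a, idx_b in pairs:
    print(idx_a, mol_a.GetAtomWithIdx(idx_a).GetSymbol(),
          idx_b, mol_b.GetAtomWithIdx(idx_b).GetSymbol())
```

Only the first substructure match of each molecule is used, so symmetric cores such as the ring here have other equally valid pairings.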

Is there an easier way of mapping atoms between two molecules with RDKit?
(I'm doing this on the C++ level, and am not averse to some light patching
of the RDKit source code.)

Thanks,
-Rocco


[Rdkit-discuss] Issues with molfiles, aromaticity and matching

2015-09-03 Thread Rocco Moretti
Hello,

I'm seeing unexpected results when trying to match a search query encoded
as an MDL Molfile. It looks like I'm not getting any matches when the
oxygens of a quinone are replaced with placeholder atoms in an otherwise
identical structure.

That is, if I take the molfile for quinone, copy it and only change the 'O'
atoms to '*' atoms, the query doesn't work, possibly due to aromaticity
issues:

>>> import rdkit
>>> from rdkit import Chem
>>> print(rdkit.__version__)
2015.03.1
>>> m = Chem.MolFromMolFile("quinone_test.sdf")
>>> q = Chem.MolFromMolFile("quinone_stub.sdf")
>>> m.HasSubstructMatch(q)
False
>>> Chem.MolToSmiles(m)
'O=C1C=CC(=O)C=C1'
>>> Chem.MolToSmiles(q)
'[*]=c1ccc(=[*])cc1'
>>> Chem.MolToSmarts(m)
'[#8]=[#6]1-[#6]=[#6]-[#6](-[#6]=[#6]-1)=[#8]'
>>> Chem.MolToSmarts(q)
'*=[#6]1:[#6]:[#6]:[#6](:[#6]:[#6]:1)=*'

Note I still have issues even if I load the query as a SMILES string:

>>> q2 = Chem.MolFromSmiles("[*]=C1-C=C-C(=[*])-C=C1")
>>> m.HasSubstructMatch(q2)
False
>>> Chem.MolToSmiles(q2)
'[*]=c1ccc(=[*])cc1'

But not when I load it as a SMARTS string:

>>> q3 = Chem.MolFromSmarts("[*]=C1-C=C-C(=[*])-C=C1")
>>> m.HasSubstructMatch(q3)
True
>>> Chem.MolToSmiles(q3)
'[*]=C1C=CC(=[*])C=C1'

As using SMARTS strings is not really feasible for what I'm doing, is there
something I'm doing wrong with respect to loading query molecules from
Molfiles? The structure is already single/double Kekulized in the molfile,
so is there some flag or other loading function I should be using to avoid
spurious aromatization? (Hopefully, one that's general enough that I won't
have issues when loading and matching truly aromatic molecules.)

Thanks,
-Rocco

P.S. My end usage will actually be using the C++ API, if that makes a
difference for recommendations.



## quinone_test.sdf, for completeness (quinone_stub.sdf is identical,
except for "*" instead of the two "O"):

quinone
comment 1
comment 2
 12 12  0  0  0  0  0  0  0  0999 V2000
    1.0263   -0.0278   -0.3487 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.2087   -0.0217   -0.0369 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9446    1.2428    0.1576 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2373    1.2490    0.4999 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9841   -0.0093    0.6981 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.1658   -0.0035    1.0123 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.2483   -1.2741    0.5019 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9564   -1.2801    0.1598 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.3826    2.1566    0.0087 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.7914    2.1678    0.6465 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.8110   -2.1878    0.6502 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.4019   -2.1992    0.0122 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  8  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  3  9  1  0  0  0  0
  4  5  1  0  0  0  0
  4 10  1  0  0  0  0
  5  7  1  0  0  0  0
  5  6  2  0  0  0  0
  7  8  2  0  0  0  0
  7 11  1  0  0  0  0
  8 12  1  0  0  0  0
M  END
