Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread Peter S. Shenkin
Well, if I'm recalling correctly, a highly symmetric structure like
buckminsterfullerene takes a long time to canonicalize.
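
A rough way to see why symmetry hurts: many canonicalizers first refine atoms into classes based on their neighborhoods, and only fall back on tie-breaking (the expensive part) for atoms that refinement cannot tell apart. The sketch below is a generic color-refinement loop in plain Python, not RDKit's actual algorithm; the function name and the two toy graphs are made up for illustration.

```python
def refine_colors(adjacency):
    """Split atoms into classes by repeatedly comparing neighbor classes.

    adjacency maps each node to a list of its neighbors.
    Returns a dict mapping node -> class index once the partition is stable.
    """
    # Start from the degree of each node as its initial class.
    colors = {node: len(nbrs) for node, nbrs in adjacency.items()}
    while True:
        # A node's signature is its own class plus the sorted classes of its neighbors.
        signatures = {
            node: (colors[node], tuple(sorted(colors[n] for n in adjacency[node])))
            for node in adjacency
        }
        ranking = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {node: ranking[signatures[node]] for node in adjacency}
        # Classes can only split, so an unchanged class count means we are done.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors


# Hypothetical toy graphs: a 5-atom chain and a 6-membered ring (heavy atoms only).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

chain_classes = len(set(refine_colors(chain).values()))
ring_classes = len(set(refine_colors(ring).values()))
print(chain_classes, ring_classes)
```

In the chain, refinement separates the end, next-to-end, and middle atoms on its own. In the ring every atom stays in one class, so all the ordering work is left to tie-breaking; a fullerene is the same situation with 60 atoms.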

I don't know what the formal definition of a planar graph is, but I would
guess it's not what chemists mean when they say a molecule is planar.

-P.

On Thu, Jun 15, 2023 at 2:51 PM S Joshua Swamidass 
wrote:

> Incidentally,
>
> I came across this O(log N) canonization algorithm for planar graphs:
> https://arxiv.org/pdf/0809.2319.pdf
>
> I wonder if this algorithm can be adapted for chemistry? Molecules are
> usually planar, but I believe they occasionally are "nearly" planar, by
> which I mean planar graphs plus a few edges that break the planarity.
>
> And what (generally speaking) is the algorithm used by rdkit? Do we know
> its complexity?
>
>
>
>
> On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass 
> wrote:
>
>> Andrew,
>>
>> Thanks! According to wikipedia (and my recollections of algorithms
>> class)...
>> "The problem is not known to be solvable in polynomial time nor to be
>> NP-complete, and therefore may be in the computational complexity class
>> NP-intermediate."
>> https://en.wikipedia.org/wiki/Graph_isomorphism_problem
>>
>> Your reference though is really helpful. The key phrase seems to be
>> "bounded valence", which is certainly true of molecular graphs. Each atom
>> can only bond to some fairly low number of other atoms, i.e. bounded valence.
>> That's probably the reason why we do have a polynomial time algorithm...
>>
>> Thank you!
>>
>> Joshua
>>
>> On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke 
>> wrote:
>>
>>> On Jun 15, 2023, at 18:20, S Joshua Swamidass 
>>> wrote:
>>> > It's well known that the graph-isomorphism problem is NP
>>>
>>> While P is contained in NP, I don't think that's the NP you mean.
>>>
>>> I suspect you may be thinking of subgraph isomorphism, which is NP-hard.
>>> Graph isomorphism may be quasi-polynomial time, if Babai's (unpublished)
>>> claim is correct.
>>>
>>> Also, "Isomorphism of graphs of bounded valence can be tested in
>>> polynomial time" -
>>> https://www.sciencedirect.com/science/article/pii/002282900095 .
>>>
>>>
>>> > So here is my question. What are the cases that are very difficult to
>>> canonize a graph?
>>>
>>> As I recall, handling chirality and other non-local properties is
>>> difficult. I have not worked on this problem.
>>>
>>> Cheers,
>>>
>>> Andrew
>>> da...@dalkescientific.com
>>>
>>>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] hybridization of nitrogen in beta-lactam

2021-02-13 Thread Peter S. Shenkin
Amide Ns are usually viewed as sp2 because of the resonance RC(=O)-NR2 <->
RC([O-])=[N+]R2, where R can be H.

Unlike sp3 Ns (amines), amide Ns are not strong H-bond acceptors, though both
amides and amines are strong donors. This observation is consistent with
sp2 character.

-P.

On Sat, Feb 13, 2021 at 1:18 PM Peter St. John 
wrote:

> Is there any reason why RDKit says the nitrogen in beta-lactam is
> SP2-hybridized? I would have assumed it should be SP3. It doesn't seem to
> be the ring structure, 'C1NC1' lists all the atoms as being SP3.
>
>
>
> >>> [(atom.GetSymbol(), atom.GetHybridization()) for atom in
>  rdkit.Chem.MolFromSmiles('O=C1CCN1').GetAtoms()]
>
> [('O', rdkit.Chem.rdchem.HybridizationType.SP2),
>  ('C', rdkit.Chem.rdchem.HybridizationType.SP2),
>  ('C', rdkit.Chem.rdchem.HybridizationType.SP3),
>  ('C', rdkit.Chem.rdchem.HybridizationType.SP3),
>  ('N', rdkit.Chem.rdchem.HybridizationType.SP2)]
>
>
> Thanks!
> -- Peter


Re: [Rdkit-discuss] sanitization converts "I(=O)(=O)[O-]" into "[O-][I+2]([O-])[O-]"

2021-01-22 Thread Peter S. Shenkin


On Fri, Jan 22, 2021 at 9:01 AM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> I think there was some confusion between left and right in the original
> message, but RDKit prefers the representation that preserves the octet at
> the expense of having more formal charges:
>
> In [9]: mol = Chem.MolFromSmiles('O=I(=O)([O-])')
>
>
> In [10]: Chem.MolToSmiles(mol)
>
> Out[10]: '[O-][I+2]([O-])[O-]'
>
> I don't think there's right and wrong here; it's just a matter of a
> toolkit picking a canonical convention.
>
> Carboxylates are different in that the popular representation (C(=O)[O-])
> doesn't break the octet rule. But another interesting case is nitro groups:
>
> In [11]: mol = Chem.MolFromSmiles('CN(=O)=O')
>
>
> In [12]: Chem.MolToSmiles(mol)
>
> Out[12]: 'C[N+](=O)[O-]'
>
>
> Ivan
>
> On Thu, Jan 21, 2021 at 9:06 PM Peter S. Shenkin 
> wrote:
>
>> It seems to me offhand RDKit's choice is analogous to the way carboxylates
>> are generally notated:
>>
>> R-C(=O)O- rather than R-C+(O-)O- .
>>
>> Both are legitimate and in fact equivalent upon application of
>> chemical knowledge, but do you prefer the second representation for
>> carboxylates?
>>
>> -P.
>>
>>
>>
>>
>>
>> On Thu, Jan 21, 2021 at 2:06 PM Jason Biggs 
>> wrote:
>>
>>> The RDKit will always convert iodate from the form on the left, with an
>>> expanded octet on iodine and a single negative charge, into the form on the
>>> right with all single bonds and a charge on every atom (image here
>>> https://i.stack.imgur.com/hq3St.png).  This happens no matter how I
>>> import the molecule, from SMILES or from a file.  The only way to avoid it
>>> is to skip sanitization, which I'd rather avoid.
>>>
>>> Is this the desired behavior?
>>>
>>> Thanks
>>> Jason
>>>
>>> [image: image.png]
>>>
>>>
>>>
>>>
> --
-P.
Sent from a cell phone. Pls forgive brvty and m1$tea@ks.


Re: [Rdkit-discuss] sanitization converts "I(=O)(=O)[O-]" into "[O-][I+2]([O-])[O-]"

2021-01-21 Thread Peter S. Shenkin
It seems to me offhand RDKit's choice is analogous to the way carboxylates are
generally notated:

R-C(=O)O- rather than R-C+(O-)O- .

Both are legitimate and in fact equivalent upon application of
chemical knowledge, but do you prefer the second representation for
carboxylates?
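
The equivalence of the two notations can be checked with ordinary formal-charge bookkeeping (valence electrons minus lone-pair electrons minus bond orders). The sketch below is plain Python, not an RDKit call; the electron counts are typed in by hand from the Lewis structures, and the function and variable names are illustrative only.

```python
def formal_charge(valence_electrons, lone_pair_electrons, bond_order_sum):
    """Formal charge of one atom from its Lewis-structure bookkeeping."""
    return valence_electrons - lone_pair_electrons - bond_order_sum


def net_charge(atoms):
    """Sum of formal charges over (valence, lone-pair electrons, bond orders) tuples."""
    return sum(formal_charge(*atom) for atom in atoms)


# Carboxylate group, listing only the carbon and its two oxygens.
# R-C(=O)O-  : C has 4 bonds; one O is doubly bonded, one carries the charge.
carboxylate_usual = [(4, 0, 4), (6, 4, 2), (6, 6, 1)]
# R-C+(O-)O- : C has 3 bonds; both oxygens are singly bonded.
carboxylate_separated = [(4, 0, 3), (6, 6, 1), (6, 6, 1)]

# Iodate, I(=O)(=O)[O-] : iodine keeps one lone pair and has bond order 5.
iodate_expanded = [(7, 2, 5), (6, 4, 2), (6, 4, 2), (6, 6, 1)]
# [O-][I+2]([O-])[O-]   : iodine keeps one lone pair and has three single bonds.
iodate_separated = [(7, 2, 3), (6, 6, 1), (6, 6, 1), (6, 6, 1)]

for name, atoms in [
    ("carboxylate, usual", carboxylate_usual),
    ("carboxylate, charge-separated", carboxylate_separated),
    ("iodate, expanded octet", iodate_expanded),
    ("iodate, charge-separated", iodate_separated),
]:
    print(name, [formal_charge(*a) for a in atoms], "net:", net_charge(atoms))
```

Both members of each pair carry the same net charge of -1; they differ only in where the bookkeeping puts it, which is why the choice comes down to convention.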

-P.





On Thu, Jan 21, 2021 at 2:06 PM Jason Biggs  wrote:

> The RDKit will always convert iodate from the form on the left, with an
> expanded octet on iodine and a single negative charge, into the form on the
> right with all single bonds and a charge on every atom (image here
> https://i.stack.imgur.com/hq3St.png).  This happens no matter how I
> import the molecule, from SMILES or from a file.  The only way to avoid it
> is to skip sanitization, which I'd rather avoid.
>
> Is this the desired behavior?
>
> Thanks
> Jason
>
> [image: image.png]
>
>
>
>


Re: [Rdkit-discuss] From MW to structure

2021-01-07 Thread Peter S. Shenkin
Are you starting with an integral molecular weight or an experimentally
determined value, perhaps even a set of values from mass spec?

If it's an integral value then, if you are willing to settle for known
compounds, it might not be too hard. You could derive a bunch of empirical
formulas consistent with that molecular weight, given a list you would
supply of allowed elements. Then you could simply look up known compounds
with those formulas.
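
The formula-enumeration step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the element list, the integer (nominal) masses, and the function name are my own choices, and nothing here checks whether a formula makes chemical sense.

```python
# Nominal (integer) masses of the most abundant isotope; assumed element list.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}


def formulas_for_nominal_mass(target, elements=("C", "H", "N", "O")):
    """Return every element-count combination whose nominal masses sum to target."""
    results = []

    def extend(index, remaining, counts):
        if index == len(elements):
            if remaining == 0:
                results.append(dict(counts))
            return
        symbol = elements[index]
        mass = NOMINAL_MASS[symbol]
        # Try every count of this element that does not overshoot the target.
        for n in range(remaining // mass + 1):
            counts[symbol] = n
            extend(index + 1, remaining - n * mass, counts)

    extend(0, target, {})
    return results


candidates = formulas_for_nominal_mass(16)
for counts in candidates:
    print(counts)
```

For a target of 16 this returns CH4, NH2, O, and H16, which shows why a valence or plausibility filter is needed before any database lookup.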

Actually, you could probably just look up the known compounds with the
specified molecular weight.

If it's a high-precision experimentally determined average molecular
weight, you could rule out some empirical formulas because many empirical
formulas would be inconsistent with the elements' known isotopic ratios.

If you are looking at high-res mass spec data, you could be even more
precise.

If you are actually trying to propose structures de-novo, that is a much
harder problem, per the Rdkit discussion thread cited by Nils. It probably
still makes sense to proceed via empirical formulas as an intermediate
result, but that would be a small fraction of the work it would take to get
you there. Going from empirical formulas to all consistent structures is
the hard part, and if you insist on stable structures or readily
synthesizable structures, that makes it harder.

-P.

On Thu, Jan 7, 2021 at 2:06 PM BOURG Stephane 
wrote:

> Dear all,
>
>
>
> I'm looking for a tool that can generate chemical structures from the
> molecular weight of the compound.
>
> Do you know RDKit functions or other tools able to offer that service ?
>
>
>
> Best regards,
>
>
>
>
>
>
>
> Stéphane BOURG, Ph. D.
>
> Equipe Bioinformatique Structurale et Chémoinformatique (SBC)
>
>
>
> Institut de Chimie Organique et Analytique (ICOA)
>
> UMR CNRS-Université d’Orléans 7311
>
> Pôle de Chimie
>
> Rue de Chartres – BP 6759
>
> 45 067 Orléans Cedex 2
>
>
>
> Tél : +33 (0)2 38 49 45 89
>
> E-mail : stephane.bo...@cnrs.fr
>
>
>
>
>
>


Re: [Rdkit-discuss] canonicalization of two aromatic molecules returning two different forms (kekule and aromatic)

2020-11-27 Thread Peter S. Shenkin
Yes, I've seen the same phenomenon in multiple SMILES generators.

Even Daylight's (when they had it up on a public web site).

From a chemical perspective, it isn't sensible that the pyridone-like ring
in molecule 1 should not be seen as aromatic in the canonical
SMILES, especially since the same ring is seen as aromatic in molecule 2.
The counter-argument has often been that "only exocyclic substituents are
considered". But of course that =N is indeed exocyclic to the ring in
question.

In a famous quote, Dave Weininger said:

It is important to remember that the purpose of the SMILES aromaticity
detection algorithm is for the purposes of chemical information
representation only! To this end, rigorous rules are provided for
determining the "aromaticity" of charged, heterocyclic, and electron
deficient ring systems. The "aromaticity" designation as used here is not
intended to imply anything about the reactivity, magnetic resonance
spectra, heat of formation, or odor of substances.

As an example of the utility of this definition, consider o-xylene. You
don't want to see the VB structure with a double bond connecting the
methyl-attached carbons as different from the form with a single bond in
that position. Hence, aromaticity enables SMILES to avoid that issue, since
the (canonical) SMILES does not contain any double bonds, but only aromatic
bonds within the ring.

And the fact is that there is no ambiguity in any of the structures I've
seen (including the one shown as smiles1) that exhibit the problem. There's
only one way to draw the resonance structure, anyway, so you could argue
that you don't need to make it aromatic at all.

Of course, if you had the courage of that particular conviction, you
wouldn't bother making pyrrole aromatic, either, because there's only one
resonance structure you can draw. But SMILES does define pyrrole as
aromatic.

When I've discussed this with developers who have worked on SMILES systems,
they say that looking for cases like exocyclic aromaticity-producing
substituents in adjacent non-aromatic rings would slow the SMILES generator
down.

But the problem is that when you are using a SMARTS to look for one of
these pyridone-like rings that you see in the first structure, you're not
going to find it, even though it's there. Chemists do expect an aromatic
SMARTS to find an aromatic ring, which is no doubt the secret reason for
making pyrrole aromatic.

I've never liked this situation, but it boils down to the fact that
Daylight, which produced the original reference SMILES implementation,
"done it that-a-way". It has the advantage of *stare decisis*.

-P.

P. S. By the way, if any of you have ever seen a SMILES generator that
displays the 6-membered ring as aromatic in the first example, could you
please tell us which one that is?

On Fri, Nov 27, 2020 at 1:55 PM Paolo Tosco 
wrote:

> (Now with link - you can tell it's Friday night)
>
> Hi Mark, Alexis,
>
> Yes, I was too fast in composing my previous reply and I did not pay
> enough attention to the molecules.
> After reading Alexis' reply, I looked more carefully at his original
> question and at that point I remembered having seen a similar behaviour
> before from RDKit on condensed ring systems featuring exocyclic bonds, and
> the related mailing list discussions.
> So I did a bit of searching and I fished out the (long) thread that deals
> with exactly this behaviour.
>
>
> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAAsqebGxOwJtH32T5jC%3DoBZN6G1JE_NwsEqKUO8%2BmUCqmABCzQ%40mail.gmail.com/#msg36448625
>
> I hope that helps, cheers
> p.
>
> On Fri, Nov 27, 2020 at 7:31 PM Mark Mackey 
> wrote:
>
>> Hi Paolo,
>>
>>
>>
>> Hmmm, I think this is displaying a bug (or at the very least unexpected
>> behaviour) in the aromaticity code. The issue isn’t the aromaticity of the
>> imidazole/dihydroimidazole, but the aromaticity of the pyridyl. Alexis’
>> second molecule is identical to the first except that one bond in the
>> 5-membered ring was broken, and that (to my eyes at least) should not
>> affect whether the 6-membered ring is seen as aromatic.
>>
>>
>>
>> Regards,
>>
>> Mark.
>>
>>
>>
>> *From:* Paolo Tosco 
>> *Sent:* 27 November 2020 17:04
>> *To:* Alexis Parenty 
>> *Cc:* RDKit Discuss 
>> *Subject:* Re: [Rdkit-discuss] canonicalization of two aromatic
>> molecules returning two different forms (kekule and aromatic)
>>
>>
>>
>> Hi Alexis,
>>
>>
>>
>> The second molecule (smiles2) is indeed aromatic, but the first (smiles1)
>> is not, as the imidazole ring condensed to the pyridine is partially
>> saturated.
>>
>> The smiles1a analogue where I have added a double bond is aromatic, and
>> upon canonicalization it yields an aromatic SMILES as expected.
>>
>>
>>
>> Cheers,
>>
>> p.
>>
>>
>>
>> from rdkit import Chem
>>
>> In [2]:
>>
>> mol1 = Chem.MolFromSmiles("N12C=CC=CC1=NCC2")
>>
>> In [3]:
>>
>> mol1
>>
>> Out[3]:
>>
>> In [4]:
>>
>> smiles1 = Chem.MolToSmiles(mol1)
>>
>> In [5]:
>>
>> smiles1
>>

Re: [Rdkit-discuss] [EXTERNAL] Re: Morgan FP atom numbering

2020-10-28 Thread Peter S. Shenkin
I found that on the NY Public Library web site, the book is available,
chapter by chapter, as a digital download, if you have a library card. The
host site is at Johns Hopkins, so check your local library system, which
might also supply access.

-P.

On Wed, Oct 28, 2020 at 12:08 PM Cyrus Maher  wrote:

> Hi Andrew,
>
> Thank you! This is so thorough, and so helpful. We truly appreciate it.
>
> All the best,
>
> -Cyrus
>
> On 10/27/20, 4:28 AM, "Andrew Dalke"  wrote:
>
>
> On Oct 26, 2020, at 17:41, Cyrus Maher  wrote:
> > I’m wondering if there is an easy way to retrieve the atom numbers
> that the morgan fingerprints algorithm assigns as its first step.
>
> Many of the fingerprint functions support an optional "bitInfo"
> parameter. If it's a dictionary then each key is a bit that was set, and
> the value is a tuple of the (atom index, radius) pairs which set it.
>
> Here's an example with theobromine using r=0, which lets you see the
> initial invariants:
>
> >>> from rdkit import Chem
> >>> from rdkit.Chem import rdMolDescriptors
> >>> mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C")
> >>> bitInfo = {}
> >>> fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, radius=0,
> ...     useFeatures=1, bitInfo=bitInfo)
> >>> for bitno, pairs in sorted(bitInfo.items()):
> ...   print(f"Bitno: {bitno}")
> ...   for atom_idx, r in pairs:
> ...     print(f"  atom {atom_idx} ({mol.GetAtomWithIdx(atom_idx).GetSymbol()}) with radius {r}")
> ...
> Bitno: 0
>   atom 0 (C) with radius 0
>   atom 12 (C) with radius 0
> Bitno: 2
>   atom 7 (O) with radius 0
>   atom 10 (O) with radius 0
> Bitno: 4
>   atom 2 (C) with radius 0
>   atom 4 (C) with radius 0
>   atom 5 (C) with radius 0
>   atom 6 (C) with radius 0
>   atom 9 (C) with radius 0
> Bitno: 5
>   atom 8 (N) with radius 0
> Bitno: 6
>   atom 1 (N) with radius 0
>   atom 3 (N) with radius 0
>   atom 11 (N) with radius 0
>
> If I follow the code correctly, when useFeatures == 1 then the initial
> invariants are set by getFeatureInvariants() in
> ./Code/GraphMol/Fingerprints/FingerprintUtil.cpp , available at:
>
>
> https://github.com/rdkit/rdkit/blob/d594998dda2803a15aa0066e06f2477b71ed3be6/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L221
>
> A few lines up, at
> https://github.com/rdkit/rdkit/blob/d594998dda2803a15aa0066e06f2477b71ed3be6/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L182
> , you'll see the feature patterns defined in smartsPatterns.
>
> They appear to be identical to the list you gave.
>
> I reimplemented the initialization function (copied at the end of this
> email). Running the program shows that it produces the same invariants
> which are used as the bit numbers in the Morgan feature fingerprint:
>
> Invariant: 0
>   atom 0 (C)
>   atom 12 (C)
> Invariant: 2
>   atom 7 (O)
>   atom 10 (O)
> Invariant: 4
>   atom 2 (C)
>   atom 4 (C)
>   atom 5 (C)
>   atom 6 (C)
>   atom 9 (C)
> Invariant: 5
>   atom 8 (N)
> Invariant: 6
>   atom 1 (N)
>   atom 3 (N)
>   atom 11 (N)
>
>
> I believe that gives you two ways to get the information you want!
>
> Best regards,
>
> Andrew
> da...@dalkescientific.com
>
>
>
>
> # Python re-implementation of RDKit's getFeatureInvariants() from
> # ./Code/GraphMol/Fingerprints/FingerprintUtil.cpp
>
> from rdkit import Chem
>
> smartsPatterns = [
> "[$([N;!H0;v3,v4&+1]),\
> $([O,S;H1;+0]),\
> n&+0]",  # Donor
> "[$([O,S;H1;v2;!$(*-*=[O,N,P,S])]),\
> $([O,S;H0;v2]),\
> $([O,S;-]),\
> $([N;v3;!$(N-*=[O,N,P,S])]),\
> n&+0,\
> $([o,s;+0;!$([o,s]:n);!$([o,s]:c:n)])]",# Acceptor
> "[a]",  # Aromatic
> "[F,Cl,Br,I]",  # Halogen
> "[#7;+,\
> $([N;H2&+0][$([C,a]);!$([C,a](=O))]),\
> $([N;H1&+0]([$([C,a]);!$([C,a](=O))])[$([C,a]);!$([C,a](=O))]),\
> $([N;H0&+0]([C;!$(C(=O))])([C;!$(C(=O))])[C;!$(C(=O))])]",  # Basic
> "[$([C,S](=[O,S,P])-[O;H1,-1])]"# Acidic
> ]
>
> mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)[nH]c(=O)n2C")
> invariants = [0] * mol.GetNumAtoms()
> for pattern_idx, smartsPattern in 

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-25 Thread Peter S. Shenkin
Canonical SMILES is probably the way to go, but you might also be able to
use the InchiKey and the Inchi auxiliary information together as a compound
hash key.
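
To make the composite-key idea concrete, here is a minimal plain-Python sketch. It assumes you have already generated the InChIKey and the aux info string for each molecule with whatever toolkit you use; the aux info strings below are placeholders, not real InChI output, and the function name is hypothetical.

```python
def deduplicate(records):
    """Keep the first record for each (inchikey, aux_info) pair.

    records is an iterable of (name, inchikey, aux_info) tuples.
    Returns the names of the records that were kept, in input order.
    """
    seen = set()
    kept = []
    for name, inchikey, aux_info in records:
        key = (inchikey, aux_info)  # the combined hash key
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


SHARED_KEY = "AQIXAKUUQRKLND-UHFFFAOYSA-N"  # key reported for both isomers in this thread
records = [
    ("cis", SHARED_KEY, "<placeholder aux info, cis>"),
    ("trans", SHARED_KEY, "<placeholder aux info, trans>"),
    ("cis again", SHARED_KEY, "<placeholder aux info, cis>"),
]
unique_names = deduplicate(records)
print(unique_names)
```

One caveat, as far as I know: the aux info also records things like the original atom numbering, so the same structure entered with a different atom order may look distinct under this key. That is another reason canonical SMILES is probably the safer choice.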

-P.

On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI  wrote:

> Hi Gustavo,
>
>
> (Sorry, forgot to reply all before...)
>
>
> Your deduplication task is quite familiar to me and something I do quite a
> lot of in my own work ;)
>
>
> Can I suggest deduplicating using Canonical SMILES?
>
>
> It doesn't solve your InChIKey issue, but it is a solution for now.
>
>
> I updated my gist to show that it is feasible:
>
>
> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>
>
> 
>
> Adelene
>
>
>
> Doctoral Researcher
>
> Environmental Cheminformatics
>
> UNIVERSITÉ DU LUXEMBOURG
>
>
> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>
> 6, avenue du Swing, L-4367 Belvaux
>
> T +356 46 66 44 67 18
>
> [image: github.png] adelenelai
>
>
>
>
>
> --
> *From:* Gustavo Seabra 
> *Sent:* Sunday, October 25, 2020 2:27:15 PM
> *To:* Adelene LAI
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Actually,  I was trying to generate all stereoisomers for molecules in a
> database,  and filter duplicate molecules by using the InChI Key to detect
> duplicates.  But it gives cis/trans isomers on sp2-N the same Key.
>
> Gustavo.
>
> --
> Gustavo Seabra
>
> --
> *From:* Adelene LAI 
> *Sent:* Sunday, October 25, 2020 1:44:01 AM
> *To:* Gustavo Seabra 
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
>
> Hi Gustavo,
>
>
> It occurred to me while swimming yesterday - was there a reason you
> pointed out the hybridisation state of N in your original subject text?
>
>
> Was it just to specify which N to focus on, or did you expect something
> special about sp2 hybridisation wrt InChIKey?
>
>
> Adelene
>
> --
> *From:* Gustavo Seabra 
> *Sent:* Saturday, October 24, 2020 5:37:09 AM
> *To:* RDKit Discuss; Adelene LAI
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Thanks for looking into it. I'm happy to see it wasn't just a mistake by
> me ;-)
>
> I hope we can find what's wrong there.
>
> Best,
> Gustavo.
>
> --
> Gustavo Seabra
>
> --
> *From:* Adelene LAI 
> *Sent:* Friday, October 23, 2020 11:28:55 PM
> *To:* Gustavo Seabra ; RDKit Discuss <
> rdkit-discuss@lists.sourceforge.net>
> *Subject:* Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
>
> Hi Gustavo,
>
>
> 
> https://gist.github.com/adelenelai/59a8794e1f030941c19bcb50aa8adf3f
>
>
> In the gist above, I tried doing some further investigating.
>
>
> It seems for the example you gave, the rdkit functions indeed give the
> same inchikey and inchi, but different aux info.
>
>
> Why this different aux info doesn't translate into different
> inchikeys/inchis, I'm not sure.
>
>
> Adelene
>
> --
> *From:* Gustavo Seabra 
> *Sent:* Friday, October 23, 2020 6:43:07 PM
> *To:* RDKit Discuss
> *Subject:* [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key
>
> Hi all,
>
> I ran into an issue here, and I'd appreciate your input. I noticed that
> compounds that differ only in the cis-trans isomerization around an sp2
> nitrogen get the same InChI Key from RDKit. For example:
>
> > inchi_cis = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(/NC#N)NCCSCc1nc[nH]c1C"))
> > inchi_cis
> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>
> > inchi_trans = Chem.inchi.MolToInchiKey(Chem.MolFromSmiles("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C"))
> > inchi_trans
> 'AQIXAKUUQRKLND-UHFFFAOYSA-N'
>
> > inchi_cis == inchi_trans
> True
>
> I wonder if this is a limitation of the InChI Key definition, or an
> implementation issue.
>
> Thanks a lot,
> --
> Gustavo Seabra.


Re: [Rdkit-discuss] A question of molecule structure

2020-09-20 Thread Peter S. Shenkin
It could involve either a tautomeric solution or a zwitterionic solution.
But it is not clear to me why the current structure needs to be altered.
After all, pyridones are most commonly written as shown.

-P.

On Sun, Sep 20, 2020 at 12:19 AM Markus Metz  wrote:

> Dear Gao:
> Your question is a tautomer related issue.
> Maybe this will help you:
> https://rdkit.blogspot.com/2020/01/trying-out-new-tautomer.html
> https://github.com/rdkit/rdkit/issues/2908
> Best,
> Markus
>
>
> On Sep 19, 2020, at 9:03 AM, Gao, Zhenxiang 
> wrote:
>
> Hi folks,
>
>
>
> I have a simple question. Two double bonds in the following molecule are
> outside the two rings. Does RDKit have any functions to move the two double
> bonds back into the rings?
>
>
>
> SMILES : Cc1ccc(=NC(=O)N=c2[nH]c(C)cn2C)[nH]c1
>
> 
>
>
>
> Thanks,
>
> Jason Gao
>
>
>
>
>
>
>
>
>
> --
-P.
Sent from a cell phone. Pls forgive brvty and m1$tea@ks.


Re: [Rdkit-discuss] Molecular weight function

2020-04-08 Thread Peter S. Shenkin
It is probably best to say that it is the sum of atomic weights for the
atoms in the molecule, where each element gets an atomic weight computed by
summing the products of its isotope atomic weights with the natural
fractional abundance of the isotope.

For some elements, this is not terribly well defined, because for those
elements, the isotopic composition varies considerably with origin.
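
As a worked example of that definition, here is a small plain-Python sketch. The isotope masses and abundances are rounded representative values that I have typed in for illustration; they are not read from RDKit's periodic table, and the function names are hypothetical.

```python
# (isotope mass in u, natural fractional abundance); rounded representative values.
ISOTOPES = {
    "H": [(1.00782503, 0.999885), (2.01410178, 0.000115)],
    "C": [(12.0, 0.9893), (13.00335484, 0.0107)],
    "Cl": [(34.96885268, 0.7576), (36.96590260, 0.2424)],
}


def average_atomic_weight(symbol):
    """Abundance-weighted mean of the isotope masses of one element."""
    return sum(mass * abundance for mass, abundance in ISOTOPES[symbol])


def average_molecular_weight(composition):
    """Sum of average atomic weights; composition maps element symbol -> atom count."""
    return sum(average_atomic_weight(sym) * n for sym, n in composition.items())


chloromethane = {"C": 1, "H": 3, "Cl": 1}
print(round(average_atomic_weight("Cl"), 2))
print(round(average_molecular_weight(chloromethane), 2))
```

So the "average" is taken per element over its isotopes and then summed over the atoms, rather than being computed over every possible isotopologue of the molecule (the two views give the same number).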

-P.

On Wed, Apr 8, 2020 at 4:35 PM Navid Shervani-Tabar 
wrote:

> Hello,
>
> I was wondering if the word "average" in the following function and
> description:
>
> rdkit.Chem.Descriptors.MolWt(*x, **y)
> 
>
> The average molecular weight of the molecule
>
> is referring to an average over all possible molecules with different isotopes
> of the atoms in the molecule (including hydrogen).
>
> Thanks,
> Navid


Re: [Rdkit-discuss] The RDKit and GSoC 2020

2020-03-06 Thread Peter S. Shenkin
"Cells in columns named SMILES, or have SMILES as a substring in the
header, will be depicted in 2D using RDKit"

Sounds like a great project, but I think the above can be improved upon as
a specification. In many or even most situations, users will want to be
able to view the SMILES as a string and simultaneously visualize the
structures. Also, tying behavior of a column exclusively to the column
title is not (as far as I know) a standard Gsheets paradigm. Or at least I
hope not. 

I don't know whether Gsheets provides a facility for addition of new
formulas, but if this is allowed, it might be reasonable to remove the
title dependence by introducing a formula with a name like "RDDEPICT". So
if a SMILES appears as text in cell A1, filling A2 with the depiction would
be accomplished by setting A2 to "=RDDEPICT(A1)". This would presumably
also automatically update if A1 changes using Gsheets' built-in handling of
dependencies.

If addition of formulas is not possible, it is probably possible to
accomplish the same thing using the Gsheets Javascript API.

Either way, I think something like this would be better than tying the
behavior to the column title, even if the latter should be possible.

-P.

On Fri, Mar 6, 2020 at 8:24 PM JW Feng  wrote:

> Project suggestion:
>
> Project 1:
> Implement 2D structure depiction in Google Spreadsheets.  My colleagues at
> Google think this is very doable.  Being able to depict structures in
> Google Spreadsheets will dramatically increase collaboration between
> scientists.  Imagine being able to provide comments for a structure, design
> idea, or virtual screening hit in a live Google Spreadsheet.  While there
> are commercial (Vortex, Spotfire, MarvinView, Stardrop ...) and open source
> (Datawarrior) packages that can read CSV files containing smiles and depict
> structures, none comes close to GSheets for collaboration and ease of use.
>
> - Cells in columns named SMILES, or have SMILES as a substring in the
>   header, will be depicted in 2D using RDKit
> - Cells with depicted structures move with other columns when sorting,
>   filtering, etc.
> - Optional: depictions update when SMILES string is edited
> - Bonus: calculate properties using formulas.  Ex: Descriptors.MolWt(A1)
>   calculates MW of SMILES in A1
>
> Project 2:
>
> - Make it easy to use RDKit in Google Colab
> - No need to install RDKit, from rdkit import Chem just works out of
>   the box
>
> Best,
>
> JW
> On Sun, Feb 23, 2020 at 11:48 PM Greg Landrum 
> wrote:
>
>> Dear all,
>>
>> I'm happy to share that the RDKit will once again be part of Google
>> Summer of Code in 2020. This is a program where Google funds students to
>> work on open-source projects for a couple of months over the summer. We've
>> participated in each of the last three years and had some cool stuff come
>> out of it.
>>
>> We're looking for a few more project ideas (along with possible mentors!)
>> as well as students.
>> Applications start in the middle of March. There's more info about
>> timelines here:
>> https://developers.google.com/open-source/gsoc/timeline
>>
>> The current set of project ideas is here and we could use a few more:
>> http://wiki.openchemistry.org/GSoC_Ideas_2020#RDKit_Project_Ideas
>> I'm going to try and come up with something, but if you have something to
>> add, please let me know.
>>
>> Best,
>> -greg
>>
>>
>>
>>


Re: [Rdkit-discuss] acepentalene aromaticity perception

2020-01-22 Thread Peter S. Shenkin
Hi,

I still believe that Acepentalene should not be recognized by RDKit as
aromatic, because there is no ring that contains 4n+2 electrons. The fact
that counting bonds not in the outer ring gives 10 electrons should not
make the outer ring aromatic. Moreover, RDKit seems to perceive aromaticity
correctly (using this criterion) in several similar systems which have 4n+2
electrons in the outer ring but in which counting additional electrons in
bonds not in the ring would, following Greg's interpretation, make them
non-aromatic.

Here are several examples, starting with a recap of Acepentalene.

For Acepentalene itself, as Greg pointed out, there are 9 electrons in the
outer ring, but since no single ring contains 4n+2 electrons, I do not
believe it should be considered aromatic.

[image: acepentalene.png]

---
Now consider the closely related compound created by making one of those
5-membered rings a 7-membered ring. It's called aceazulylene (yes, I had to
look this up!). The outer ring is aromatic in my view, because it has 10
(i.e., 4n+2) electrons. The ring system has 12 electrons, and so it would
seem that based on Greg's discussion of acepentalene, it should be
perceived as non-aromatic. Yet RDKit perceives it as aromatic — correctly,
in my view. As a side issue, I would have thought that, as in azulene, the
internal bonds and the central carbon would not have been perceived as
aromatic by RDKit; this is the same issue that Andrew originally raised for
Acepentalene.

[image: aceazulylene.png]

---
Now consider dicyclopenta[cd,gh]pentalene from Schleyer's paper (referenced
in Andrew Dalke's recent email). Again, this molecule has 12 electrons in
total, so that again, based on Greg's discussion of acepentalene, I'd have
thought RDKit would consider it non-aromatic. But the outer ring consists
of a pi system containing 4n+2 electrons, and so, in my view, it should be
considered aromatic. Schleyer's calculations agree. And again, as in
aceazulylene, RDKit in fact correctly perceives it as aromatic, although,
as in aceazulylene, the internal bonds and carbons should probably not be
perceived as aromatic.

[image: dicyclopenta[cd,gh]pentalene.png]

As a closing comment, it seems to me that if ring bonds are counted and
off-ring bonds are ignored, electron counting would correctly infer the
aromaticity or not of these compounds. MO calculations, as per Schleyer,
would not be required for this purpose – at least for these compounds!
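
The counting argument above can be sketched in a few lines of Python. This is only an illustration of the 4n+2 / 4n bookkeeping, not of what RDKit does internally; the function name huckel_class is made up here, and the electron counts are simply the ones quoted in this thread.

```python
def huckel_class(pi_electrons: int) -> str:
    """Classify an electron count using only the 4n+2 / 4n rule."""
    if pi_electrons >= 2 and pi_electrons % 4 == 2:
        return "4n+2 (aromatic by count)"
    if pi_electrons >= 4 and pi_electrons % 4 == 0:
        return "4n (antiaromatic by count)"
    return "neither"


# Counts as stated in this thread; they are not recomputed here.
counts = {
    "acepentalene, outer ring": 9,
    "acepentalene, whole system": 10,
    "aceazulylene, outer ring": 10,
    "aceazulylene, whole system": 12,
}

for label, n in counts.items():
    print(f"{label}: {n} -> {huckel_class(n)}")
```

Counting only the outer ring gives "neither" for acepentalene and 4n+2 for aceazulylene, whereas counting the whole system gives the opposite pair of answers.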

-P.

On Wed, Jan 22, 2020 at 8:50 AM Andrew Dalke 
wrote:

> On Jan 22, 2020, at 14:12, Greg Landrum  wrote:
> > As an aside: it's not particularly relevant to this discussion, but I
> don't understand why the wikipedia page says that the compound is
> anti-aromatic. I think the standard definition of anti-aromaticity (agrees
> with the one linked to from the acepentalene page) requires the ring system
> to have 4n electrons. That definitely doesn't apply here to either the
> individual rings or the system as a whole. The system as a whole has 10
> electrons (4n+2), the individual rings each have 5 (neither aromatic nor
> anti-aromatic), and the outer envelope has 9 (again, neither aromatic nor
> anti-aromatic).
>
> Because I didn't know either, I looked into it.
>
> I think that's because (to quote "Towards experimental determination of
> conical intersection properties: a twin state based comparison with bound
> excited states", Phys. Chem. Chem. Phys., 2011, 13, 11872–11877 [*] )
>
> > A Hückel MO analysis[21] leads to the conclusion that the ground state
> of the conjugated tricyclic acepentalene I is a triplet state. DFT
> calculations corrected this picture and showed a singlet global minimum
> distorted to C_s symmetry with alternated single and double bonds,[22]
> which are well described by the Lewis structures A(B,C). According to a
> B3LYP/6-31G* calculation the lowest triplet state has also a high symmetric
> C_3v configuration and lies 3.9 kcal/mol above the singlet ground state
> minimum. Acepentalene I was characterized as an antiaromatic system [23]
> despite being formally an aromatic 10 electron system: the resonance
> between each pair of Kekule structures in this case involves only 4
> electron pairs of the pentalene fragments and it averts the resonance with
> the additional fifth electron pair common for both the structures. Such a
> resonance is described as an anti-combination of two Kekule structures:
> (A–B), (C–B) and (C–A).
>
> Just need to add B3LYP/6-31G* calculations to RDKit's aromaticity
> perception algorithm and everything will be fine. :)
>
> The "characterized as an antiaromatic system[23]" is "T. K. Zywietz, H.
> Jiao, P. v. R. Schleyer and A. de Meijere, J. Org. Chem., 1998, 63, 3417" at
> https://pubs.acs.org/doi/abs/10.1021/jo980089f .
>
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
> [*]
> 

Re: [Rdkit-discuss] acepentalene aromaticity perception

2020-01-22 Thread Peter S. Shenkin
Hi,

For aromaticity, I believe a ring has to have 4n+2 electrons along its
periphery.

I would be curious to know what other SMILES generators make of this
system.

-P.

On Wed, Jan 22, 2020 at 8:14 AM Greg Landrum  wrote:

> Hi Andrew,
>
> There's a bug here.
>
> Here's what I believe is happening:
> The system as a whole has 10 pi electrons, so the RDKit perceives it as
> aromatic. But then the logic that is used to flag the fusing bond in
> azulene as single (instead of aromatic) prevents the bonds between the
> central atom and the outer ones from being flagged as aromatic. This is
> clearly wrong. Now we just need to figure out how to fix it. :-)
>
> As an aside: it's not particularly relevant to this discussion, but I
> don't understand why the wikipedia page says that the compound is
> anti-aromatic. I think the standard definition of anti-aromaticity (agrees
> with the one linked to from the acepentalene page) requires the ring system
> to have 4n electrons. That definitely doesn't apply here to either the
> individual rings or the system as a whole. The system as a whole has 10
> electrons (4n+2), the individual rings each have 5 (neither aromatic nor
> anti-aromatic), and the outer envelope has 9 (again, neither aromatic nor
> anti-aromatic).
>
> Sorry for the super slow reply.
> -greg
>
>
> On Thu, Jan 9, 2020 at 9:56 PM Andrew Dalke 
> wrote:
>
>> Hi all,
>>
>> Could someone explain the following, which uses the SMILES from
>> https://en.wikipedia.org/wiki/Acepentalene :
>>
>> >>> from rdkit import Chem
>> >>> Chem.CanonSmiles("C1=CC2=CC=C3C2=C1C=C3")
>> 'c1cc2ccc3ccc1-c=3-2'
>> >>> import rdkit
>> >>> rdkit.__version__
>> '2019.09.1'
>>
>> I don't understand the aromatic "c" in the fused center of the 3
>> 5-membered rings. It's connected by non-aromatic bonds to the rest of the
>> system.
>>
>> This broke some code of mine which expects that every aromatic atom must
>> have at least two aromatic bonds. I thought that all aromatic atoms had to
>> be in aromatic rings, and that all aromatic rings had to have aromatic bonds.
>>
>> (I'm ignoring RDKit's support for aromatic triple bonds in this
>> description.)
>>
>> I searched for "acepentalene" and "antiaromatic" in the issue tracker and
>> the mailing list but found nothing relevant.
>>
>> Cheers,
>>
>> Andrew
>> da...@dalkescientific.com
>>
>>
>>
>>
-- 
-P.
Sent from a cell phone. Pls forgive brvty and m1$tea@ks.


Re: [Rdkit-discuss] acepentalene aromaticity perception

2020-01-09 Thread Peter S. Shenkin
Since the entire system is antiaromatic, why are any carbons at all shown
as aromatic in the SMILES?
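
To see exactly which atoms and bonds the SMILES quoted below marks as aromatic, one can tally them by hand. The sketch below is a manual transcription of 'c1cc2ccc3ccc1-c=3-2' (the 2019.09.1 output reported by Andrew), assuming the usual SMILES convention that an unmarked bond between two lowercase atoms is aromatic. The atom indices and the helper name are my own, and no toolkit is called.

```python
# Hand transcription of 'c1cc2ccc3ccc1-c=3-2'. Atom indices follow the
# order in which atoms appear in the string; atom 9 is the central carbon.
aromatic_atoms = set(range(10))  # every atom is written as lowercase 'c'

# (begin, end, bond type)
bonds = [
    (0, 1, "aromatic"), (1, 2, "aromatic"), (2, 3, "aromatic"),
    (3, 4, "aromatic"), (4, 5, "aromatic"), (5, 6, "aromatic"),
    (6, 7, "aromatic"), (7, 8, "aromatic"),
    (8, 0, "aromatic"),  # ring closure 1
    (8, 9, "single"),    # '-c'
    (9, 5, "double"),    # '=3' ring closure
    (9, 2, "single"),    # '-2' ring closure
]


def atoms_breaking_invariant(aromatic_atoms, bonds, minimum=2):
    """Return aromatic atoms that have fewer than `minimum` aromatic bonds."""
    counts = {atom: 0 for atom in aromatic_atoms}
    for begin, end, kind in bonds:
        if kind == "aromatic":
            for atom in (begin, end):
                if atom in counts:
                    counts[atom] += 1
    return sorted(atom for atom, n in counts.items() if n < minimum)


print(atoms_breaking_invariant(aromatic_atoms, bonds))  # [9]
```

Only the central atom fails the "at least two aromatic bonds" expectation; the nine outer atoms form a ring of aromatic bonds.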

-P.

On Thu, Jan 9, 2020 at 3:56 PM Andrew Dalke 
wrote:

> Hi all,
>
> Could someone explain the following, which uses the SMILES from
> https://en.wikipedia.org/wiki/Acepentalene :
>
> >>> from rdkit import Chem
> >>> Chem.CanonSmiles("C1=CC2=CC=C3C2=C1C=C3")
> 'c1cc2ccc3ccc1-c=3-2'
> >>> import rdkit
> >>> rdkit.__version__
> '2019.09.1'
>
> I don't understand the aromatic "c" in the fused center of the 3
> 5-membered rings. It's connected by non-aromatic bonds to the rest of the
> system.
>
> This broke some code of mine which expects that every aromatic atom must
> have at least two aromatic bonds. I thought that all aromatic atoms had to
> be in aromatic rings, and that all aromatic rings had to have aromatic bonds.
>
> (I'm ignoring RDKit's support for aromatic triple bonds in this
> description.)
>
> I searched for "acepentalene" and "antiaromatic" in the issue tracker and
> the mailing list but found nothing relevant.
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
>
>
>


Re: [Rdkit-discuss] AlignMol and GetBestRMS

2019-10-17 Thread Peter S. Shenkin
(I meant an RMSD of about 1 Angstrom.)

On Thu, Oct 17, 2019 at 5:00 PM Peter S. Shenkin  wrote:

> A large RMSD could come about from a large number of small interatomic
> deviations or a small number of large ones. In the latter case, the
> difference in conformation could be large. It is useful to also obtain the
> largest interatomic deviation following superimposition in order to
> determine which situation you are in.
>
> -P.
>
> On Thu, Oct 17, 2019 at 3:01 PM Stamatia Zavitsanou <
> stamatia.zavitsa...@oriel.ox.ac.uk> wrote:
>
>> Hello all,
>>
>> I have been using the AlignMol function to get the difference in RMSD
>> between the different conformations of a molecule (the conformations are
>> generated by other software). As I understand it, the code will return
>> the minimum RMSD and therefore the best way to align the two molecules. Is
>> that correct? If the two conformations do not differ a lot, the RMSD should
>> be low (if lower than 1 Å, then the two conformations don't really differ;
>> is that also correct?).
>>
>> Will the code translate and rotate the molecules in order to get the
>> perfect alignment or does it just place the one on top of the other?
>>
>> What is the difference with GetBestRMS function since the AlignMol is
>> supposed to give me the minimum RMSD?
>>
>> Many thanks,
>>
>> Stamatia Zavitsanou


Re: [Rdkit-discuss] AlignMol and GetBestRMS

2019-10-17 Thread Peter S. Shenkin
A large RMSD could come about from a large number of small interatomic
deviations or a small number of large ones. In the latter case, the
difference in conformation could be large. It is useful to also obtain the
largest interatomic deviation following superimposition in order to
determine which situation you are in.
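
As a rough illustration of this point, here is a small standard-library sketch. It assumes the two coordinate sets have already been superimposed and share atom ordering; the coordinates are invented, and the function name rmsd_and_max is hypothetical rather than part of RDKit.

```python
import math


def rmsd_and_max(coords_a, coords_b):
    """Return (RMSD, largest per-atom deviation) for matched coordinates."""
    deviations = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    rmsd = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return rmsd, max(deviations)


reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]

# Case 1: every atom is off by 1.0 Angstrom, in alternating directions.
all_small = [(x, y + (1.0 if i % 2 == 0 else -1.0), z)
             for i, (x, y, z) in enumerate(reference)]

# Case 2: three atoms coincide and one atom is off by 2.0 Angstrom.
one_large = reference[:3] + [(4.5, 2.0, 0.0)]

print(rmsd_and_max(reference, all_small))  # same RMSD, max deviation 1.0
print(rmsd_and_max(reference, one_large))  # same RMSD, max deviation 2.0
```

Both cases have an RMSD of 1.0 Angstrom, but only the second has a single atom displaced by 2.0 Angstrom, which is the distinction the maximum deviation exposes.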

-P.

On Thu, Oct 17, 2019 at 3:01 PM Stamatia Zavitsanou <
stamatia.zavitsa...@oriel.ox.ac.uk> wrote:

> Hello all,
>
> I have been using the AlignMol function to get the difference in RMSD
> between the different conformations of a molecule (the conformations are
> generated by other software). As I understand it, the code will return
> the minimum RMSD and therefore the best way to align the two molecules. Is
> that correct? If the two conformations do not differ a lot, the RMSD should
> be low (if lower than 1 Å, then the two conformations don't really differ;
> is that also correct?).
>
> Will the code translate and rotate the molecules in order to get the
> perfect alignment or does it just place the one on top of the other?
>
> What is the difference with GetBestRMS function since the AlignMol is
> supposed to give me the minimum RMSD?
>
> Many thanks,
>
> Stamatia Zavitsanou


Re: [Rdkit-discuss] Problems with SMILES using MolFromSmiles

2019-09-24 Thread Peter S. Shenkin
A carboxylate has to be represented as C(=O)[O-]. Use ...[OH] for an
uncharged carboxyl. Similarly, a tetravalent aliphatic N has to be given a
+ charge.
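
For the specific strings in the question quoted below, the rule can be illustrated with plain text substitution. This is a deliberately naive sketch: it assumes the zwitterionic form is what was intended, it only knows the three literal patterns listed, and it does no chemistry or validation. The name add_formal_charges is made up for this example.

```python
# Literal substring rewrites; order does not matter for these patterns.
REWRITES = [
    ("[NH3]", "[NH3+]"),        # tetravalent aliphatic N gets a + charge
    ("=[NH2]", "=[NH2+]"),      # doubly bonded N carrying two H, likewise
    ("C(=O)[O]", "C(=O)[O-]"),  # carboxylate written with an explicit charge
]


def add_formal_charges(smiles: str) -> str:
    """Apply the literal rewrites above to one SMILES string."""
    for old, new in REWRITES:
        smiles = smiles.replace(old, new)
    return smiles


for smi in ["[NH3]CCC(=O)[O]",
            "NC(=[NH2])C(=O)[O]",
            "C[C@@H](C[NH3])C(=O)[O]"]:
    print(smi, "->", add_formal_charges(smi))
```

Anything beyond these exact patterns would need a real toolkit and a decision about the intended protonation state.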

-P.

On Tue, Sep 24, 2019 at 9:15 PM Scalfani, Vincent  wrote:

> Dear Navid,
>
>
>
> RDKit rejects tetravalent Nitrogen by default. This thread below may help.
> It shows how to load the SMILES with sanitization off, then perform a
> partial sanitization.
>
>
>
> https://sourceforge.net/p/rdkit/mailman/message/32589379/
>
>
>
> Vin
>
>
>
>
>
>
>
> *From:* Navid Shervani-Tabar 
> *Sent:* Tuesday, September 24, 2019 6:55 PM
> *To:* RDKit Discuss 
> *Subject:* [Rdkit-discuss] Problems with SMILES using MolFromSmiles
>
>
>
> Hello,
>
>
>
> I have noticed that RDKit has problems with some SMILES when trying
> to use MolFromSmiles. On closer inspection, I noticed that all of
> these SMILES include nitrogen atoms. Some examples include:
>
>
>
> [NH3]CCC(=O)[O]
> NC(=[NH2])C(=O)[O]
> NC(=[NH2])[CH2].C(=O)=O
> CNC(=[NH2])C(=O)[O]
> CNC(=[NH2])C(=O)[O]
> C[C@@H](C[NH3])C(=O)[O]
>
>
>
> I was wondering if there is a way to fix this issue and study those
> molecules. Thanks!
>
>
>
> Best,
>
> Navid


Re: [Rdkit-discuss] Which method to prefer for computing 2D coordinates

2019-04-09 Thread Peter S. Shenkin
When I was at Schrödinger, I wrote a simple program to find bad 2D
structures. I no longer have access to the code, but I computed two things:

1. Number of bond lengths deviating from the median bond length (MBL) by
50% or more (i.e., <0.5*MBL or >2*MBL)
2. Number of bond crossings

The overall score for a structure was the sum of these two numbers.
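
Since the original code is gone, here is a minimal reconstruction of that score in plain Python, as a sketch only. It takes 2D coordinates and a list of bonded atom-index pairs, uses the factor-of-two thresholds from the parenthetical in item 1, and counts a crossing only when two bonds with no shared atom properly intersect. All function names are hypothetical.

```python
import math
from statistics import median


def count_bad_lengths(coords, bonds, low=0.5, high=2.0):
    """Bonds shorter than low*MBL or longer than high*MBL."""
    lengths = [math.dist(coords[i], coords[j]) for i, j in bonds]
    mbl = median(lengths)
    return sum(1 for d in lengths if d < low * mbl or d > high * mbl)


def _orient(p, q, r):
    """Sign of the 2D cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def count_crossings(coords, bonds):
    """Pairs of bonds that share no atom and properly intersect."""
    n = 0
    for k, (a, b) in enumerate(bonds):
        for c, d in bonds[k + 1:]:
            if len({a, b, c, d}) < 4:
                continue  # bonds sharing an atom touch but do not cross
            pa, pb, pc, pd = coords[a], coords[b], coords[c], coords[d]
            if (_orient(pa, pb, pc) * _orient(pa, pb, pd) < 0
                    and _orient(pc, pd, pa) * _orient(pc, pd, pb) < 0):
                n += 1
    return n


def depiction_score(coords, bonds):
    """Sum of the two counts; 0 means neither problem was detected."""
    return count_bad_lengths(coords, bonds) + count_crossings(coords, bonds)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(depiction_score(square, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # clean ring
print(depiction_score([(0, 0), (1, 1), (0, 1), (1, 0)], [(0, 1), (2, 3)]))  # one crossing
```

Sorting a set of depictions by this score in descending order would surface the worst ones first, which is how the list described here was used.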

This won't pick the best-looking structures when the above numbers are 0,
but the worst structures on a sorted list (one might almost say a
sordid list) of 10K drug-like structures rendered by our then
(pre-coordgen) 2D depiction sure were bad.

-P.



On Tue, Apr 9, 2019 at 10:44 AM Thomas Evangelidis 
wrote:

> Hello Lukas,
>
> I have also been struggling with 2D coordinate generation for quite a long
> time, as well as with what criteria to use for choosing the most appropriate
> method. Therefore, I would be very interested in using your code for 2D
> coordinate selection.
>
> With best regards,
> Thomas
>
> PS: very nice notebook Jose. I also wanted to write something similar but
> never really found enough time to finish it.
>
>
>
> On Tue, 9 Apr 2019 at 16:31, Lukas Pravda  wrote:
>
>> Hi Jose,
>>
>> As you have shown there is no single method which would be perfect for
>> everything. If you don’t care that much about speed, the possible solution
>> could be to compute coordinates with all three approaches and then simply
>> select the best conformer based on some criteria.
>>
>> The solution I use is to generate 2D coordinates using multiple
>> approaches, and then I have a set of methods which compute the number of
>> bond collisions and of atoms that are too close to each other, using a
>> KD-tree. Altogether this is expressed as a penalty score, where lower is
>> better.
>>
>> Should you need any code, let me know.
>>
>> Lukas
>>
>> On 09/04/2019, 14:35, "Jose Manuel Gally" 
>> wrote:
>>
>> Dear all,
>>
>> This might sound naive, but I want to compute 2D coordinates for a set
>> of molecules.
>>
>> For now I am considering the 3 methods below [1].
>>
>> I was wondering if there was any recommendation to use one method over
>> another in some cases?
>>
>> For instance, very large rings are not displayed round for CoordGen, but
>> sometimes this method performs worse than the default (AllChem).
>>
>> Computational time is not really an issue here as I generate those
>> coordinates on the fly for a very small set of compounds.
>>
>> Here is a gist with a few examples:
>> https://gist.github.com/jose-manuel/0f2a5e8eae8bf2a72c0faad7f2f2a263
>>
>> Thanks in advance, any suggestion is welcome!
>>
>> Cheers,
>> Jose Manuel
>>
>> [1] Methods:
>>
>> 1) rdkit.Chem.AllChem.Compute2DCoords (equivalent to
>> rdkit.Chem.rdDepictor.Compute2DCoords)
>> 2) rdkit.Avalon.pyAvalonTools.Generate2DCoords
>> 3) rdkit.Chem.rdCoordGen.AddCoords + rescale
>>
>>
>>
>>
>>
>
>
> --
>
> ==
>
> Dr Thomas Evangelidis
>
> Research Scientist
>
> IOCB - Institute of Organic Chemistry and Biochemistry of the Czech
> Academy of Sciences 
> Prague, Czech Republic
>   &
> CEITEC - Central European Institute of Technology 
> Brno, Czech Republic
>
> email: teva...@gmail.com
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>


Re: [Rdkit-discuss] Bug with Calculation of aromatic rings?

2019-03-06 Thread Peter S. Shenkin
Atom 20 appears to be an NH. Shouldn’t it be a pyridine N?
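
For what it's worth, the bond list quoted below is enough to reproduce the count of zero by hand. The sketch below applies the "every bond in the ring is aromatic" criterion that Greg describes; the ring membership is inferred by me from the listed bonds (a five-membered and a six-membered ring sharing the 16-11 bond), and no RDKit call is involved.

```python
# Aromatic bonds as listed by Colin (begin atom, end atom).
aromatic_bonds = {frozenset(pair) for pair in [
    (0, 1), (1, 19), (19, 16), (11, 14), (14, 12),
    (12, 7), (7, 20), (11, 0), (20, 16),
]}

# Assumed rings, inferred from the bond list; both contain bond 16-11.
rings = [
    [0, 1, 19, 16, 11],
    [11, 14, 12, 7, 20, 16],
]


def ring_bonds(ring):
    """Bonds between consecutive ring atoms, including the closing bond."""
    return [frozenset((ring[i], ring[(i + 1) % len(ring)]))
            for i in range(len(ring))]


def count_aromatic_rings(rings, aromatic_bonds):
    """Count rings in which every bond is flagged aromatic."""
    return sum(all(b in aromatic_bonds for b in ring_bonds(r)) for r in rings)


print(count_aromatic_rings(rings, aromatic_bonds))  # 0: bond 16-11 is SINGLE
print(count_aromatic_rings(rings, aromatic_bonds | {frozenset((16, 11))}))  # 2
```

So the single 16-11 fusion bond is sufficient to explain 0 rather than 2, whatever the reason it was not flagged aromatic.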

On Wed, Mar 6, 2019 at 5:04 AM Colin Bournez 
wrote:

> Hi Greg,
>
> Indeed it seems one bond is not tagged as aromatic.
>
> Here are the aromatic bonds (begin atom, end atom):
>
> 0 1
> 1 19
> 19 16
> 11 14
> 14 12
> 12 7
> 7 20
> 11 0
> 20 16
>
> We see that the bond between atoms 11 and 16 is not aromatic.
> It is of type SINGLE:
> 16 11 SINGLE
>
>
> The problem remains after sanitizing the molecule, and both atoms are
> tagged as aromatic. Can a bond between two aromatic atoms be single?
>
> On 06/03/19 10:49, Greg Landrum wrote:
>
> Hi Colin,
> The aromatic ring counting code identifies rings where every *bond* is
> aromatic, so I guess one or more bonds in the rings of the first molecule
> are not aromatic.
> Could it be that you haven't sanitized the molecule before calculating
> descriptors?
> -greg
>
> On Tue, Mar 5, 2019 at 6:00 PM Colin Bournez <
> colin.bour...@univ-orleans.fr> wrote:
>
>> Dear all, I might have encountered a little problem concerning the
>> function rdMolDescriptors.CalcNumAromaticRings(). For this molecule shown
>> with index:
>>
>> Here is what I do :
>>
>> So I have my aromatic atoms as expected, but when I ask for aromatic rings it
>> returns 0 instead of two.
>> Does anyone have an idea?
>>
>> For information if the molecule is in that form
>> It returns 2 NAR as expected.
>>
>> Colin
>>
>>
>> --
>> *Bournez Colin*
>> *Chemoinformatics PhD Student*
>> *Institute of Organic and Analytical Chemistry (ICOA UMR7311)*
>> Université d'Orléans - Pôle de Chimie
>> Rue de Chartres - BP 6759
>> 45067 Orléans Cedex 2 - France
>> +33 (0)2 38 49 45 77
>> SBC Tool Platform - SBC Team
>


Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
This is a cute example. The left ring is one in which every atom and every
bond is aromatic, and yet the ring is not aromatic. Unlike azulene, in
which neither ring, alone, is aromatic.

On Tue, Oct 23, 2018 at 12:36 PM Greg Landrum 
wrote:

>
> I'll try later (likely tomorrow) to explain what I meant a bit better. Or
> maybe I'll just implement it (since it seems like it could be fairly easy).
>
> On Tue, Oct 23, 2018 at 6:13 PM Chris Earnshaw 
> wrote:
>
>>
>> Following this analysis means you don't need to consider the resonance
>> form:
>> A carbonyl or imine (open chain or in a partially saturated ring) group's
>> carbon atom provides (effectively) 0 pi electrons
>> 'Ordinary' aromatic carbons provide 1 pi-electron
>> Pyridine-type nitrogens (2-connections) or pyridinium (3-connections)
>> provide 1 pi-electron
>> Pyrrole-type nitrogens (3-connections) provide 2 pi-electrons.
>>
>> So for example 3, that's
>> [image: image.png]
>> 10 pi-electrons in total, so the *system* is aromatic even though the
>> electron counts look odd on a per-ring basis (not unlike azulene).
>>
>
> On this specific instance: This is exactly what the RDKit currently does.
> However, in this scheme the left ring, which has 7 pi electrons, is not
> aromatic even though both the right ring and the envelope are.
>
> -greg
>
>


Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
On Tue, Oct 23, 2018 at 1:08 PM Chris Earnshaw  wrote:

> Interesting - I do hope your idea works out!
>
> This prompted me to see what happens with azulene, which is another case
> where the envelope is aromatic but neither of the individual rings are
> based on a simple neutral representation. This ends up being related to
> Peter's example; the input SMILES c1ccccc2c1ccc2 gets converted to
> c1cc2cccccc-2c1, where the fusion bond is perceived as 'pure single' rather
> than aromatic as it should be, so we do indeed end up with two non-aromatic
> rings embedded in an aromatic envelope.
>

Yes; and it's cool that crystallography shows that the fusion bond is
considerably longer than the ring bonds. However, a common representation
of azulene is as a cyclopentadienyl anion fused to a tropylium cation. But
this picture considers the fusion bond to be aromatic.

> Is there any way to define the aromaticity of an individual ring based on
> its mapping to an aromatic envelope? Are there any cases where that
> wouldn't be true?
>

In naphthalene, the fusion bond and both rings would be considered
aromatic, if that's what you're getting at.

[image: image.png]
This is a very cute example, because every bond and every atom is aromatic,
yet one ring composed of fully aromatic bonds and atoms is not (considered)
aromatic. (But if you "pyridonize" the left ring, it becomes aromatic, too.)

-P.


Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
I agree that there are potential gotchas, and even if we can't think of them,
someone else might, which is one of the reasons that I think that, even
following any due diligence we are able to accomplish, the facility, if
implemented, should be subject to a runtime flag.

In your three graphical illustrations, I can't think of any reason that the
ring on the left should not be aromatic in all cases.

Yes, we do need to say that the exocyclic double bond, to be considered,
needs to be to an electronegative element, certainly N or O, though this
reference seems to indicate that S (a little surprisingly to me) should
also qualify. But to paraphrase Freud, "Sometimes an edge case is just an
edge case."

You wrote:

Considering the Kekule form of a structure:
- If a C atom is valence saturated and has a double bond to a "more
electronegative atom" (let's agree that N and O meet this definition and
then argue about other things later), it contributes zero pi electrons to
whatever ring system it's in.
- If a "more electronegative atom" is valence saturated and has a double
bond to a C, then it contributes two electrons to whatever ring system it's
in.


So the first atom in each of your two cases is the ring atom? So the first
point refers to the C=O carbon in pyridone and the second refers to the N
in pyridine? (Just asking for clarification here.)

Personally, the way I think of 2-pyridone aromaticity is more from the
point of view of the ring N. (The operative word may be "personally.") The
C=O is willy-nilly stuck in sp2 hybridization. In the VB structure, then, N,
with 3 single bonds to it, appears to be sp3-hybridized at a first glance.
But because of O's electronegativity, that C=O has significant C-[O-]
character, leaving a vacant p orbital.

Then, to achieve the thermodynamic stability that aromaticity provides, the
ring C-N(-H)-C moiety can hybridize to sp2, as C=[N+](-H)-C, where the N
has donated two electrons into the bond that has just become double. This
picture helps me understand, chemically, why the exocyclic double bond has
to be to an electronegative atom in order to confer aromaticity; namely, it
leaves the p-orbital on the C from which the double bond emanates vacant
(in that resonance structure), which allows the ring N to rehybridize. An
off-ring =CH2 would not do that; the corresponding C-[CH1-] resonance
structure would not be sufficiently stable to contribute materially.
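
The bookkeeping that this picture leads to can be written down as a small lookup table. The sketch below uses the per-atom contributions quoted elsewhere in this thread (ordinary aromatic carbon 1, carbon with an exocyclic double bond to N or O 0, pyridine-type N 1, pyrrole-type N 2). The atom-type labels and function names are ad hoc, and this is not RDKit's implementation.

```python
# Pi-electron contributions as described in this thread; labels are ad hoc.
PI_CONTRIBUTION = {
    "aromatic_C": 1,           # 'ordinary' aromatic carbon
    "C_exo_double_N_or_O": 0,  # ring carbon of an exocyclic C=O or C=N
    "pyridine_N": 1,           # two-connected ring nitrogen
    "pyrrole_N": 2,            # three-connected ring nitrogen
}


def ring_pi_electrons(atom_types):
    """Sum the contributions of the atoms in one ring."""
    return sum(PI_CONTRIBUTION[t] for t in atom_types)


def satisfies_4n_plus_2(n):
    """True if n is 2, 6, 10, ..."""
    return n >= 2 and n % 4 == 2


pyridine = ["pyridine_N"] + ["aromatic_C"] * 5
pyridone = ["pyrrole_N", "C_exo_double_N_or_O"] + ["aromatic_C"] * 4

for name, ring in [("pyridine", pyridine), ("2-pyridone", pyridone)]:
    n = ring_pi_electrons(ring)
    print(name, n, satisfies_4n_plus_2(n))
```

Both rings come out at 6 electrons under these rules. Replacing the C=O carbon's contribution of 0 with 1 would give 7 for the pyridone ring, which is why the electronegativity of the exocyclic atom matters to the count.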

Given this, all three structures in your illustration exhibit an off-ring
=N, so all should make the left ring aromatic. I'm not sure whether this
addresses the concerns you expressed about the 2nd and 3rd structures in
the illustration.

The kind of gotcha I am afraid of is a situation where, in two condensed
rings, a double bond to the right ring might make the left ring aromatic
but a double bond to the left ring might make the right ring aromatic, but
somehow they could not be simultaneously aromatic. Thus, then, atom
ordering might dictate the result, and if that's the case, what would the
right answer be, chemically? Might neither of them actually be aromatic?

-P.

On Tue, Oct 23, 2018 at 11:15 AM Greg Landrum 
wrote:

> hmmm, thinking about this I believe I'm coming to a simpler (and
> efficient) scheme for this after all...
>
> It's going to take me a bit to formalize, and I would want to test it on a
> bunch of molecules, but I *think* this works.
>
> Considering the Kekule form of a structure:
> - If a C atom is valence saturated and has a double bond to a "more
> electronegative atom" (let's agree that N and O meet this definition and
> then argue about other things later), it contributes zero pi electrons to
> whatever ring system it's in.
> - If a "more electronegative atom" is valence saturated and has a double
> bond to a C, then it contributes two electrons to whatever ring system it's
> in.
>
> That certainly handles the things we've discussed so far, as well as easy
> cases like pyridine and quinone. Now I need to try and find some stuff that
> breaks it.
>
> -greg
>
>
> On Tue, Oct 23, 2018 at 5:08 PM Greg Landrum 
> wrote:
>
>>
>>
>> On Tue, Oct 23, 2018 at 4:08 PM Peter S. Shenkin 
>> wrote:
>>
>>>
>>> - Easily understandable explanation:
>>>    - From the Daylight theory manual (and you've used similar
>>>      language): *exocyclic double bonds do not break aromaticity.*
>>>    - I'd alter this to *double bonds exocyclic to the ring in
>>>      question do not break aromaticity*. (I.e., even if they are in
>>>      other rings)
>>>    - Beyond this, conventional electron counting explains everything
>>>      in Francis's example and mine.
>>>
>>> You're close, but I think there's something missing.
>> E

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
This is just to note that pyridones are considered aromatic by all SMILES
kits I've seen (though I've certainly not seen them all!), and pyridone
itself is cited in the Daylight Theory Manual as an example of an exocyclic
double bond which does not break aromaticity.

-P.

On Tue, Oct 23, 2018 at 10:12 AM Chris Earnshaw 
wrote:

> Mea culpa - I hit Reply rather than Reply All and so only sent this to
> Greg...
>
> On Tue, 23 Oct 2018 at 13:53, Chris Earnshaw  wrote:
>
>> Hi Greg
>>
>> Apologies again, I'm not trying to stir things up here. As we can see
>> from some of the other discussion there's no clear view of what
>> constitutes aromaticity in these cases. I'm of the school which says that
>> pyridone is at least somewhat aromatic because, in crude terms, the
>> electronegative carbonyl oxygen 'steals' the electron from the carbon, the
>> carbon provides an empty p-orbital to the conjugated ring, and the ring
>> nitrogen provides a pair of electrons - hence 4n + 2 and aromatic.
>>
>> However, the thing that really worries me is that the 'iminopyridine'
>> ring in n12ccccc1=NCCC2 *should* be perceived in the same way as in
>> n12ccccc1=NC.CC2 but in practice that doesn't happen - one matches the
>> pyridine SMARTS c1ccccn1 and the other doesn't. This seems to be
>> potentially dangerous. The question of 'aromatic or not' is interesting,
>> but I'm actually more concerned with the consequences for compound
>> searching and filtering.
>>
>> As an approach, rather than simply checking if the exocyclic bond is in
>> another ring (of any type), would it be possible to check if that other
>> ring is fully conjugated? If it is, then the Huckel rule could/should be
>> applied to the fused system to determine aromaticity. If not, then the fact
>> the substituents form a ring is irrelevant and the potential aromatic
>> should be treated in the same way as the non-fused analogue. This would
>> avoid the current inconsistency, but there would no doubt still be some
>> challenging edge cases...
>>
>> Best regards,
>> Chris
>>
>> On Tue, 23 Oct 2018 at 12:43, Greg Landrum 
>> wrote:
>>
>>> Dissent is fine, but it's important to remember that there are *always*
>>> going to be edge cases and that we're not trying to model something
>>> physically observable here. The concept of aromaticity is primarily there
>>> to make canonicalization easier. Section 3.4.2 here:
>>> http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html has more
>>> info about this, as does the RDKit documentation:
>>> http://rdkit.org/docs/RDKit_Book.html#aromaticity
>>>
>>> I'm willing to change the current behavior, but someone would need to
>>> explain what it should be changed to in a way that is clear, unambiguous,
>>> and that would allow a human being looking at the structure to relatively
>>> easily figure out whether or not a given ring is aromatic.
>>>
>>>
>>> On Tue, Oct 23, 2018 at 1:17 PM Chris Earnshaw 
>>> wrote:
>>>
 Sorry about this, but I think that 'perhaps sub-optimal' should be
 replaced by 'definitely wrong'. The 'quasi-aromatic' system in these two
 structures is identical and should behave as such, but in practice one of
 them matches a pyridine SMARTS pattern and the other doesn't. That
 shouldn't be affected by whether the saturated substituents form a ring or
 not. I do appreciate that it gets messy to deal with as fused rings may be
 fully conjugated, but the current behaviour seems to be disturbingly
 inconsistent. It would be suboptimal to say that no exocyclic bond is
 allowed to steal electrons, but that may be better than what's happening
 here.

 Apologies for the dissent!

 Chris Earnshaw



 On Tue, 23 Oct 2018 at 11:57, Greg Landrum 
 wrote:

> The current implementation requires "exocyclic" bonds to actually be
> *non-ring* bonds in order to be recognized as such.
> This is perhaps sub-optimal, but it's clearly defined and avoids
> arguments about when exactly an "exocyclic" bond starts stealing
> electrons.
>
> -greg
>
> On Tue, Oct 23, 2018 at 12:46 PM Francis Atkinson 
> wrote:
>
>> Ian,
>>
>> I make it 6 electrons: two from the N, none from the C double
>> bonded to the exocyclic N, and one each from four other carbons in the
>> ring. It's isoelectronic with *e.g.* pyridone, which is aromatic in
>> RDKit...
>>
>> In [1]: from rdkit import Chem
>>
>> In [2]: Chem.MolToSmiles(Chem.MolFromSmiles('O=c1cccc[nH]1'))
>> Out[2]: 'O=c1cccc[nH]1'
>>
>> The protonated/tautomerised versions are indeed aromatic
>> (interconverting between these species was actually how I came across
>> this issue), but I still reckon the unprotonated bicyclic should be aromatic
>> too...
>>
>> Francis
>> On 23/10/2018 11:18, Ian Tickle wrote:
>>
>>
>> Hi, it seems to me that neither is 

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
Hi, Greg,

Thank you for being so open in your response, and I certainly agree with
everything you just said. Here are my thoughts.

   - Easily understandable explanation:
      - From the Daylight theory manual (and you've used similar
      language): *exocyclic double bonds do not break aromaticity.*
      - I'd alter this to *double bonds exocyclic to the ring in question
      do not break aromaticity*. (I.e., even if they are in other rings.)
      - Beyond this, conventional electron counting explains everything in
      Francis's example and mine.
   - I've heard it asserted that introducing this modification would
   significantly slow down aromaticity perception. That is a real
   consideration which of course you can evaluate better than I.
   - This would be a major change and I would recommend being able to turn
   it on via a run-time flag before making it the default behavior.
      - This would allow user testing for performance and for the emergence
      of hidden gotchas at either the chemical or the computational level.

I hope this is helpful, at least as a starting point for discussion.

-P.



On Tue, Oct 23, 2018 at 9:08 AM Greg Landrum  wrote:

>
>
> On Tue, Oct 23, 2018 at 3:00 PM Peter S. Shenkin 
> wrote:
>
>>
>> It's difficult to fault RDKit for making the same mistake that everybody
>> else blithely accepts; but it would be great, IMO, if it could do better
>> than everyone else in this regard.
>>
>
> Again, I have no argument whatsoever with this. But a half-assed fix is
> worse than doing nothing. So in order for me to do something about it I
> need:
> "someone to explain how things should be changed to in a way that is
> clear, unambiguous, and that would allow a human being looking at the
> structure to relatively easily figure out whether or not a given ring is
> aromatic."
>
>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
Hi,

I raised the same issue that Francis raised on the RDKit Slack channel on
Jan 14, 2017, with a different example (c1c[nH]c2nccc-2c1), and got the same
response. Of course, breaking the non-aromatic ring causes the remaining
aromatic ring to be perceived as aromatic, as Greg's response would imply.
Chemically, this is not an edge case.

I agree with Chris that this is just wrong. However, it might be some
consolation (if that's the word I want) that back when I had access to
other packages, I tried the same example in all of them and got the same
result in every case. Even Daylight's web-site depiction scheme gave the
same result. I would love to know what OpenEye does, and in any case, there
are a lot more SMILES implementations out there now than there were then,
and I'd also love to know whether any of the others get it (to my mind,
and Chris's, and Francis's) right.

The usual response quotes Dave Weininger's comment in the Daylight Theory
manual:

It is important to remember that the purpose of the SMILES aromaticity
detection algorithm is for the purposes of chemical information
representation only! To this end, rigorous rules are provided for
determining the "aromaticity" of charged, heterocyclic, and electron
deficient ring systems. The "aromaticity" designation as used here is not
intended to imply anything about the reactivity, magnetic resonance
spectra, heat of formation, or odor of substances.

There are at least two reasonable responses:

1. What do we use SMARTS for? Is it not to locate compounds with common
substructures? Is that not predicated on the notion that at least to some
extent, common substructures will be correlated with common activity? An
implementation that fails to see these examples as aromatic will fail to
find the aromatic substructure in these compounds when an aromatic SMARTS is
used for the search.

2. If, per the quotation above, the only purpose is really
canonicalization, why do we bother to aromatize pyrrole in SMILES? After
all, it's already unambiguous without doing so. Would an implementation
that did not perceive pyrrole as aromatic be considered acceptable? If not,
then why is this example acceptable? Perhaps because such examples are
rare. But are we not, in fact, usually looking for unusual examples of
compounds containing a desired substructure?
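
For anyone who wants to check how a toolkit treats such cases, something along these lines could be used. This is only a sketch: it relies on RDKit's default aromaticity model, the helper names are made up for illustration, and the pyridine-like SMARTS is just one possible query. It reports how many atoms are flagged aromatic and whether the aromatic SMARTS matches, without assuming what the answer will be for the contested structures.

```python
from rdkit import Chem

# Illustrative query (an assumption, not from the thread): an aromatic
# six-membered ring containing one nitrogen.
PYRIDINE_LIKE = Chem.MolFromSmarts("n1ccccc1")


def aromatic_atom_count(smiles):
    """Number of atoms flagged aromatic by the default model, or None if unparsable."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())


def matches_pyridine_like(smiles):
    """True/False for a match against PYRIDINE_LIKE, or None if unparsable."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return mol.HasSubstructMatch(PYRIDINE_LIKE)


if __name__ == "__main__":
    # Pyridine, 2-pyridone, and the bicyclic example mentioned above.
    for smi in ("c1ccncc1", "O=c1cccc[nH]1", "c1c[nH]c2nccc-2c1"):
        print(smi, aromatic_atom_count(smi), matches_pyridine_like(smi))
```

Running the same loop over a ring-opened analogue of each bicyclic makes the inconsistency, if any, easy to see side by side.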


It's difficult to fault RDKit for making the same mistake that everybody
else blithely accepts; but it would be great, IMO, if it could do better
than everyone else in this regard.

-P.


On Tue, Oct 23, 2018 at 8:15 AM Francis Atkinson  wrote:

> Ian,
>
> I think the idea is that the (out-of-plane) p orbital on the carbonyl
> C is both part of the ring pi-system and the carbonyl pi-system. However,
> both pi-electrons in the carbonyl 'belong to' the oxygen because it's more
> electronegative, and they thus aren't counted in the 4N+2.
>
> I am sure that explanation would pain a theoretical chemist, but, as
> Greg has pointed out, this is as much an informatics issue as a chemistry
> issue.
>
> The RDKit aromaticity perception is quite an inclusive one: others (
> *e.g.* Biovia's) are less so and wouldn't count pyridone as aromatic.
>
> Francis
> On 23/10/2018 12:48, Ian Tickle wrote:
>
>
> Francis
>
> Sorry yes you're right, the C with the exocyclic d.b. doesn't contribute
> its p electron to the pi system, but then doesn't that break the
> aromaticity since a continuous ring of contributing p orbitals is surely a
> requirement? I would say that 2-pyridone should not be classed as aromatic
> for the same reason but its tautomer 2-hydroxypyridine Oc1ccccn1 clearly
> is.  In 2-pyridone the ring C-C bond lengths alternate between conjugated
> single (1.45) and double (1.34) whereas in 2-hydroxypyridine they are all
> around the aromatic C-C length (1.39).
>
>
> I guess it all depends on how you define 'aromatic' but as I understand it
> there are 4 necessary conditions:
>
> 1. Must be cyclic.
> 2. Every atom in the ring must be conjugated (i.e. contributes a p orbital
> to the pi system).
> 3. Must have 4n+2 pi electrons.
> 4. Ring must be planar (i.e. any stereochemical distortion breaks the
> aromaticity even if the other 3 conditions are fulfilled).
>
> You could add that bond lengths between like atom types should be about
> equal, but that follows from the other conditions.
>
> Cheers
>
> -- Ian
>
>
> On Tue, 23 Oct 2018 at 11:45, Francis Atkinson  wrote:
>
>> Ian,
>>
>> I make it 6 electrons: two from the N, none from the C double bonded
>> to the exocyclic N, and one each from four other carbons in the ring. It's
>> isoelectronic with *e.g.* pyridone, which is aromatic in RDKit...
>>
>> In [1]: from rdkit import Chem
>>
>> In [2]: Chem.MolToSmiles(Chem.MolFromSmiles('O=c1cccc[nH]1'))
>> Out[2]: 'O=c1cccc[nH]1'
>>
>> The protonated/tautomerised versions are indeed aromatic
>> (interconverting between these species was actually how I came across this
>> issue), but I still reckon the 

Re: [Rdkit-discuss] Fingerprint collision and machine learning

2018-10-10 Thread Peter S. Shenkin
It is very far from a solved problem, since it depends strongly on the
interactions within the crystal. And it’s not terribly uncommon for a
drug-like compound to exhibit different crystal forms, each with its own
melting point and solubility. This has been an issue for drug formulation,
where you usually want to stabilize and distribute the least stable
(most soluble) crystal form.

I think there was recently a blog posting on this from Nextstep.

-P.

On Wed, Oct 10, 2018 at 7:51 AM Michal Krompiec 
wrote:

> Hi all,
> I have a slightly off-topic question. I'm trying to train a neural network
> on a dataset of small molecules and their melting points. I did get a
> not-so-bad accuracy with Morgan fingerprints, but I've realised that
> regardless of FP radius and bitvector length, several dozen molecules have
> the same fingerprints but wildly different melting points. I am pretty sure
> this is a "solved problem" so I don't want to reinvent the wheel. What is
> the recommended/usual way of dealing with this?
> Thanks,
> Michal
>
>
>
-- 
-P.
Sent from a cell phone. Pls forgive brvty and m1$tea@ks.


Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-26 Thread Peter S. Shenkin
Ah, David, but how do you define a "real" singleton?

-P.

On Wed, Sep 26, 2018 at 1:30 PM David Cosgrove 
wrote:

> Slightly off topic, but a minor issue with the Taylor-Butina algorithm is
> that it generates “false singletons”. These are molecules just outside the
> clustering cutoff that are stranded when their neighbours are put in a
> different, larger cluster. We used to find it convenient to have a sweep of
> these, at a slightly looser cutoff, and drop them into the cluster whose
> centroid/seed they were nearest to. This could be added to Andrew’s code
> quite easily. At the very least, it’s worth keeping track of the initial
> number of neighbours within the cluster cutoff that each fingerprint had so
> as to distinguish real singletons from these artefactual ones.
> Dave
>
>
> On Tue, 25 Sep 2018 at 19:56, Peter S. Shenkin  wrote:
>
>> Well, I'm not really familiar with the Taylor-Butina clustering method,
>> so I'm proposing a methodology based on generalizing something that I found
>> to be useful in a somewhat different clustering context.
>>
>> Presuming that what you are clustering is the fingerprints of structures,
>> and that you know which structures are in each cluster, you'd compute the
>> average of all the fingerprints in a cluster. That is, each bit position
>> would be given a floating point number that is the average of the 0s and 1s
>> at that position computed over the structures in the cluster.  Then you'd
>> compute the distance (say, Manhattan or Euclidean) between the fingerprint
>> of each structure in the cluster and the average so computed. The "most
>> representative structure" would be the cluster member whose fingerprint is
>> closest to the cluster's average fingerprint. (Some additional mileage
>> could be gained by seeing just how far away from the average the "most
>> representative structures" are. It might be more representative (i.e.,
>> closer) for some clusters than for others.)
>>
>> It would make sense to try this (since it's easy enough) and see whether
>> the resulting "most representative structures" from the clusters really are
>> at least roughly representative, by comparing them with viewable random
>> subsets of structures from the clusters.
>>
>> -P.
>>
>> On Tue, Sep 25, 2018 at 2:36 PM, Andrew Dalke 
>> wrote:
>>
>>> On Sep 25, 2018, at 17:13, Peter S. Shenkin  wrote:
>>> > FWIW, in work on conformational clustering, I used the “most
>>> representative” molecule; that is, the real molecule closest to the
>>> mathematical centroid. This would probably be the best way of displaying a
>>> single molecule that typifies what is in the cluster.
>>>
>>> In some sense I'm rephrasing Chris Earnshaw's earlier question - how
>>> does one do that with Taylor-Butina clustering? And does it make sense?
>>>
>>> The algorithm starts by picking a centroid based on the fingerprints
>>> with the highest number of neighbors, so none of the other cluster members
>>> should have more neighbors within that cutoff.
>>>
>>> I am far from an expert on this topic, but any alternative I can
>>> think of makes me think I should have started with something other than
>>> Taylor-Butina.
>>>
>>>
>>>
>>> Andrew
>>> da...@dalkescientific.com
>>>
>>>
>>>
>>>
>>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>


Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Peter S. Shenkin
Well, I'm not really familiar with the Taylor-Butina clustering method, so
I'm proposing a methodology based on generalizing something that I found to
be useful in a somewhat different clustering context.

Presuming that what you are clustering is the fingerprints of structures,
and that you know which structures are in each cluster, you'd compute the
average of all the fingerprints in a cluster. That is, each bit position
would be given a floating point number that is the average of the 0s and 1s
at that position computed over the structures in the cluster.  Then you'd
compute the distance (say, Manhattan or Euclidean) between the fingerprint
of each structure in the cluster and the average so computed. The "most
representative structure" would be the cluster member whose fingerprint is
closest to the cluster's average fingerprint. (Some additional mileage
could be gained by seeing just how far away from the average the "most
representative structures" are. It might be more representative (i.e.,
closer) for some clusters than for others.)

It would make sense to try this (since it's easy enough) and see whether
the resulting "most representative structures" from the clusters really are
at least roughly representative, by comparing them with viewable random
subsets of structures from the clusters.
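
A minimal sketch of the idea, assuming each fingerprint is available as a plain list of 0/1 values and that one cluster is processed at a time. The function name and the toy cluster are invented for illustration, and Manhattan distance is used simply because it is the easier of the two to write down.

```python
def most_representative(fingerprints):
    """Return (index, distances) for the member closest to the average fingerprint.

    `fingerprints` is a list of equal-length lists of 0/1 values belonging
    to a single cluster. Distances are Manhattan distances to the average.
    """
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    # Average fingerprint: per-position mean of the 0s and 1s.
    mean = [sum(fp[i] for fp in fingerprints) / n for i in range(n_bits)]
    distances = [sum(abs(bit - m) for bit, m in zip(fp, mean))
                 for fp in fingerprints]
    best = min(range(n), key=distances.__getitem__)
    return best, distances


if __name__ == "__main__":
    # Toy four-member cluster with four-bit fingerprints.
    cluster = [
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 1],
    ]
    best, distances = most_representative(cluster)
    print("most representative member:", best)
    print("distances to the average:", distances)
```

The returned distances also give the "how far away" figure: a cluster whose best member is still far from the average is one where the representative should be treated with more caution.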

-P.

On Tue, Sep 25, 2018 at 2:36 PM, Andrew Dalke 
wrote:

> On Sep 25, 2018, at 17:13, Peter S. Shenkin  wrote:
> > FWIW, in work on conformational clustering, I used the “most
> representative” molecule; that is, the real molecule closest to the
> mathematical centroid. This would probably be the best way of displaying a
> single molecule that typifies what is in the cluster.
>
> In some sense I'm rephrasing Chris Earnshaw's earlier question - how does
> one do that with Taylor-Butina clustering? And does it make sense?
>
> The algorithm starts by picking a centroid based on the fingerprints with
> the highest number of neighbors, so none of the other cluster members
> should have more neighbors within that cutoff.
>
> I am far from an expert on this topic, but any alternative I can
> think of makes me think I should have started with something other than
> Taylor-Butina.
>
>
>
> Andrew
> da...@dalkescientific.com
>
>
>
>


Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Peter S. Shenkin
(I see that I accidentally responded to Andrew, only, earlier; I'm copying
to the group this time.)

FWIW, in work on conformational clustering, I used the “most
representative” molecule; that is, the real molecule closest to the
mathematical centroid. This would probably be the best way of displaying a
single molecule that typifies what is in the cluster.

-P.

On Tue, Sep 25, 2018 at 8:09 AM, Andrew Dalke 
wrote:

> On Sep 21, 2018, at 14:53, Philipp Thiel  tuebingen.de> wrote:
> > you probably read about the Tanimoto being a proper metric in case of
> having binary data
> > in Leach and Gillet 'Introduction to Chemoinformatics' chapter 5.3.1 in
> the revised edition.
>
> What we call Tanimoto is more broadly known as the Jaccard. Various sites
> demonstrate that the Jaccard distance = 1-Jaccard = 1-Tanimoto is a metric,
> such as https://mathoverflow.net/questions/18084/is-the-jaccard-distance-a-distance
> and https://arxiv.org/abs/1612.02696 .
>
> Going back to James T. Metz's original question, one alternative might be
> to use chemfp and the Taylor-Butina clustering implementation available at:
>
>   http://dalkescientific.com/writings/taylor_butina.py
>
> Following Dave Cosgrove's advice:
>
> > I expect James means what we used to call the cluster seed, i.e. the
> molecule the cluster was based on, rather than the mathematical centroid.
> Calculating distances from each cluster member to that would be quite
> straightforward as a post-processing step although that would roughly
> double the time taken.
>
> it's possible to change the reporting code from:
>
>     for centroid_idx, members in clusters:
>         print(arena.ids[centroid_idx], "has", len(members), "other members", file=outfile)
>         print("=>", " ".join(arena.ids[idx] for idx in members), file=outfile)
>
> so it does the post-processing:
>
>     print(len(clusters), "clusters", file=outfile)
>     for centroid_idx, members in clusters:
>         print(arena.ids[centroid_idx], "has", len(members), "other members", file=outfile)
>         subarena = arena.copy(indices=members)
>         centroid_fp = arena.get_fingerprint(centroid_idx)
>         result = subarena.threshold_tanimoto_search_fp(centroid_fp, threshold=0.0)
>         result.reorder()  # sort so the highest scores come first
>         for id, score in result.get_ids_and_scores():
>             print("=>", id, "score:", score)
>
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
>
>
>


Re: [Rdkit-discuss] enumeration of smiles question

2018-08-06 Thread Peter S. Shenkin
Just curious, Guillaume, why do you want to do this?

On Mon, Aug 6, 2018 at 5:58 AM Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Dear Greg,
>
>
>
> Fantastic, thank you to give both explanation and solution to this “simple
> question”, I know this is not so simple & it’s fundamental for data
> augmentation in deep learning.
>
>
>
> If I may, I have another question related, do you know if someone has
> worked on a generator of all unique smiles independently of RDKit ?
>
>
>
> Thanks again,
>
>
>
> Guillaume
>
>
>
> *De : *Greg Landrum 
> *Date : *lundi, 6 août 2018 à 11:40
> *À : *Guillaume GODIN 
> *Cc : *RDKit Discuss 
> *Objet : *Re: [Rdkit-discuss] enumeration of smiles question
>
>
>
>
>
> On Thu, Aug 2, 2018 at 8:59 AM Guillaume GODIN <
> guillaume.go...@firmenich.com> wrote:
>
>
>
> I have a simple question about generating all possible smiles of a given
> molecule:
>
>
>
> It's a simple question, but the answer is somewhat complicated. :-)
>
>
>
>
>
> RDKit provides only 4 differents smiles for my molecule “CCC1CC1“:
>
> C1C(CC)C1
>
> CCC1CC1
>
> C1(CC)CC1
>
> C(C)C1CC1
>
>
>
> While by hand we can write those 7 smiles:
>
> CCC1CC1
>
> C(C)C1CC1
>
> C(C1CC1)C
>
> C1CC(CC)1
>
> C1C(CC)C1
>
> C1CC1CC
>
> C(CC)1CC1
>
>
>
> I use this function for the enumeration:
>
>
>
> def allsmiles(smil):
>     m = Chem.MolFromSmiles(smil)  # Construct a molecule from a SMILES string.
>     if m is None:
>         return smil
>     N = m.GetNumAtoms()
>     if N == 0:
>         return smil
>     try:
>         n = np.random.randint(0, high=N)
>         t = Chem.MolToSmiles(m, rootedAtAtom=n, canonical=False)
>     except:
>         return smil
>     return t
>
> n = 50
> SMILES = ["CCC1CC1"]
> SMILES_mult = [allsmiles(S) for S in SMILES for i in range(n)]
>
>
>
> Why we cannot generate all the 7 smiles ?
>
>
>
> The RDKit has rules that it uses to decide which atom to branch to when
> generating a SMILES. These are used regardless of whether you are
> generating canonical SMILES or not.
>
> The upshot of this is that it will never generate a SMILES where there's a
> branch before a ring closure.
>
> The other important factor here is that atom rank is determined by the
> index of the atom in the molecule when you aren't using canonicalization.
> So changing the atom order on input can help:
>
> In [12]: set(allsmiles('CCC1CC1') for i in range(50))
>
> Out[12]: {'C(C)C1CC1', 'C1(CC)CC1', 'C1C(CC)C1', 'CCC1CC1'}
>
>
>
> In [13]: set(allsmiles('C1CC1CC') for i in range(50))
>
> Out[13]: {'C(C1CC1)C', 'C1(CC)CC1', 'C1CC1CC', 'CCC1CC1'}
>
> You can do this all at once as follows:
>
>
>
> ```
>
> In [20]: def allsmiles(smil):
>     ...:     m = Chem.MolFromSmiles(smil)  # Construct a molecule from a SMILES string.
>     ...:     if m is None:
>     ...:         return smil
>     ...:     N = m.GetNumAtoms()
>     ...:     if N == 0:
>     ...:         return smil
>     ...:     aids = list(range(N))
>     ...:     random.shuffle(aids)
>     ...:     m = Chem.RenumberAtoms(m, aids)
>     ...:     try:
>     ...:         n = random.randint(0, N - 1)
>     ...:         t = Chem.MolToSmiles(m, rootedAtAtom=n, canonical=False)
>     ...:     except:
>     ...:         return smil
>     ...:     return t
>     ...:
>
>
> In [21]: set(allsmiles('C1CC1CC') for i in range(50))
>
> Out[21]: {'C(C)C1CC1', 'C(C1CC1)C', 'C1(CC)CC1', 'C1C(CC)C1', 'C1CC1CC',
> 'CCC1CC1'}
>
> ```
>
> Note that I switched to using python's built in random module instead of
> using the one in numpy.
>
>
>
> -greg
>
>
>
>
>
>
>
>
>
> Thanks guys,
>
>
>
> Best regards,
>
>
>
> Guillaume
>
>
> ***
> DISCLAIMER
> This email and any files transmitted with it, including replies and
> forwarded copies (which may contain alterations) subsequently transmitted
> from Firmenich, are confidential and solely for the use of the intended
> recipient. The contents do not represent the opinion of Firmenich except to
> the extent that it relates to their official business.
>
> ***
>
>
>
>

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-20 Thread Peter S. Shenkin
Well, @jeff, there's no law saying that hashes must collide, and in fact
some are designed to make collision extremely unlikely (can you say
"SHA-2"?). But the ones in question here do collide relatively frequently,
for at least some molecular fingerprint types.

An interesting question (maybe only to me :-) ) would be how similar, in
general, the structures are that exhibit identical fingerprints, for the
well-known fingerprint types, for various fingerprint lengths. A
sufficiently complicated molecule will give lots of on bits, and for (say)
a 64-bit fingerprint, there can only be 64 possible fingerprints with all
but one bit turned on.

I realize that most fingerprints in common use today are longer than this,
but still, looking back at 64- and 32-bit fingerprints with all but one
bit on might give some insight. How short does a fingerprint of some
particular type have to be for, say, 10% of CHEMBL molecules to exhibit an
all-on pattern? How short does it have to be for, say, 10% of CHEMBL
molecules to have an exact fingerprint match with some other molecule?
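
To make the question concrete, here is a toy experiment in pure Python. It is not any real fingerprint type: the "features" are made-up substructure labels, and the hashing scheme (SHA-256 modulo the fingerprint length) is an assumption chosen only so that the example is deterministic. It counts how many pairs of feature sets end up with identical fingerprints at a given length.

```python
import hashlib
from itertools import combinations


def toy_fingerprint(features, n_bits):
    """Fold a collection of feature strings into an n_bits-wide integer bitmask."""
    bits = 0
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % n_bits
        bits |= 1 << position
    return bits


def count_identical_pairs(feature_sets, n_bits):
    """Number of pairs of feature sets whose folded fingerprints are identical."""
    fps = [toy_fingerprint(features, n_bits) for features in feature_sets]
    return sum(1 for a, b in combinations(fps, 2) if a == b)


if __name__ == "__main__":
    # Hypothetical feature labels standing in for substructure keys.
    molecules = [
        {"c:c", "c-c", "c:c:c"},           # biphenyl-like
        {"c:c", "c-c", "c:c:c", "c-c:c"},  # terphenyl-like
        {"C-O", "C=O", "C-N"},
        {"C-O", "C-C", "C-Cl"},
    ]
    for n_bits in (1024, 64, 8, 2, 1):
        print(n_bits, "bits:", count_identical_pairs(molecules, n_bits),
              "identical pairs")
```

At a length of one bit every non-empty feature set collapses to the same fingerprint, so all pairs collide; the interesting part is where, between that extreme and a long fingerprint, collisions start to appear for a real data set and a real fingerprint type.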

-P

On Fri, Apr 20, 2018 at 1:03 PM, jeff godden <jgod...@gmail.com> wrote:

> Long ago molecular fingerprints were referred to in the literature as
> molecular hash functions. (y'know, those crazy mathematical algorithms
> which permitted rapid lookup of some string in a lookup table)  As such, we
> expected there to be the associated hash collisions  (
> https://en.wikipedia.org/wiki/Hash_table#Collision_resolution ).  All
> this by way of saying that to go from fingerprint to the molecular
> structure which produced it is traditionally impossible unless the
> fingerprint no longer amounts to a hash(ing) function.
> --
> j
>
>
> On Fri, Apr 20, 2018 at 9:56 AM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>> Isn't it the case that more than one molecule can share an identical
>> fingerprint? (Depending on the specific fingerprint.) Think p-biphenyl,
>> extended to triphenyl, tetraphenyl, etc. Still, a GA or SA method could
>> keep going and come up with multiple matches, plus multiple near-misses.
>>
>> -P.
>>
>> On Fri, Apr 20, 2018 at 10:58 AM, David Cosgrove <
>> davidacosgrov...@gmail.com> wrote:
>>
>>> Hi Brian,
>>> Dave Weininger once showed a fairly simple GA that could generally
>>> deduce a structure from a daylight fingerprint by using SMILES strings as
>>> the chromosomes and tanimoto distance to the target fingerprint as the
>>> fitness function.  He may have done a talk about it for MUG or conceivably
>>> written it up. It’d be in JCICS if so, I expect.
>>>
>>> You could probably knock up a script to do that in a couple of hours I
>>> would think using a GA library to do the mechanics. If you’re not worried
>>> about high efficiency, you don’t need to do anything fancy with mutation
>>> and crossover of the SMILES strings to ensure you always get a valid
>>> molecule, you can just give a fitness of 0 if the SMILES parser doesn’t
>>> like what you give it.
>>> HTH,
>>> Dave
>>>
>>>
>>> On Fri, 20 Apr 2018 at 14:45, Nils Weskamp <nils.wesk...@gmail.com>
>>> wrote:
>>>
>>>> Hi Brian,
>>>>
>>>> in general, it might be difficult to come up with a deterministic
>>>> algorithm that generates exactly one structure for a given fingerprint due
>>>> to many ambiguities in the process. If you are happy with a more "fuzzy"
>>>> (approximate / probabilistic) approach, you might want to take a look at
>>>>
>>>> https://pubs.acs.org/doi/abs/10.1021/ci600383v
>>>> https://link.springer.com/article/10.1007/s10822-005-9020-4
>>>>
>>>> Given this task, I would probably start with a large database of known
>>>> compounds (PubChem, UniChem, GDB17), calculate fingerprints and then do a
>>>> similarity search with my query fingerprint.
>>>>
>>>> Hope this helps,
>>>> Nils
>>>>
>>>>
>>>> On Fri, Apr 20, 2018 at 3:13 PM, Brian Cole <col...@gmail.com> wrote:
>>>>
>>>>> Hi Chem-informaticians:
>>>>>
>>>>> I know it has been talked about in the community that fingerprints are
>>>>> not a way to obfuscate molecules for security, but I don't recall a paper
>>>>> actually demonstrating actual reverse engineering a fingerprint into a
>>>>> chemical structure. Does anyone know if such a paper exists?
>>>>>
>>>>> Code using RDKit to demonstrate the functionality would

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-20 Thread Peter S. Shenkin
Isn't it the case that more than one molecule can share an identical
fingerprint? (Depending on the specific fingerprint.) Think p-biphenyl,
extended to triphenyl, tetraphenyl, etc. Still, a GA or SA method could
keep going and come up with multiple matches, plus multiple near-misses.

-P.

On Fri, Apr 20, 2018 at 10:58 AM, David Cosgrove  wrote:

> Hi Brian,
> Dave Weininger once showed a fairly simple GA that could generally deduce
> a structure from a daylight fingerprint by using SMILES strings as the
> chromosomes and tanimoto distance to the target fingerprint as the fitness
> function.  He may have done a talk about it for MUG or conceivably written
> it up. It’d be in JCICS if so, I expect.
>
> You could probably knock up a script to do that in a couple of hours I
> would think using a GA library to do the mechanics. If you’re not worried
> about high efficiency, you don’t need to do anything fancy with mutation
> and crossover of the SMILES strings to ensure you always get a valid
> molecule, you can just give a fitness of 0 if the SMILES parser doesn’t
> like what you give it.
> HTH,
> Dave
>
>
> On Fri, 20 Apr 2018 at 14:45, Nils Weskamp  wrote:
>
>> Hi Brian,
>>
>> in general, it might be difficult to come up with a deterministic
>> algorithm that generates exactly one structure for a given fingerprint due
>> to many ambiguities in the process. If you are happy with a more "fuzzy"
>> (approximate / probabilistic) approach, you might want to take a look at
>>
>> https://pubs.acs.org/doi/abs/10.1021/ci600383v
>> https://link.springer.com/article/10.1007/s10822-005-9020-4
>>
>> Given this task, I would probably start with a large database of known
>> compounds (PubChem, UniChem, GDB17), calculate fingerprints and then do a
>> similarity search with my query fingerprint.
>>
>> Hope this helps,
>> Nils
>>
>>
>> On Fri, Apr 20, 2018 at 3:13 PM, Brian Cole  wrote:
>>
>>> Hi Chem-informaticians:
>>>
>>> I know it has been talked about in the community that fingerprints are
>>> not a way to obfuscate molecules for security, but I don't recall a paper
>>> actually demonstrating actual reverse engineering a fingerprint into a
>>> chemical structure. Does anyone know if such a paper exists?
>>>
>>> Code using RDKit to demonstrate the functionality would be an obvious
>>> bonus as well. :-)
>>>
>>> Thanks,
>>> Brian
>>>
>>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>


[Rdkit-discuss] OFFLINE... RDKit and Mathematica

2018-01-12 Thread Peter S. Shenkin
So, do you work with Bob Nachbar? If so, please tell him I said hello.

-P. (ex-Schrödinger)

On Fri, Jan 12, 2018 at 10:06 PM, Jason Biggs  wrote:

> To the developers of RDKit - this is a great package you've made and the
> level of support and responsiveness to bugs is fantastic.
>
> I've been working on adding chemistry functionality to Mathematica, and
> the RDKit is fundamental to this functionality.  I'm writing here to see if
> there are any RDKit users who also use Mathematica, and if so, what kind of
> functionality you think is most important to include.
>
> This won't be like the python or java wrappers, but rather we are trying
> to design a Molecule object that is fully integrated with the rest of the
> Wolfram Language but uses an RDKit::ROMol as the underlying structure.  As
> we find bugs, we will report them, and when we implement functionality that
> that isn't available in the RDKit, I'm hoping to add back to the community
> here.
>
> Best wishes,
>
> Jason Biggs
>
> PS - I find it to be surreal, but my boss has taken to live-streaming our
> design meetings regarding the chemistry functionality, so if anyone is
> interested to watch they are here:
> https://www.twitch.tv/stephen_wolfram/videos/all
>
>
>


Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread Peter S. Shenkin
I think you probably used a slightly different SMILES than the one you
showed. The one you showed should have given ((0,1,3,4),(2,1,3,4)).

The proper merge rule would then be to consider all matches equivalent if
the 2nd and 3rd atom in the match agree, in any order; i.e, the two
carbons, indices 1 and 3 in this case.

So to do this, for each molecule, do something like this:

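d = {}
for match in matches:
    # Order the two central (carbon) indices so that (1, 3) and (3, 1)
    # map to the same dictionary key.
    if match[1] < match[2]:
        t = (match[1], match[2])
    else:
        t = (match[2], match[1])
    d[t] = match

You will wind up with as many dictionary elements as there are distinct
central carbon pairs, i.e. one element per merged group of matches.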
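
For what it's worth, the same merge rule can be written as a small
pure-Python helper that keeps every match in each group instead of only the
last one. This is only a sketch: merge_matches is a made-up name, and the
match tuples below are typed in by hand (they are the ones quoted above)
rather than taken from a live GetSubstructMatches() call.

```python
def merge_matches(matches):
    """Group 4-atom matches that share the same central atom pair.

    Each match is assumed to be (Cl, C, C, Cl) atom indices, so
    positions 1 and 2 hold the two central carbons.
    """
    groups = {}
    for match in matches:
        # Sort the central pair so that atom order does not matter.
        key = tuple(sorted((match[1], match[2])))
        groups.setdefault(key, []).append(match)
    return groups


# Matches quoted above for 'ClC(Cl)CCl'.
matches = ((0, 1, 3, 4), (2, 1, 3, 4))
groups = merge_matches(matches)
n_unique = len(groups)  # number of distinct C-C pairs
```

The count you are after is then len(groups), and groups[key] still tells you
which raw matches were folded together.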

-P.


On Tue, Nov 7, 2017 at 7:38 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> RDkit Discussion Group,
>
> I have written a SMARTS to detect vicinal chlorine groups
> using RDkit.  There are 4 atoms involved in a vicinal chlorine group.
>
> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>
> I am trying to count the number of ("unique") occurrences of this
> pattern.
>
> For some molecules with symmetry, this results in
> over-counting.
>
> For the molecule, smiles1 below, I want to obtain
> a count of 1 i.e., 1 tuple of 4 atoms.
>
> smiles1 = 'ClC(Cl)CCl'
>
> However, using the SMARTS above, I obtain 2 tuples of 4 atoms.
> Beginning with a MOL file representation of smiles1, I get
>
> ((1,2,4,3), (0,2,4,3))
>
> One possible solution is to somehow merge the two tuples according
> to a "rule."  One rule that works is "if 3 of the atom indices are the
> same,
> then combine into one tuple."
>
> However, the rule needs a bit of modification for more complicated
> cases (higher symmetry).
>
> Consider
>
> smiles2 = 'ClC(Cl)C(Cl)(Cl)Cl'
>
> My goal is to get 2 tuples of 4 atoms for smiles2
>
> smiles2 is somewhat tricky because there are either
> 2 groups of 3 (4 atom) tuples, or 3 groups of 2 (4 atom)
> tuples depending on how you choose your 3 atom indices.
>
> Again, if my goal is to get 2 tuples, then I need to somehow
> pick the largest group, i.e., 2 groups of 3 tuples to do the merge
> operation which will give me 2 remaining groups (desired).
>
> I have already checked stackoverflow and a few other places
> for PYTHON code to do the necessary merging, but I could not
> find anything specific and appropriate.
>
> I would be most grateful if anyone has ideas how to do this.  I
> suspect the answer is a few lines of well-written PYTHON code,
> and not modifying the SMARTS (I could be mistaken!).
>
> Thank you.
>
> Regards,
> Jim Metz
>
>
>


Re: [Rdkit-discuss] Fwd: Re: HasSubstructMatch doesn't work as expected

2017-09-13 Thread Peter S. Shenkin
It can,  but you have to tell it how. It can't read your mind. Give it a
SMILES and either an atom list or a SMARTS that specifies what you want.

-P.
Sent from a cell phone. Please forgive brvty and m1St@kes.

On Sep 13, 2017 4:42 PM, "Michał Nowotka" <mmm...@gmail.com> wrote:

> True, but I'm only getting molfiles instead.
> My very naive assumption was that if I'm able to highlight the
> structure manually (printing out the resulting structure images and
> highlighting the query substructure with a pen) then rdkit should be able
> to do the same thing.
>
> On Wed, Sep 13, 2017 at 9:36 PM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
> > I neglected to cc Rdkit on this earlier. If he can get the matching atom
> > list from their other program, he won't have to mess w. SMARTS matching
> in
> > Rdkit.
> >
> > -P.
> > Sent from a cell phone. Please forgive brvty and m1St@kes.
> > -- Forwarded message --
> > From: "Peter S. Shenkin" <shen...@gmail.com>
> > Date: Sep 13, 2017 3:15 PM
> > Subject: Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected
> > To: "Michał Nowotka" <mmm...@gmail.com>
> > Cc:
> >
> > Well, depending on how the substructure results from the other program
> are
> > presented, you might not have to deal with SMARTS matching at all
> yourself.
> > For example, if you have a SMILES for the structure and a list of atom
> > indices into that SMILES that constitute the matching substructure (where
> > the first atom in the SMILES has index 0), you can do the following:
> >
> > from rdkit import Chem
> > from rdkit.Chem import Draw
> >
> > smi = 'Oc1ccccc1' # Assume a SMILES
> > matching_atoms = [0, 1] # Assume a list of matching atoms
> > mol = Chem.MolFromSmiles(smi)
> > x = Draw.MolToImage(mol, highlightAtoms=matching_atoms)
> > display(x) # display() is available in a Jupyter notebook
> >
> >
> > See attached for the image, from a Jupyter notebook.
> >
> > If, on the other hand, you have to work from SMARTS, then it seems to me
> > that you need to understand something about how SMARTS works, and you
> have
> > to understand the needed chemical concepts, or at least interact with
> > someone who does.
> >
> > Otherwise, it's a bit like trying to do complicated substring matches
> using
> > regular expressions, without knowing how regular expressions work.
> >
> > -
> > P.
> >
> >
> > On Sep 13, 2017 12:12 PM, "Michał Nowotka" <mmm...@gmail.com> wrote:
> >>
> >> OK, so what I have is some substructure results from another (non-rdkit)
> >> cartridge and I want to use rdkit to generate images of all results
> >> with the query substructure highlighted and aligned.
> >> So I have two things: a list of compounds and a query compound.
> >> Now I need to highlight the query compound for every compound from the
> >> list and I need to do it at all costs. I can't leave any compound not
> >> highlighted even if rdkit by default has a different opinion on whether
> >> the query compound really is a true substructure of a given compound.
> >>
> >> So how can I instruct rdkit to ignore aromaticity and other factors,
> >> preferably one by one, each time going one level deeper where the last
> >> resort would be simply matching on the level of two planar graphs. Is
> >> that possible?
> >>
> >> On Wed, Sep 13, 2017 at 4:48 PM, Peter S. Shenkin <shen...@gmail.com>
> >> wrote:
> >> > Your course of action depends upon just what you are really trying to
> >> > do. If
> >> > it's only aspirin, then why wouldn't you just do it manually? If it
> goes
> >> > beyond aspirin, you have to start by defining in general terms exactly
> >> > what
> >> > you want to match to what.
> >> >
> >> > For example, given a query molecule (aspirin in this case), if you
> want
> >> > all
> >> > its non-aromatic atoms to match aromatic as well as non-aromatic atoms
> >> > in
> >> > the database, you could write a string-alteration routine to munge the
> >> > SMILES of a query molecule into a SMARTS that would do just that, and
> >> > then
> >> > use that SMARTS to match your database molecules. Repeat for each
> query
> >> > molecule.
> >> >
> >> > But you have to start with a precise definition of just what kind of
> >> > matching you wish to do. For instance, maybe you don't really want
> >> > non-aroma

[Rdkit-discuss] Fwd: Re: HasSubstructMatch doesn't work as expected

2017-09-13 Thread Peter S. Shenkin
I neglected to cc Rdkit on this earlier. If he can get the matching atom
list from their other program, he won't have to mess w. SMARTS matching in
Rdkit.

-P.
Sent from a cell phone. Please forgive brvty and m1St@kes.
-- Forwarded message --
From: "Peter S. Shenkin" <shen...@gmail.com>
Date: Sep 13, 2017 3:15 PM
Subject: Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected
To: "Michał Nowotka" <mmm...@gmail.com>
Cc:

Well, depending on how the substructure results from the other program are
presented, you might not have to deal with SMARTS matching at all yourself.
For example, if you have a SMILES for the structure and a list of atom
indices into that SMILES that constitute the matching substructure (where
the first atom in the SMILES has index 0), you can do the following:

from rdkit import Chem
from rdkit.Chem import Draw

smi = 'Oc1ccccc1' # Assume a SMILES
matching_atoms = [0, 1] # Assume a list of matching atoms
mol = Chem.MolFromSmiles(smi)
x = Draw.MolToImage(mol, highlightAtoms=matching_atoms)
display(x) # display() is available in a Jupyter notebook


See attached for the image, from a Jupyter notebook.

If, on the other hand, you have to work from SMARTS, then it seems to me
that you need to understand something about how SMARTS works, and you have
to understand the needed chemical concepts, or at least interact with
someone who does.

Otherwise, it's a bit like trying to do complicated substring matches using
regular expressions, without knowing how regular expressions work.

-P.


On Sep 13, 2017 12:12 PM, "Michał Nowotka" <mmm...@gmail.com> wrote:

> OK, so what I have is some substructure results from another (non-rdkit)
> cartridge and I want to use rdkit to generate images of all results
> with the query substructure highlighted and aligned.
> So I have two things: a list of compounds and a query compound.
> Now I need to highlight the query compound for every compound from the
> list and I need to do it at all costs. I can't leave any compound not
> highlighted even if rdkit by default has a different opinion on whether
> the query compound really is a true substructure of a given compound.
>
> So how can I instruct rdkit to ignore aromaticity and other factors,
> preferably one by one, each time going one level deeper where the last
> resort would be simply matching on the level of two planar graphs. Is
> that possible?
>
> On Wed, Sep 13, 2017 at 4:48 PM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
> > Your course of action depends upon just what you are really trying to
> do. If
> > it's only aspirin, then why wouldn't you just do it manually? If it goes
> > beyond aspirin, you have to start by defining in general terms exactly
> what
> > you want to match to what.
> >
> > For example, given a query molecule (aspirin in this case), if you want
> all
> > its non-aromatic atoms to match aromatic as well as non-aromatic atoms in
> > the database, you could write a string-alteration routine to munge the
> > SMILES of a query molecule into a SMARTS that would do just that, and
> then
> > use that SMARTS to match your database molecules. Repeat for each query
> > molecule.
> >
> > But you have to start with a precise definition of just what kind of
> > matching you wish to do. For instance, maybe you don't really want
> > non-aromatic ring atoms in your query to match aromatic rings and vice
> versa
> > (i.e., a cyclohexyl to match a phenyl); maybe you only want non-ring
> atoms
> > in the query to match aliphatic as well as aromatic substructures. And so
> > on.
> >
> > -P.
> >
> >
> > On Wed, Sep 13, 2017 at 10:42 AM, Michał Nowotka <mmm...@gmail.com>
> wrote:
> >>
> >> Is there any flag in RDkit to match both 'normal' aspirin and embedded
> >> aromatic analogues?
> >> The problem is that I can't modify user queries by hand in real time :)
> >>
> >> On Wed, Sep 13, 2017 at 2:12 PM, Chris Earnshaw <cgearns...@gmail.com>
> >> wrote:
> >> > Hi
> >> >
> >> > The problem is due to RDkit perceiving the embedded pyranone in
> >> > CHEMBL1999443 as an aromatic system, which is probably correct.
> However,
> >> > in
> >> > the structure of aspirin the carboxyl carbon and singly bonded oxygen
> >> > are
> >> > non-aromatic, so if you just use the SMILES of aspirin as a query it
> >> > won't
> >> > match CHEMBL1999443
> >> >
> >> > You'll need to use a slightly more generic aspirin-like query to allow
> >> > the
> >> > possibility of matching both 'normal' aspirin and embedded aromatic
> >> > analogues. CC(=O)Oc1ccc

Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-13 Thread Peter S. Shenkin
Your course of action depends upon just what you are really trying to do.
If it's only aspirin, then why wouldn't you just do it manually? If it goes
beyond aspirin, you have to start by defining in general terms exactly what
you want to match to what.

For example, given a query molecule (aspirin in this case), if you want all
its non-aromatic atoms to match aromatic as well as non-aromatic atoms in
the database, you could write a string-alteration routine to munge the
SMILES of a query molecule into a SMARTS that would do just that, and then
use that SMARTS to match your database molecules. Repeat for each query
molecule.

But you have to start with a precise definition of just what kind of
matching you wish to do. For instance, maybe you don't really want
non-aromatic ring atoms in your query to match aromatic rings and vice
versa (i.e., a cyclohexyl to match a phenyl); maybe you only want non-ring
atoms in the query to match aliphatic as well as aromatic substructures.
And so on.
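
To make the string-alteration idea a bit more concrete, here is a rough
sketch of one such munging routine. It rewrites every unbracketed aliphatic
B, C, N, O, P or S in a SMILES as [#n], which in SMARTS matches that element
whether it is aromatic or not; everything else (aromatic atoms, halogens,
bracket atoms, bonds, ring closures) is passed through untouched. The names
relax_aliphatic_atoms and ATOMIC_NUMBER are mine, not anything in RDKit, and
this is just one of many possible matching definitions, not a recommendation.

```python
import re

# Elements of the SMILES organic subset that can also be aromatic.
ATOMIC_NUMBER = {"B": 5, "C": 6, "N": 7, "O": 8, "P": 15, "S": 16}

# Bracket atoms and two-letter halogens are tried first so that the
# 'C' in 'Cl' or inside '[...]' is never rewritten on its own.
_TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|.")


def relax_aliphatic_atoms(smiles):
    """Turn a query SMILES into a SMARTS whose aliphatic atoms
    may match aromatic atoms of the same element."""
    out = []
    for tok in _TOKEN.findall(smiles):
        if tok in ATOMIC_NUMBER:
            out.append("[#%d]" % ATOMIC_NUMBER[tok])
        else:
            out.append(tok)
    return "".join(out)


aspirin_smarts = relax_aliphatic_atoms("CC(=O)Oc1ccccc1C(=O)O")
```

The resulting string would then go through Chem.MolFromSmarts() and be used
as the query. Note that explicit bond symbols such as '=' are left as they
are, so bond-order mismatches would still have to be handled separately if
you want to relax those too.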

-P.


On Wed, Sep 13, 2017 at 10:42 AM, Michał Nowotka  wrote:

> Is there any flag in RDkit to match both 'normal' aspirin and embedded
> aromatic analogues?
> The problem is that I can't modify user queries by hand in real time :)
>
> On Wed, Sep 13, 2017 at 2:12 PM, Chris Earnshaw 
> wrote:
> > Hi
> >
> > The problem is due to RDkit perceiving the embedded pyranone in
> > CHEMBL1999443 as an aromatic system, which is probably correct. However,
> in
> > the structure of aspirin the carboxyl carbon and singly bonded oxygen are
> > non-aromatic, so if you just use the SMILES of aspirin as a query it
> won't
> > match CHEMBL1999443
> >
> > You'll need to use a slightly more generic aspirin-like query to allow
> the
> > possibility of matching both 'normal' aspirin and embedded aromatic
> > analogues. CC(=O)Oc1ccccc1[#6](=O)[#8] should work OK.
> >
> > Regards,
> > Chris
> >
> > On 13 September 2017 at 13:40, Michał Nowotka  wrote:
> >>
> >> Hi,
> >>
> >> This problem is probably due to my lack of chemistry knowledge but
> >> please have a look:
> >>
> >> If I do a substructure search in ChEMBL using aspirin (CHEMBL25) as a
> >> query (ChEMBL API uses the Symyx cartridge):
> >>
> >> from chembl_webresource_client.new_client import new_client
> >> res = new_client.substructure.filter(chembl_id='CHEMBL25')
> >>
> >> One of them will be CHEMBL1999443:
> >>
> >> 'CHEMBL1999443' in (r['molecule_chembl_id'] for r in res)
> >> >>> True
> >>
> >> Now I take the molfile:
> >>
> >> new_client.molecule.set_format('mol')
> >> mol = new_client.molecule.get('CHEMBL1999443')
> >>
> >> and load it with aspirin into rdkit:
> >>
> >> from rdkit import Chem
> >> m = Chem.MolFromMolBlock(mol)
> >> pattern = Chem.MolFromMolBlock(new_client.molecule.get('CHEMBL25'))
> >>
> >> If I check if it has an aspirin as a substructure using rdkit, I'm
> >> getting false...
> >>
> >> m.HasSubstructMatch(pattern)
> >> >>> False
> >>
> >> Looking at this blog post:
> >>
> >> https://github.com/rdkit/rdkit-tutorials/blob/master/notebooks/002_SMARTS_SubstructureMatching.ipynb
> >> I tried to initialize rings and retry:
> >>
> >>  Chem.GetSymmSSSR(m)
> >>  m.HasSubstructMatch(pattern)
> >>  >>>False
> >>
> >> Chem.GetSymmSSSR(pattern)
> >> m.HasSubstructMatch(pattern)
> >> >>>False
> >>
> >> But as you can see without any luck. Is there anything else I can do
> >> to get the match anyway?
> >> Without having a match I can't align and highlight the aspirin substructure
> >> in the CHEMBL1999443 image using the GenerateDepictionMatching2DStructure
> >> and DrawMolecule functions.
> >>
> >> Kind regards,
> >>
> >> Michał Nowotka
> >>
> >>


Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread Peter S. Shenkin
Hi,

In SMARTS, 'a' matches an aromatic atom. So you would match your molecule
with the pattern 'aaa', or if you wanted to restrict yourself to carbons,
'ccc'.

This would match whether you created the molecule from a Kekulized or an
aromatic SMILES. Remember that it's the molecular recognition code, not the
form of the input SMILES, that determines whether a molecule is aromatic.
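
A quick way to convince yourself of this is the sketch below. It reads
benzene from a Kekule SMILES, tries the aromatic pattern and the
Kekule-style pattern from the question, and then tries the Kekule-style
pattern again on a copy whose aromatic flags have been cleared with
Chem.Kekulize(). The variable names are only for illustration. I would
expect the aromatic pattern to hit the molecule as read, and the
[C]=[C]-[C] pattern to hit only the kekulized copy.

```python
from rdkit import Chem

# Benzene written in Kekule form; aromaticity is perceived on input.
mol = Chem.MolFromSmiles('C1=CC=CC=C1')

aromatic_patt = Chem.MolFromSmarts('ccc')
kekule_patt = Chem.MolFromSmarts('[C]=[C]-[C]')

matches_aromatic = mol.HasSubstructMatch(aromatic_patt)
matches_kekule = mol.HasSubstructMatch(kekule_patt)

# Work on a copy: clear the aromatic flags so atoms and bonds are
# stored as plain aliphatic atoms with alternating single/double bonds.
kek = Chem.Mol(mol)
Chem.Kekulize(kek, clearAromaticFlags=True)
matches_after_kekulize = kek.HasSubstructMatch(kekule_patt)
```

Only one Kekule form is stored on the kekulized copy, so a pattern that
depends on where the double bonds sit may or may not hit a particular
position.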

-P.

On Fri, Sep 8, 2017 at 6:19 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDkit, but I can not figure
> out how to (or if it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDkit?
>
> Thank you.
>
> Regards,
> Jim Metz
>
>
>


Re: [Rdkit-discuss] ETKDG conformation generation algorithm and fullerene-like structures.

2017-09-07 Thread Peter S. Shenkin
Too much symmetry for conformational comparison?

Many or most conformation generators will test new conformations for a
match with previously generated conformations, and will bail out if they
can't exhaust all possibilities.

(I don't know if this is the case for the RDKit facilities.)

-P.

On Thu, Sep 7, 2017 at 12:59 PM, Jason Biggs  wrote:

> I've never had success using the ETKDG or KDG methods for fullerenes,
> when trying on C60 it goes for a long time and returns -1.  The ETDG method
> works on C60, but fails on your C60H60.
>
> One thing you could try is to embed the hydrogen-suppressed structure,
> then add the hydrogens
>
> RDKit::DGeomHelpers::EmbedParameters params(RDKit::DGeomHelpers::ETDG);
>
> RDKit::DGeomHelpers::EmbedMolecule(*mol, params);
>
> bool explicitOnly = false;
>
> bool addCoords = true;
>
> RDKit::MolOps::addHs(*mol, explicitOnly, addCoords);
>
> seems to work.
>
>
>
> Jason Biggs
>
>
> On Thu, Sep 7, 2017 at 10:49 AM, Dmitry Redkin  wrote:
>
>> Hello all!
>> I've just started to use RDKit, and now I'm trying to generate some 3D
>> conformation for a molecule. ETKDG successfully optimized cyclohexane, so
>> I've tried some more complex example.
>> It was this fullerene-like structure (with all single bonds and every C
>> atom having an H atom attached). I'm attaching it to this email.
>>
>> But whatever I've tried to do with the embedding parameters, RDKit either
>> stalls for several minutes trying to complete the operation or just exits
>> with all-zero coordinates.
>>
>> Is there any way to generate conformations for this structure? Maybe I did
>> something wrong or there is some flag that can be set to get some result
>> (any result, not necessarily the best one) in a reasonable time?
>>
>> My code is pretty simple, you can see it below.
>>
>>
>> RWMol *mol = MolFileToMol("d:\\temp\\exe32\\full.mol", true, false,
>>                           false);
>>
>> MolOps::addHs(*mol);
>> DGeomHelpers::EmbedParameters p(DGeomHelpers::ETKDG);
>> // If I left maxIterations at -1, I could not wait long enough for
>> // EmbedMolecule to exit.
>> p.maxIterations = 100;
>> p.useRandomCoords = true;
>> int confid = DGeomHelpers::EmbedMolecule(*mol, p);
>> MolToMolFile(*mol, "d:\\temp\\exe32\\full1.mol", true, confid);
>> delete mol;  // the molecule was allocated with new, so delete, not free()
>>
>>
>> 
>> Dmitry Redkin, ACD Inc.
>> red...@acdlabs.ru


Re: [Rdkit-discuss] list of failed chembl ids

2017-08-08 Thread Peter S. Shenkin
I looked up a bunch of these. The ones I saw are ChEMBL activity records,
not molecule records, so they do not contain structural data.

But I would be curious to see the 51 CHEMBL SMILES that RDKit could not
parse.

-P.

On Tue, Aug 8, 2017 at 3:00 PM, Bennion, Brian  wrote:

> Hello,
>
>
>
> If anyone is interested, the list of chembl ids for compounds that had
> such crazy 2D sd files are listed below. Several are just different
> formulations of the same parent compound.
>
>
>
> 181880
>
> 450200
>
> 1198593
>
> 1201364
>
> 1977677
>
> 1992520
>
> 2146259
>
> 2146289
>
> 2146290
>
> 2299271
>
> 3182693
>
> 3184182
>
> 3187332
>
> 3188868
>
> 3187972
>
> 3211150
>
> 3349005
>
> 3348969
>
> 3833021
>
> 3397072
>
> 3544677
>
> 3561635
>
> 3593577
>
> 3594279
>
> 3580437
>
> 3558859
>
> 3558860
>
> 3558861
>
> 3832893
>
> 3832892
>
> 3832897
>


Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Peter S. Shenkin
That molecule's SMILES is correctly rendered by RDKit, or at least by the
version of RDKit behind Slack:

[image: Inline image 1]


-P.

On Mon, Aug 7, 2017 at 3:54 PM, Bennion, Brian  wrote:

> The carbocations are in small heterocyclic molecules. see CHEMBL3815233
>
> Brian
>
>
> --
> *From:* Chris Swain 
> *Sent:* Monday, August 7, 2017 11:46:30 AM
> *To:* rdkit-discuss@lists.sourceforge.net
> *Subject:* [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7
> million compounds
>
> I've not tried to read in ChEMBL but I have tried to process other large
> datasets e.g. ZINC. My impression was that problems arose with small
> heterocyclic systems, particularly if fused or containing multiple
> different heteroatoms. I did wonder if the different aromaticity models
> might be the issue.
>
> Chris


[Rdkit-discuss] Peter S. Shenkin and Leila Tai Shenkin have moved!

2017-07-10 Thread Peter S . Shenkin
New contact information for Peter & Leila Shenkin and
 Leila Tai Jewelry Design
 
 Changed:
 Address:
 We have moved from Manhattan to Forest Hills.
 Home telephone (land line):
 347-454-9162 (replaces 212-757-2210)
 
 Unchanged:
 Cell phones:
 Peter: 646 528 5352
 Leila: 646 331 2210
 Email:
 Peter: shen...@gmail.com
 Leila: leila_shen...@mindspring.com
 Leila (work): le...@leilataidesign.com
Peter's Stories by Peter S. Shenkin
http://tinyletter.com/shenkin
325 W. 52nd St New York, NY 10019 USA

Sent to rdkit-discuss@lists.sourceforge.net
Unsubscribe: 
http://tinyletter.com/shenkin/unsub?c=c21cce39-2d24-4f70-931a-434b32de950f=peter-s-shenkin-and-leila-tai-shenkin-have-moved-1

Delivered by:
http://tinyletter.com


[Rdkit-discuss] Back When Gas Was 30¢ A Gallon

2017-07-09 Thread Peter S . Shenkin
You may have received this story previously. If so, please excuse the 
duplication.

-P


Back When Gas Was 30¢ A Gallon

Peter S. Shenkin

Back when gas was 30¢ a gallon,

And love was only 60¢ away

Thus sang Tom T. Hall. I can’t say this story is exactly about that, but it 
took place exactly about then.

Mise en scène: Grant’s Tavern, Blairsville Precinct, Williamson County, 
(Southern) Illinois.

Ole Grant kept an overcoat hanging on a hook behind the door in all seasons. If 
there was trouble, he'd put it on. Everyone knew there was a revolver in the 
pocket. Or at least, everyone believed it, which was enough.

He had live country bands Friday and Saturday nights. He had a bowling machine, 
pinball, a few other games and a juke box. He had a bar, a roomy dance floor 
stocked with country honeys and live country music on the weekends, tables on 
an elevated platform at the back maybe 1/3 the size of the dance floor. The 
platform, that is.

One Saturday evening, 1969, I took a bunch of my hippie friends there. 
Including Alberto Navarro from Bogota and Mike Bartlett, a computer nerd who 
raced small cars. Both now deceased, which I am sorrier than you can imagine to 
have to say. Ron Manning and Peter Munch (grandnephew of the artist Edvard 
Munch) as well. Probably John Harty. We had spent the day imbibing various 
licit and illicit substances in the beautiful countryside and perhaps had had 
lunch at Ma Hale's in Grand Tower, unless that was another day, but either way 
we thought Blairsville would be good for a night cap. They put a few tables 
together in the back for us and the waitress got busy with other customers.

Alberto got annoyed at the lack of attention. He jumped up on a chair and 
shouted if they didn't come and serve us pronto he was going to put LSD in the 
Blairsville water supply. The rest of us were looking around nervously and 
trying to calm him down, hoping that Ole Grant wouldn't resort to the overcoat.

Just then the waitress came over and called Alberto "Dear" and asked what she 
could get him. That calmed him down considerably, which calmed the rest of us 
down considerably. There was no way they could have understood Alberto's accent 
anyway (which he never lost till his dying day, though his command of the 
English language was better than mine).

Either way, they seemed to be used to this sort of thing, which was fine by us.

Oh -- Tom T. Hall’s song is here
<https://www.youtube.com/watch?v=gbPJ3Q9Tfbs>.

--

Links to all my stories can be found here 
<https://www.google.com/url?q=https://docs.google.com/document/d/1whaI0Yvg66jyy6e4Hbqd6aJEhsV36Cmlru3lkRfQaro/pubsa=Dusg=AFQjCNFWcFN1ecotzZSkd43UrVRggQgosw>.
Peter's Stories by Peter S. Shenkin
http://tinyletter.com/shenkin
325 W. 52nd St New York, NY 10019 USA

Sent to rdkit-discuss@lists.sourceforge.net
Unsubscribe: 
http://tinyletter.com/shenkin/unsub?c=c21cce39-2d24-4f70-931a-434b32de950f=back-when-gas-was-30-a-gallon

Delivered by:
http://tinyletter.com


Re: [Rdkit-discuss] Clustering

2017-06-12 Thread Peter S. Shenkin
" A clustering algorithm, that does not require specifying the number
of classes upfront (so not K-means)."

A general approach to O(N) hierarchical clustering is:

1. Pick a random sqrt(N) structures.
2. Do full hierarchical O(N^2) clustering on these.
3. Select your favored clustering level to define clusters, and store the
centroid (or most representative member) of each.
4. For all N structures, associate each with the cluster whose centroid (or
most representative member) is closest to it.

I've never tried this, but I've heard it suggested at talks, which were
not, however, about molecular clustering; but the method should be general.

Step 3 gives you some control.
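
In case it helps, here is a toy pure-Python sketch of the four steps. All of
the names (hierarchical_clusters, representative, sample_then_assign) are
made up for illustration, the distance function is supplied by the caller,
and the hierarchical step is a deliberately naive single-linkage loop that
is slower than a real O(n^2) implementation. For molecules you would pass
fingerprints and something like 1 - Tanimoto as the distance; here I just
use numbers on a line. It assumes a non-empty list of items.

```python
import math
import random


def hierarchical_clusters(items, dist, cutoff):
    """Naive single-linkage agglomeration, stopped once the closest
    pair of clusters is farther apart than `cutoff` (step 2 and 3)."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > cutoff:
            break
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


def representative(cluster, dist):
    """Most representative member: smallest summed distance to the rest."""
    return min(cluster, key=lambda a: sum(dist(a, b) for b in cluster))


def sample_then_assign(items, dist, cutoff, seed=0):
    rng = random.Random(seed)
    # Step 1: pick a random sample of about sqrt(N) items.
    n_sample = max(1, math.isqrt(len(items)))
    sample = rng.sample(items, n_sample)
    # Steps 2-3: cluster the sample and keep one representative each.
    reps = [representative(c, dist)
            for c in hierarchical_clusters(sample, dist, cutoff)]
    # Step 4: label every item with the index of its nearest representative.
    labels = [min(range(len(reps)), key=lambda r: dist(x, reps[r]))
              for x in items]
    return reps, labels


points = [0.0, 0.2, 0.4, 0.6, 0.8, 9.0, 9.2, 9.4, 9.6]
reps, labels = sample_then_assign(points, lambda a, b: abs(a - b), cutoff=1.0)
```

The obvious weakness shows up even in this toy: if the random sample misses
a region entirely, that region gets no cluster of its own and its members
are simply attached to the nearest representative found elsewhere.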

-P.

On Mon, Jun 12, 2017 at 10:06 AM, Michał Nowotka  wrote:

> Hi,
>
> Thanks for all the answers, especially those pointing to code
> examples, very useful.
> I should be more specific when asking about clustering >2M compounds.
>
> An example I would like to see would use:
>
> 1. A clustering algorithm, that does not require specifying the number
> of classes upfront (so not K-means).
> 2. An algorithm that is a bit more sophisticated than Taylor-Butina
> 3. Preferably one from pyclustering (NOT from scipy.cluster, sorry for
> mistake)
>
> In those, somewhat more sophisticated algorithms, running PCA will not
> help. You can try to cluster >2M points on 2D surface and you will
> find out that this is not a trivial task.
>
> That being said, I don't expect any amazing results when doing
> compound clustering using those algorithms. And I agree that
> clustering a random sample can give similar results. This question is
> more out of curiosity.
>
> Michał
>
> On Sun, Jun 11, 2017 at 7:58 PM, Samo Turk  wrote:
> > Hi All,
> >
> > I have to admit I was commenting about PCA->k-means without actually trying.
> > Out of curiosity I implemented it here:
> > https://github.com/samoturk/cheminf-notebooks/tree/master/Python#pca-k-meanspy
> >
> > It can process 4M compounds in ~60 minutes on a desktop i5 and it should work
> > with 16GB of RAM. Clusters that come out make (some) sense but in this
> > regard Butina is better.
> >
> > PS. DataWarrior can easily load the results (it takes 30 min) but then it
> > works smoothly.
> >
> > Cheers,
> > Samo
> >
> > On Mon, Jun 5, 2017 at 7:46 PM, Abhik Seal wrote:
> >>
> >> Hello all,
> >>
> >> How about doing some dimension reduction using PCA or t-SNE and then
> >> running the clustering on some selected top components, like the top 20?
> >> I think the clustering would then be fast.
> >>
> >> Thanks
> >> Abhik
> >>
> >> On Mon, Jun 5, 2017 at 6:11 AM David Cosgrove <davidacosgrov...@gmail.com>
> >> wrote:
> >>>
> >>> Hi,
> >>> I have used this algorithm for many years clustering sets of several
> >>> millions of compounds. Indeed, I am old enough to know it as the Taylor
> >>> algorithm. It is slow but reliable. A crucial setting is the similarity
> >>> threshold for the clusters, which dictates the size of the neighbour
> >>> lists and hence the amount of RAM required. It also, of course,
> >>> determines the quality of the clusters. My implementation is at
> >>> https://github.com/OpenEye-Contrib/Flush.git. This repo has a number of
> >>> programs of relevance; the one you want is called cluster. I have just
> >>> confirmed that it compiles on Ubuntu 16. It needs the fingerprints as
> >>> ASCII bitstrings. I don't have code for turning RDKit fingerprints into
> >>> this format, but I would imagine it's quite straightforward. The program
> >>> runs in parallel using OpenMPI. That's valuable for two reasons. One is
> >>> speed, but the more important one is memory use. If you can spread the
> >>> slave processes over several machines you can cluster much larger sets
> >>> of molecules, as you are effectively expanding the RAM of the machine.
> >>> When I wrote the original, 64MB was a lot of RAM; it is less of an issue
> >>> these days but still matters if clustering millions of fingerprints.
> >>> Note that the program cluster doesn't ever store the distance matrix,
> >>> just the lists of neighbours for each molecule within the threshold.
> >>> This reduces the memory footprint substantially if you have a
> >>> tight-enough cluster threshold.
> >>> HTH,
> >>> Dave
> >>>
> >>> On Mon, Jun 5, 2017 at 11:22 AM, Nils Weskamp
> >>> wrote:
> >>>>
> >>>> Hi Michal,
> >>>>
> >>>> I have done this a couple of times for compound sets up to 10M+ using a
> >>>> simplified variant of the Taylor-Butina algorithm. The overall run time
> >>>> was in the range of hours to a few days (which could probably be
> >>>> optimized, but was fast enough for me).
> >>>>
> >>>> As you correctly mentioned, getting the (sparse) similarity matrix is
> >>>> fairly simple (and can be done in parallel on a cluster). Unfortunately,
> >>>> this matrix gets very large (even the sparse version).

Re: [Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Peter S. Shenkin
Hi,

If the compound is neutral overall and there is a single H where you drew
it, then a valid RDKit SMILES for the nitrogen-containing terminal group is
C[N+](C)(C)[NH-], which is one of the forms I gave earlier.
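
A small check of the two forms, as a sketch rather than a definitive recipe.
It assumes a default RDKit install, where SMILES parsing sanitizes the
molecule and returns None on a valence error. The variable names are
arbitrary.

```python
from rdkit import Chem, RDLogger

# Silence the parse error that the first SMILES is expected to trigger.
RDLogger.DisableLog("rdApp.*")

# Neutral nitrogen carrying five bonds: rejected by the valence check.
five_valent = Chem.MolFromSmiles("N=N(C)(C)C")

# Charge-separated form of the same terminal group.
charge_separated = Chem.MolFromSmiles("C[N+](C)(C)[NH-]")

net_charge = sum(a.GetFormalCharge() for a in charge_separated.GetAtoms())

print("five-valent form parsed:", five_valent is not None)
print("charge-separated form parsed:", charge_separated is not None)
print("net charge of charge-separated form:", net_charge)
```

The charged form carries a +1 and a -1 on adjacent atoms, so the group is
neutral overall.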

It is not a zwitterion. Rather, it represents a dative bond. (I am not sure
that all [X+][Y-] bonds are dative bonds, but my guess is that they are.)

Attached are SMILES for some well-known nitrogen compounds with adjacent +
and - charges, including nitromethane (lower left). All have single-bonded
"ion pairs", but none are zwitterions. Sorry the drawing (from Slack) is so
small.

Another such case is carbon monoxide, [C-]#[O+]. The version of RDKit now
hooked up to Slack can't draw it, but I believe that's due to a known bug
that also keeps it from drawing ethane, CC.

Best,
-P.


On Thu, May 11, 2017 at 1:45 PM, Yuran Wang <wangyuran...@gmail.com> wrote:

> Hi Peter,
> Thank you for your reply. I did not quite understand what you mean by 'But
> this makes no sense'.
> Also the SMILES you tested are zwitterionic form. In this link
> http://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization, the
> zwitterionic form seems suitable for N=O, N#N, not for N=N. But I may just
> have a very limited knowledge of RDkit.
>
> This is how it looks like in ChemDraw:
> [image: Inline image 1]
>
>
> Thanks,
> Yuran
>
> On Thu, May 11, 2017 at 1:33 PM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>> The problematic part is just the beginning of your would-be SMILES:
>> N=N(C)(C)C. The rest is correctly parsed. But this makes no sense. Perhaps
>> you mean one of the substructures illustrated in the attached (which at
>> least satisfy normal valence rules). If not, perhaps you could attach a
>> structural diagram of what you do mean.
>>
>> -P.
>>
>>
>> On Thu, May 11, 2017 at 11:02 AM, Yuran Wang <wangyuran...@gmail.com>
>> wrote:
>>
>>> Dear Greg,
>>> Thank you very much for the suggestions. It works for me!
>>> Here is the SMILES of one molecule that I am looking
>>> at: N=N(C)(C)CC(CN1N=CN=C1)(O)C2=C(C=C(C=C2)F)F
>>> Any better alternative will be appreciated.
>>>
>>> Thanks,
>>> Yuran
>>>
>>> On Thu, May 11, 2017 at 10:49 AM, Greg Landrum <greg.land...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, May 11, 2017 at 4:24 PM, Yuran Wang <wangyuran...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a question regarding the available valence of Nitrogen. It
>>>>> seems only 3 is available in the default setting (atomic_data.cpp). Why is
>>>>> it kept to only 3, and not extended to include 4 and 5? If I change it
>>>>> locally to include 4 and 5, will it cause any problems?
>>>>>
>>>>
>>>> Aside from generating molecules that don't make any chemical sense?
>>>> Probably not, but the lack of chemical sense may cause some unexpected
>>>> behavior.
>>>>
>>>>
>>>>> I am aware that I could turn off the sanitization to get a mol object,
>>>>> however, it cannot be further processed to get fingerprints, which is what
>>>>> I need.
>>>>>
>>>>
>>>> Well, you could turn off the sanitization on molecule construction and
>>>> then manually sanitize with the valence check turned off. Here's a simple
>>>> example of that:
>>>>
>>>> In [11]: m = Chem.MolFromSmiles('CN(C)(C)(C)C',sanitize=False)
>>>>
>>>> In [12]: m.UpdatePropertyCache(strict=False)
>>>>
>>>> In [13]: Chem.SanitizeMol(m,Chem.SANITIZE_SYMMRINGS|Chem.SANITIZE_SETCONJUGATION|Chem.SANITIZE_SETHYBRIDIZATION)
>>>> Out[13]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>>>
>>>> In [14]: rdMolDescriptors.GetMorganFingerprint(m,2)
>>>> Out[14]: <rdkit.DataStructs.cDataStructs.UIntSparseIntVect at 0x10b0ab350>
>>>>
>>>>
>>>> But, again, the RDKit's valence rules tend to reflect real chemistry.
>>>> What are you trying to represent that you need 5 coordinate neutral
>>>> nitrogen atoms? There may be a better way.
>>>>
>>>> -greg
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Best,
>>> Yuran Wang
>>>
>>
>
>
> --
> Best,
> Yuran Wang
>


Re: [Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Peter S. Shenkin
The problematic part is just the beginning of your would-be SMILES:
N=N(C)(C)C. The rest is correctly parsed. But this makes no sense. Perhaps
you mean one of the substructures illustrated in the attached (which at
least satisfy normal valence rules). If not, perhaps you could attach a
structural diagram of what you do mean.

-P.


On Thu, May 11, 2017 at 11:02 AM, Yuran Wang  wrote:

> Dear Greg,
> Thank you very much for the suggestions. It works for me!
> Here is the SMILES of one molecule that I am looking
> at: N=N(C)(C)CC(CN1N=CN=C1)(O)C2=C(C=C(C=C2)F)F
> Any better alternative will be appreciated.
>
> Thanks,
> Yuran
>
> On Thu, May 11, 2017 at 10:49 AM, Greg Landrum 
> wrote:
>
>>
>>
>> On Thu, May 11, 2017 at 4:24 PM, Yuran Wang 
>> wrote:
>>
>>> I have a question regarding the available valence of Nitrogen. It seems
>>> only 3 is available in the default setting (atomic_data.cpp). Why is it
>>> kept to only 3, and not extended to include 4 and 5? If I change it locally
>>> to include 4 and 5, will it cause any problems?
>>>
>>
>> Aside from generating molecules that don't make any chemical sense?
>> Probably not, but the lack of chemical sense may cause some unexpected
>> behavior.
>>
>>
>>> I am aware that I could turn off the sanitization to get a mol object,
>>> however, it cannot be further processed to get fingerprints, which is what
>>> I need.
>>>
>>
>> Well, you could turn off the sanitization on molecule construction and
>> then manually sanitize with the valence check turned off. Here's a simple
>> example of that:
>>
>> In [11]: m = Chem.MolFromSmiles('CN(C)(C)(C)C',sanitize=False)
>>
>> In [12]: m.UpdatePropertyCache(strict=False)
>>
>> In [13]: Chem.SanitizeMol(m,Chem.SANITIZE_SYMMRINGS|Chem.SANITIZE_SETCONJUGATION|Chem.SANITIZE_SETHYBRIDIZATION)
>> Out[13]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>
>> In [14]: rdMolDescriptors.GetMorganFingerprint(m,2)
>> Out[14]: <rdkit.DataStructs.cDataStructs.UIntSparseIntVect at 0x10b0ab350>
>>
>>
>> But, again, the RDKit's valence rules tend to reflect real chemistry.
>> What are you trying to represent that you need 5 coordinate neutral
>> nitrogen atoms? There may be a better way.
>>
>> -greg
>>
>>
>
>
>
> --
> Best,
> Yuran Wang
>


Re: [Rdkit-discuss] RDKit-Py3DMol integration

2017-05-09 Thread Peter S. Shenkin
Hi, Malitha,

I was trying to make a bit of a joke, but of course the Welcome was sincere.
;-)

Best,
-P.

On Tue, May 9, 2017 at 6:52 PM, Malitha Kabir <malitha12...@gmail.com>
wrote:

> Notes to Peter,
> Dear sir,
> Thank you very much for your time on writing an excellent welcome note
> together with advice. I will definitely work on anything that our community
> wants me to accomplish. Being very realistic, all my targets are now set by
> Paul and Greg. My commitment here is to keep participating in RDKit and
> 3Dmol.js after reaching the initial targets. Thanks much again.
>
> Notes to Paul:
> Dear sir,
> I am more than grateful to you for introducing me in RDKit community. You
> are the motivation why I'm here right now. I will put my best effort
> definitely. Thank you very much for all your worries.
>
> Notes to Greg:
> Dear sir,
> Thank you very much for your time in mentoring me and also for putting
> such useful tool for community use that comes totally free of cost. I will
> keep working in RDKit development if my skills remain as per requirements.
> Thank you very much again for reviewing proposal and future help as well.
>
> Notes to David Koes:
> Dear sir,
> I will come up with specific questions very soon (within one or two
> weeks). I wish you don't mind getting my touch. Thanks much in advance for
> your kind help.
>
> Sincerely,
> -Malitha
>
>
>
> On May 10, 2017 1:59 AM, "Peter S. Shenkin" <shen...@gmail.com> wrote:
>
>> Welcome Malitha! The community expects you to fix everything that's
>> broken ;-)
>>
>> (After all, you have the whole summer to do it)
>>
>> Cheers,
>> -P.
>>
>> On Tue, May 9, 2017 at 1:06 PM, Paul Czodrowski <
>> paul.czodrow...@merckgroup.com> wrote:
>>
>>> Dear RDkitters,
>>>
>>>
>>>
>>> This is to introduce Malitha Kabir to this exciting community. Malitha
>>> will be working as a Google Summer of Code (GSoC) student over the next
>>> couple of weeks on the RDKit-Py3DMol integration.
>>>
>>>
>>>
>>> Let’s give Malitha a warm welcome (and comprehensive replies during his
>>> GSoC project)!
>>>
>>>
>>>
>>> On behalf of the mentors: Greg & Paul
>>


Re: [Rdkit-discuss] Another Can't kekulize mol observation

2017-04-27 Thread Peter S. Shenkin
I would just replace 'n' with '[nH]' in your existing SMILES, for the N you
want the H on.
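
To illustrate on the bare triazole ring (a sketch, assuming default RDKit
parsing with sanitization; the variable names are arbitrary): without an
explicit hydrogen the aromatic ring cannot be kekulized, while marking one
nitrogen as [nH] removes the ambiguity.

```python
from rdkit import Chem, RDLogger

# Silence the "Can't kekulize mol" message expected for the first SMILES.
RDLogger.DisableLog("rdApp.*")

# Aromatic 1,2,4-triazole ring with no indication of where the H sits.
ambiguous = Chem.MolFromSmiles("c1nncn1")

# Same ring with the hydrogen placed explicitly on one nitrogen.
explicit = Chem.MolFromSmiles("c1nnc[nH]1")

# Indices of the nitrogens that carry a hydrogen in the parsed molecule.
n_with_h = [
    atom.GetIdx()
    for atom in explicit.GetAtoms()
    if atom.GetSymbol() == "N" and atom.GetTotalNumHs() == 1
]

print("ambiguous ring parsed:", ambiguous is not None)
print("explicit ring parsed:", explicit is not None)
print("N atoms bearing H:", n_with_h)
```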

-P.

On Thu, Apr 27, 2017 at 12:32 AM, Hongbin Yang  wrote:

> Hi Markus,
> "c1ccc(cc1)-c1nnc(n1)-c1ccccc1" is different from
> "c1ccc(cc1)-c1nncn1-c1ccccc1", so you cannot remove the parentheses.
>
> The error "Can't kekulize mol." is caused by the triazole in your
> molecule.
>
> "c1nncn1" says that the ring is aromatic, but it does not say where the H
> is.
>
> For example, "C1=NN=CN1" is 4H-1,2,4-triazole and "C1=NC=NN1" is
> 1H-1,2,4-triazole. They have different Kekulé structures, but both of them
> can be represented by "c1nncn1".
>
> There are two solutions I suggest:
> 1. Use `Chem.MolFromSmiles('c1ccc(cc1)-c1nnc(n1)-c1ccccc1',False)`
> (reference:
> http://www.rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromSmiles)
>
> 2. Manually Kekulize it:
> `Chem.MolFromSmiles('c1ccc(cc1)-C1=NN=C(N1)-c1ccccc1')`.
> This indicates that the H is on the 4'N.
>
>
> --
> Hongbin Yang
>
>
> *From:* Markus Metz 
> *Date:* 2017-04-27 09:30
> *To:* RDKit Discuss 
> *Subject:* [Rdkit-discuss] Another Can't kekulize mol observation
> Hello all:
>
> I obtained this smiles string:
> c1ccc(cc1)-c1nnc(n1)-c1ccccc1
> by removing atoms from the n1 in parentheses.
>
> Using:
> mol = Chem.MolFromSmiles("c1ccc(cc1)-c1nnc(n1)-c1ccccc1")
> throws an error: Can't kekulize mol.
>
> Using
> mol = Chem.MolFromSmiles("c1ccc(cc1)-c1nncn1-c1ccccc1")
> works fine.
>
> Is there any workaround?
> Any input is highly appreciated.
>
> Cheers,
> Markus
>
>


Re: [Rdkit-discuss] Information contained in SMARTS and SMILES

2017-04-19 Thread Peter S. Shenkin
On Wed, Apr 19, 2017 at 7:25 PM, Andrew Dalke <da...@dalkescientific.com>
wrote:

> On Apr 19, 2017, at 23:59, Peter S. Shenkin <shen...@gmail.com> wrote:
> > One more thing. The term "Mol" in RDKit and some other toolkits does not
> > really mean "molecule" in the sense that chemists use it.
>
> ? I don't see how this is connected to the previous emails.
>

The connection is that, based on the wording of the query, I thought that
perhaps Thilo was expecting a SMARTS to specify a molecule as chemists
understand the term.

-P.


Re: [Rdkit-discuss] Information contained in SMARTS and SMILES

2017-04-19 Thread Peter S. Shenkin
One more thing. The term "Mol" in RDKit and some other toolkits does not
really mean "molecule" in the sense that chemists use it. It is used to
connote a data structure that can store a SMARTS or a SMILES. Only when a
SMILES is used does it really correspond to a chemical "molecule", except,
in some cases, by accident; and, as Andrew pointed out, there are cases
when exactly the same string means different things in a SMARTS and SMILES
context.

The way I think of it is that SMILES is like an ordinary string and SMARTS
is like a regex that can be used to flexibly match other strings.
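
A short sketch of that analogy: one SMARTS query, like one regex, is tried
against several concrete SMILES "strings". The hydroxyl pattern and the three
test molecules are just illustrative choices.

```python
from rdkit import Chem

# A query, not a molecule: an oxygen with two connections, one of them an H.
hydroxyl = Chem.MolFromSmarts("[OX2H]")

# Concrete molecules built from SMILES: ethanol, benzyl alcohol, dimethyl ether.
matches = {}
for smiles in ("CCO", "OCc1ccccc1", "COC"):
    mol = Chem.MolFromSmiles(smiles)
    matches[smiles] = mol.HasSubstructMatch(hydroxyl)

for smiles, hit in matches.items():
    print(smiles, hit)
```

Both objects are RDKit "Mol" instances, but only the ones built from SMILES
describe a specific compound.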

-P.



On Wed, Apr 19, 2017 at 5:20 PM, Andrew Dalke 
wrote:

> On Apr 19, 2017, at 18:26, Curt Fischer  wrote:
> > From chemistry stack exchange, an answer contributed by user R.M.:
> >
> > SMARTS is deliberately designed to be a superset of SMILES. That is, any
> valid SMILES depiction should also be a valid SMARTS query, one that will
> retrieve the very structure that the SMILES string depicts.
>
> Except, that last clause isn't true. Try matching tritium against itself.
>
> >>> from rdkit import Chem
> >>> mol = Chem.MolFromSmiles("[3H]")
> >>> pat = Chem.MolFromSmarts("[3H]")
> >>> mol.HasSubstructMatch(pat)
> False
>
> For hydrogens you must use '#1', because H in SMARTS means something
> different.
>
> >>> pat2 = Chem.MolFromSmarts("[3#1]")
> >>> mol.HasSubstructMatch(pat2)
> True
>
> SMILES input under Daylight and most other toolkits gets normalized to the
> chemistry model, including aromaticity perception:
>
> >>> mol = Chem.MolFromSmiles("C1=CC=CC=C1")
> >>> pat = Chem.MolFromSmarts("C1=CC=CC=C1")
> >>> mol.HasSubstructMatch(pat)
> False
> >>> pat2 = Chem.MolFromSmarts("c1ccccc1")
> >>> mol.HasSubstructMatch(pat2)
> True
>
> RDKit also does a small amount of additional normalization, or
> 'sanitization' to use the RDKit term. For example, it will convert "neutral
> 5 coordinate Ns with double bonds to Os to the zwitterionic form" (see
> GraphMol/MolOps.cpp):
>
> >>> s = "CN(=O)=O"
> >>> mol = Chem.MolFromSmiles(s)
> >>> pat = Chem.MolFromSmarts(s)
> >>> mol.HasSubstructMatch(pat)
> False
> >>> Chem.MolToSmiles(mol)
> 'C[N+](=O)[O-]'
>
> I believe that the output SMILES from a toolkit, assuming that the SMILES
> doesn't have an explicit hydrogen, can be used as a SMARTS which will match
> the molecule made from that same SMILES, by that same toolkit.
>
> This is a weaker statement than that made by user R.M.
>
> Andrew
> da...@dalkescientific.com
>
>
>


Re: [Rdkit-discuss] tautomers in rdkit

2017-04-18 Thread Peter S. Shenkin
"IIRC, [Roger] gives an example of a large chemical supplier who offered
two tautomers of the same compound for sale at very different prices which
is at least embarrassing."

On the other hand, it provides a great opportunity for arbitrage. ;-)

-P.

On Tue, Apr 18, 2017 at 4:02 AM, David Cosgrove <davidacosgrov...@gmail.com>
wrote:

> Hi JW et al.,
> One of the last things I worked on before leaving AZ was what we called a
> tautomer-independent molecular representation. What we meant by this was a
> way of spotting whether a new compound being registered into the corporate
> collection was a tautomer of one already in the database. As part of that,
> I looked at the InChI representation and the tautomer handling, which was at
> that point labelled experimental. In our view, it was very limited in the
> types of tautomers it represented and not adequate to our needs. As a
> result I developed a program called tt_tauts, which AZ "open-sourced" when
> they made me redundant, and which is available at
> https://github.com/OpenEye-Contrib/TT_Tauts. It's another plug for OEChem,
> I'm afraid, which seems poor form on the RDKit website, but there you go.
> It is also a long way from being complete, and I am still working on it as
> a somewhat masochistic hobby. Internally at CozChemIx Towers it is known as
> 'The Mole Project' in honour of the game 'Whac-A-Mole'
> (https://en.wikipedia.org/wiki/Whac-A-Mole): every time you squash an odd
> tautomer case, another one pops up, quite often one you've already dealt
> with. Chembl is a marvelous source of nasty test cases. I hope to have a
> better version on github soon and also a description of the algorithm on my
> website. It used as a jumping off point the work of Thalheim et al.
> (http://onlinelibrary.wiley.com/doi/10.1002/minf.201400128/full).
> Note that this use of tautomer enumeration/representation is somewhat
> different from that of quacpac or taut_enum. These last two are concerned
> with predicting tautomers likely to be present in water (well, blood,
> probably) at roughly neutral pH; the first is trying to deal with two
> chemists drawing the same compound in different tautomers which may look
> quite different, with the hydrogen atoms shifted a long way. Both are
> difficult and unsolved problems. In one of Roger Sayle's papers on
> tautomers, IIRC, he gives an example of a large chemical supplier who
> offered two tautomers of the same compound for sale at very different
> prices, which is at least embarrassing.
> Cheers,
> Dave
>
> On Tue, Apr 18, 2017 at 1:23 AM, JW Feng <f...@dnli.com> wrote:
>
>> Hi Maria,
>>
>> From looking at Roger's slides at
>> https://github.com/rdkit/UGM_2016/blob/master/Presentations/Sayle_RDKitTautomers.pdf,
>> is he making an argument that InChI values are insufficient for generating
>> a canonical string for different tautomers? What if you perform a set of
>> standardization transformations prior to generating InChI values? You may
>> want to look at how Genentech normalizes molecules for compound
>> registration. The code is based on OEChem and is open sourced on GitHub:
>> https://github.com/chemalot/chemalot. This package is actively being
>> developed and I am a contributor. Specifically, you'll want to look at the
>> extensive standardization transformations in
>> https://github.com/chemalot/chemalot/blob/master/src/com/genentech/struchk/oeStruchk/Struchk.xml
>>
>> The last step in Struchk.xml is creating a canonical tautomer using
>> OpenEye's QuacPac toolkit. QuacPac returns a canonical tautomer. Could
>> one replace this step by converting a standardized molecule to InChI and
>> then back? Another approach is using Dave Cosgrove's TautEnum package
>> (https://github.com/OpenEye-Contrib/TautEnum). Both QuacPac and TautEnum
>> enumerate tautomers. I believe that Roger is intimately familiar with
>> QuacPac.
>>
>> Best,
>>
>> JW
>>
>> ___
>> JW Feng, Ph.D.
>> Denali Therapeutics Inc.
>> 151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
>> 270-0628
>>

Re: [Rdkit-discuss] Check If Atom Is in Two Small Rings

2017-04-11 Thread Peter S. Shenkin
But Brian's solution won't help Jonathan find atoms that are in two
three-membered or two four-membered rings, which I thought Jonathan also
wanted, based on the wording of the original query.
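
For that broader reading, one possible sketch counts, per atom, how many
rings of size 3 or 4 it belongs to, using RingInfo.AtomRings(). The helper
name and the `max_size` parameter are made up for illustration, and the
answer depends on the ring set RDKit perceives for the molecule.

```python
from collections import Counter

from rdkit import Chem


def atoms_in_two_small_rings(mol, max_size=4):
    """Return sorted indices of atoms that sit in at least two rings of size <= max_size."""
    small_ring_count = Counter()
    for ring in mol.GetRingInfo().AtomRings():
        if len(ring) <= max_size:
            # Each ring is a tuple of atom indices; count one hit per member atom.
            small_ring_count.update(ring)
    return sorted(idx for idx, n in small_ring_count.items() if n >= 2)


spiro_3_3 = atoms_in_two_small_rings(Chem.MolFromSmiles("C1CC12CC2"))    # two 3-rings
spiro_3_4 = atoms_in_two_small_rings(Chem.MolFromSmiles("C1CC12CCC2"))   # a 3-ring and a 4-ring
cyclohexane = atoms_in_two_small_rings(Chem.MolFromSmiles("C1CCCCC1"))   # no small rings

print(spiro_3_3, spiro_3_4, cyclohexane)
```

Changing `n >= 2` to `n == 2` gives the "exactly two rings" variant that Curt
describes.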

-P.

On Tue, Apr 11, 2017 at 4:12 PM, Curt Fischer 
wrote:

> Brian's solution is obviously better (shorter, uses less functions) than
> mine.  (Although mine assumes that you want atoms that are part of
> _exactly_ two rings, not atoms that are part of _at least_ two rings as
> Brian's does.  Probably Brian's solution is what you want but worth noting.)
>
> CF
>
> On Tue, Apr 11, 2017 at 1:03 PM, Brian Kelley 
> wrote:
>
>> You are so close!
>>
>> >>> from rdkit import Chem
>> >>> m = Chem.MolFromSmiles("C1CC12CCC2")
>> >>> for atom in m.GetAtoms():
>> ...   if atom.IsInRingSize(3) and atom.IsInRingSize(4): print(atom.GetIdx())
>> ...
>> 2
>> >>>
>>
>> Cheers,
>>  Brian
>>
>> On Tue, Apr 11, 2017 at 1:38 PM, Jonathan Saboury 
>> wrote:
>>
>>> Hello All,
>>>
>>> I'm trying to make a function to check if a mol has an atom that is part
>>> of two small rings (3 or 4 atoms). Using GetRingInfo()/NumAtomRings() I can
>>> find out how many ring systems each atom is in, but not the details of the
>>> rings. atom.IsInRingSize(size) returns a bool so I couldn't use that. I'm
>>> using the python api.
>>>
>>> Any suggestions? Thanks!
>>>
>>> - Jonathan
>>>
>>
>


Re: [Rdkit-discuss] tautomers in rdkit

2017-04-11 Thread Peter S. Shenkin
Just from the slides, it's not clear that Roger had a solution; the slides
seem to just suggest an approach. Am I missing something here?

That is, he defined the invariants that all tautomers of a compound have to
share and expressed it as a SMARTS + constraints; but I didn't see that he
provided a methodology to derive a canonical matching SMILES from a SMARTS
+ constraints. True, if two structures match the SMARTS + constraints, they
are likely tautomers. (I can't think of why they wouldn't be, but maybe
it's not always the case.) So that part provides deduplication of an input
stream, which is good, but no way to derive and store a canonical
representation.

Again, perhaps I'm missing something, but if so, what?

-P.

On Tue, Apr 11, 2017 at 2:43 AM, MARIA BRANDL 
wrote:

> Dear all,
>
>
> Is there going to be an attempt at coding Roger Sayle's "Alternative
> Approach" to tautomers, described in "RDKit: Six Not-So-Easy Pieces"
> [RDKit UGM 2016], into RDKit?
>
>
> I have managed to get reasonable tautomers out of Resonance.cpp using:
>
> suppl = rdchem.ResonanceMolSupplier(m,
>     rdchem.ResonanceFlags.ALLOW_CHARGE_SEPARATION |
>     rdchem.ResonanceFlags.ALLOW_INCOMPLETE_OCTETS |
>     rdchem.ResonanceFlags.UNCONSTRAINED_CATIONS |
>     rdchem.ResonanceFlags.UNCONSTRAINED_ANIONS)
>
>  with some post-filtering for e.g. carbocations, but feel that it may be
> more efficient to put user defined constraints on each atom during the
> backtracking loops, as Roger suggests.
>
> Looking forward to hearing your thoughts on this.
>
> Best regards,
>
> Maria Brandl
>


[Rdkit-discuss] NYC "RDKit Users and Learners" Meetup Monday, April 3, 7 PM at Hack Manhattan

2017-03-31 Thread Peter S. Shenkin
For more information, see:

https://www.meetup.com/RDKit-Users-and-Learners/events/237963674/?rv=ce2&_af=event&_af_eid=237963674=on

If you have RDKit-related work that you'd like to talk about or ask about,
please let me know.

-P.


Re: [Rdkit-discuss] looking for feedback on new python API documentation format

2017-03-28 Thread Peter S. Shenkin
Hi, Greg,

Here are my comments.


   - Formatting
      - pdoc at a glance is certainly more handsome than epydoc.
      - To my eye, there is a huge amount of wasted space in the pdoc
        documentation.
         - The line spacing is hugely disproportionate to the font size.
         - Maybe this could be adjusted by font and line-spacing options.
         - But it's a problem because so little is shown on a page.
   - Documentation hierarchy
      - The epydoc documentation requires you to drill down an extra level:
         - epydoc: http://rdkit.org/docs/api/rdkit.Chem.AllChem-module.html
            - At the top of this link, you see all the function names
              together, but you have to drill down to see the details of
              any particular function.
         - pdoc: http://rdkit.org/docs_temp/Chem/AllChem.m.html
            - You see each function with its full description at the module
              level.
         - I personally prefer epydoc here, because I usually want to see a
           list of (functions, classes, whatever) at the top level to figure
           out what I want to drill down to.
         - I'm just a "forest" kind of guy, and would like to pick my tree
           before I see all its gory details.
         - I accept that this might just be personal taste.
   - Code Examples
      - epydoc shows the code examples correctly; pdoc does not.
         - epydoc: Drilling down,
           http://rdkit.org/docs/api/rdkit.Chem.AllChem-module.html#AssignBondOrdersFromTemplate,
           consider these lines from the example code:
            - >>> from rdkit.Chem import AllChem
              >>> template = AllChem.MolFromSmiles("CN1C(=NC(C1=O)(c2ccccc2)c3ccccc3)N")
              >>> mol = AllChem.MolFromPDBFile(os.path.join(RDConfig.RDCodeDir, 'Chem', 'test_data', '4DJU_lig.pdb'))
              >>> len([1 for b in template.GetBonds() if b.GetBondTypeAsDouble() == 1.0])
              8
              >>> len([1 for b in mol.GetBonds() if b.GetBondTypeAsDouble() == 1.0])
              22
            - This is very legible.
      - pdoc: On the same link we were at before,
        http://rdkit.org/docs_temp/Chem/AllChem.m.html, look at the same code
        example:
         - import os from rdkit.Chem import AllChem template =
           AllChem.MolFromSmiles("CN1C(=NC(C1=O)(c2ccccc2)c3ccccc3)N") mol =
           AllChem.MolFromPDBFile(os.path.join(RDConfig.RDCodeDir, 'Chem',
           'test_data', '4DJU_lig.pdb')) len([1 for b in template.GetBonds() if
           b.GetBondTypeAsDouble() == 1.0]) 8 len([1 for b in mol.GetBonds() if
           b.GetBondTypeAsDouble() == 1.0]) 22
         - Line breaks are not observed; prompts are not observed;
           responses don't appear on their own lines, etc. This is illegible.
         - (There is an additional import statement here. That's not a
           problem, but note that the second import is concatenated on the
           same line.)
         - This is unacceptable, but perhaps it can be fixed.
   - Summary
      - The fact that epydoc is no longer supported weighs heavily against
        it.
      - If the current examples are as good as pdoc can do, it is
        unsatisfactory, especially because of poor code printing; but there
        could be other tigers lurking in the woods.
      - I feel the wasted space in pdoc due to the huge line spacing is
        pretty bad.
      - pdoc would be worth another look if these issues can be fixed, but
        a second look would be required, because there could be other
        problems that are obscured by the above.
      - I like Sphinx, and it would be great if it could be made to work
        with RDKit. (With Google style docstrings!)
      - Either way, I wish the RDKit documentation included the types of
        function arguments and return values, which both Sphinx and epydoc
        have provision for.
         - I assume pdoc has provision for this, too, but if not, that's a
           big negative.
         - Adding documentation of arguments and return values would be a
           big job at this point and isn't part of the current effort; but I
           feel it's important to pick a documentation tool that would allow
           this to be done later.
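
To make the last point concrete, here is a rough sketch of what a Google
style docstring with argument and return types might look like. The
function count_single_bonds and its bond_orders argument are hypothetical
(they are not part of RDKit) and are chosen only so the example runs without
any dependencies.

```python
def count_single_bonds(bond_orders):
    """Count the bonds whose order is exactly 1.0.

    Args:
        bond_orders (list[float]): Bond orders, e.g. 1.0, 1.5 or 2.0.

    Returns:
        int: The number of entries equal to 1.0.
    """
    # Hypothetical helper for illustration only; not an RDKit function.
    return sum(1 for order in bond_orders if order == 1.0)


print(count_single_bonds([1.0, 1.5, 2.0, 1.0]))
```

A tool that understands this layout can render the "Args" and "Returns"
sections as typed parameter tables, which is the provision I'd like the
chosen tool to have.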


On Wed, Mar 29, 2017 at 12:10 AM, Greg Landrum 
wrote:

> Dear all,
>
> TL;DR
> I'd like to switch to a new system for generating the RDKit Python API
> documentation and I'd like some feedback.
>
> Please take a look at this possible API documentation format:
> http://rdkit.org/docs_temp/
> and let me know whether it looks as useful as the old API doc
> format:
> http://rdkit.org/docs/api/index.html
>
>
> More context:
> The current documentation (http://rdkit.org/docs/api/index.html) is
> generated using epydoc. It's functional, though quite "old school" looking.
> The problem is that epydoc is no longer supported (and hasn't been for
> quite a while) and does not support python3 at all, so I would like to move
> off of it.
>
> In 

Re: [Rdkit-discuss] delete a substructure

2017-03-10 Thread Peter S. Shenkin
Sounds like Daylight's "depictmatch", which is unfortunately no longer
available online.

-P.

On Fri, Mar 10, 2017 at 1:28 PM, David Cosgrove 
wrote:

> Hi,
> In the RDKit source, under the 2d drawing code in the c++ part there's the
> full source code for a QT program that will run one or more SMARTS patterns
> against a set of molecules, split any matches and non-matches into 2
> displays side by side and colour the atoms that the SMARTS match. It needs
> a bit of persistence to compile and has only been tried on Linux but is
> very helpful for writing new SMARTS. If there's interest, when I have a bit
> of spare time over the next few weeks I can make sure it's easier to
> compile. If you poke about in my website (cozchemix.co.uk) you'll find a
> link to my GitHub repo with an earlier version which has been compiled
> under Linux recently and has instructions. Sorry not to put links in, I
> don't have access to a computer at the moment, just phone.
>
> Cheers,
> Dave
>
> On Thu, 9 Mar 2017 at 18:41, Chenyang Shi  wrote:
>
>> Thank you Chris. I found that one too; it is quite convenient to
>> visualize both SMARTS and SMILES strings.
>>
>> On Thu, Mar 9, 2017 at 11:28 AM, Chris Swain  wrote:
>>
>> I use SMARTSviewer at Univ of Hamburg
>>
>> http://www.zbh.uni-hamburg.de/en/bioinformatics-server.html
>>
>> Chris
>>
>> On 9 Mar 2017, at 17:21, rdkit-discuss-requ...@lists.sourceforge.net
>> wrote:
>>
>> One last question I have is do you guys have convenient online or local
>> documents to look up desired SMARTS.
>> Greg mentioned $RDBASE/Data/Functional_Group_Hierarchy.txt, which comes
>> with the installation of RDKIT.
>> Brian suggested daylight website,
>> http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html,
>> which is a good place as well.
>>
>> Best,
>> Chenyang
>>
>>
>>
>> 
>>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>
> 


Re: [Rdkit-discuss] Question about WedgeMolBonds

2017-02-26 Thread Peter S. Shenkin
(or by means of an optional argument :-) )
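
A minimal sketch of what I mean, assuming a hypothetical wrapper called
layout_2d with a hypothetical wedge argument (neither exists in RDKit; only
Compute2DCoords and WedgeMolBonds are real RDKit calls):

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def layout_2d(mol, wedge=False):
    """Hypothetical helper: compute 2D coordinates, optionally wedging bonds."""
    AllChem.Compute2DCoords(mol)
    if wedge:
        # Wedging depends on the coordinates that were just generated.
        Chem.WedgeMolBonds(mol, mol.GetConformer())
    return mol


mol = layout_2d(Chem.MolFromSmiles("C[C@H](N)C(=O)O"), wedge=True)
wedged = [
    b.GetIdx()
    for b in mol.GetBonds()
    if b.GetBondDir() in (Chem.BondDir.BEGINWEDGE, Chem.BondDir.BEGINDASH)
]
print("wedged bond indices:", wedged)
```

Callers who don't need the wedging information just leave wedge at its
default and pay nothing extra.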

-P.
Sent from a cell phone. Please forgive brvty and m1St@kes.

On Feb 26, 2017 00:36, "Greg Landrum"  wrote:

>
>
> On Sat, Feb 25, 2017 at 7:23 PM, John Mayfield <
> john.wilkinson...@gmail.com> wrote:
>
>> Is there something that the compute2DCoords() is doing that makes it a
>>> dependency for WedgeMolBonds()
>>
>>
>> Yes, calculating 2D coordinates. Look at these two molecules, they are
>> the same but the atoms have been positioned differently in 2D and hence the
>> wedging needs to be different. Therefore you need 2D coordinates before you
>> can (re)assign wedges.
>> [image: Inline images 3]
>>
>
> John is exactly right here.
>
>
>> In truth since the two (2D layout and wedging) are dependent I'd probably
>> make the layout call the wedging automatically
>>
>
> Not a bad idea, but the RDKit doesn't do it since you don't always need
> the bond wedging information.
>
> -greg
>
>
> 


Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Peter S. Shenkin
With Glide, IIRC, this facility is designed for the use case where the
coordinates of a docked ligand are known (typically from an X-ray
structure) and the docked ligand shares a SMARTS with the ligands in an
input file. The SMARTS-matching atoms of each incoming ligand are
superposed upon the corresponding atoms of the docked ligand and the
resulting pose is used as an initial guess for the docking.
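
For what it's worth, the superposition step (not the docking) can be
sketched in RDKit roughly as follows. This is only an illustration under
stated assumptions: the core SMARTS and the two SMILES are made up, the
"reference" coordinates come from an embedding rather than an X-ray
structure, and nothing here reflects how Glide does it internally.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign

# Made-up example: a benzamide core shared by a reference and a probe ligand.
core = Chem.MolFromSmarts("c1ccccc1C(=O)N")
ref = Chem.AddHs(Chem.MolFromSmiles("c1ccccc1C(=O)NC"))
probe = Chem.AddHs(Chem.MolFromSmiles("c1ccccc1C(=O)NCCO"))

# Stand-in 3D coordinates; in practice ref would carry the known pose.
AllChem.EmbedMolecule(ref, randomSeed=42)
AllChem.EmbedMolecule(probe, randomSeed=7)

# Atom indices matching the core, listed in SMARTS atom order.
ref_match = ref.GetSubstructMatch(core)
probe_match = probe.GetSubstructMatch(core)

# Superpose the probe's core atoms on the reference's core atoms.
atom_map = list(zip(probe_match, ref_match))
rmsd = rdMolAlign.AlignMol(probe, ref, atomMap=atom_map)
print(f"core RMSD after superposition: {rmsd:.3f}")
```

The probe's coordinates are modified in place, so the aligned conformer can
be written out and used as a starting pose.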

Some notes:

0. Greg questions whether there is really a common core in your example,
and if there's not, it doesn't appear as if the procedure is directly
applicable. But if it is applicable, read on.

1. If the SMARTS matches in multiple ways, all are tried, and the best
docking score among them wins (though there may be a way of requesting the
N best scores, or even all of them). So if the SMARTS specifies a phenyl
ring, for example, 12 initial poses will be tried. (If it contains two
phenyl rings, 24 will be tried.)

2. GLIDE itself does conformer generation, but I'm not sure how it works in
this procedure. If the SMARTS specifies a rigid core, you probably don't
need to pre-generate conformers, but if the core is flexible, you are
probably best off generating them, which of course you are permitted to do.

3. If you have GLIDE, then you probably have LigPrep as well. The
advantage of using LigPrep for your conformation generation would be that
the strain energy would be written into the output file, and then, when
used as the input to Glide, it would be taken into account when computing
the docking score. And it uses the same (or a very similar) force-field
that Glide itself uses.

4. I may have some details slightly incorrect, so you might want to address
your question to Schrödinger tech support.
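
The count of 12 in note 1 for a single phenyl ring can be reproduced with
RDKit as a sanity check. This is just a sketch using toluene as an arbitrary
test molecule; it says nothing about how Glide enumerates matches.

```python
from rdkit import Chem

phenyl = Chem.MolFromSmarts("c1ccccc1")
toluene = Chem.MolFromSmiles("Cc1ccccc1")

# Default behaviour: matches covering the same atom set are collapsed.
unique_matches = toluene.GetSubstructMatches(phenyl)

# uniquify=False keeps every atom-to-atom mapping of the pattern.
all_mappings = toluene.GetSubstructMatches(phenyl, uniquify=False)

print(len(unique_matches), len(all_mappings))  # 1 12
```

The 12 mappings are the 6 possible starting atoms times the 2 directions
around the ring; each mapping would give a different initial superposition.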

On Mon, Feb 20, 2017 at 2:15 PM, Brian Kelley  wrote:

> I don't know the exact glide procedure, but I did write such a system for
> OpenEye (POSIT).  The issue you are facing is that the RMSD portion is just
> a constraint used for docking, it isn't used as the "score", in fact, it
> can't tell if the conformation interpenetrates the active site or which
> orientation is better.
>
> I believe RDKit can generate conformations with a template, see
> AllChem.ConstrainedEmbed, this would solve half of your problem in creating
> conformations that match your template.  You still have the problem with
> scoring against your active site.  POSIT scored against the shape tanimoto
> of the active ligands (if any) to try to fill the same space as the known
> ligands. See rdkit.Chem.rdShapeHelpers.ShapeTanimotoDist
>
> This might not be what you want, but we had good success with similar
> methods and virtual screening, especially when using multiple co-crystal
> active sites.   I can send you a reference link if this interests you
>
> Cheers,
>  Brian
>
> On Mon, Feb 20, 2017 at 12:17 PM, Thomas Evangelidis 
> wrote:
>
>>
>> Greg and Brian,
>>
>> Thank you for your useful hints. All the compounds that I want to align
>> are supposed to belong to the same analogue series so they should share a
>> common substructure with substantial size.
>>
>> What I want to emulate is the "core restrained docking" with glide, where
>> you specify the common core of the query and the reference ligand using a
>> SMARTS pattern and then glide docks the query compound to the binding
>> pocket but takes care to overlay the core atoms of the query to the core
>> atoms of the reference compound. Since RDKit does not do docking, I just
>> generate 30 conformers of each query compound and select the best one by
>> measuring the RMSD between the core of the query and the core of the
>> reference after the alignment. Of course the conformations of the core
>> atoms between the query and the reference are never identical hence the bad
>> alignment. Is there any smarter way to emulate the "core restrained
>> docking" with RDKit?
>>
>> I will provide you with more info soon (example sdf, results, etc.).
>>
>>
>> ​
>>
>>
>
> 


Re: [Rdkit-discuss] UFF and MMFF conformers energy

2017-02-09 Thread Peter S. Shenkin
Small atomic displacements can cause large forcefield energy differences.
Computing molecular-mechanics energies from exactly the same coordinates
using two different force-fields is probably not a reasonable procedure.

It would be better to do an energy minimization with the two force fields
separately and live with the differing coordinates. Often the coordinate
differences will be small, but there are situations in which forcefield X
gives no conformation that is close to a minimized conformation obtained
from forcefield Y.
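
As a rough sketch of that procedure (the molecule is an arbitrary example
and the starting coordinates come from an embedding, not from your SD file):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Arbitrary example molecule with explicit hydrogens and 3D coordinates.
mol = Chem.AddHs(Chem.MolFromSmiles("CC(=O)Nc1ccccc1"))
AllChem.EmbedMolecule(mol, randomSeed=42)


def uff_energy(m):
    return AllChem.UFFGetMoleculeForceField(m).CalcEnergy()


def mmff_energy(m):
    props = AllChem.MMFFGetMoleculeProperties(m)
    return AllChem.MMFFGetMoleculeForceField(m, props).CalcEnergy()


# Work on independent copies so each force field relaxes its own geometry.
uff_mol = Chem.Mol(mol)
mmff_mol = Chem.Mol(mol)

uff_before, mmff_before = uff_energy(uff_mol), mmff_energy(mmff_mol)
AllChem.UFFOptimizeMolecule(uff_mol, maxIters=2000)
AllChem.MMFFOptimizeMolecule(mmff_mol, maxIters=2000)
uff_after, mmff_after = uff_energy(uff_mol), mmff_energy(mmff_mol)

print(f"UFF : {uff_before:10.2f} -> {uff_after:10.2f} kcal/mol")
print(f"MMFF: {mmff_before:10.2f} -> {mmff_after:10.2f} kcal/mol")
```

Even after minimization the two numbers are on different scales, so compare
conformers within one force field rather than UFF values against MMFF
values.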

-P.

On Thu, Feb 9, 2017 at 11:48 AM, Méliné Simsir 
wrote:

> Dear all,
>
> I'm still a beginner with Rdkit, and I’m having a hard time calculating
> the energy of conformers.
> I think I have read all the topics on the subject, but I might have missed
> something.
> Here is a shortened version of the code I wrote thanks to all the
> information I could find here.
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit.Chem import rdmolops
>
> mols = Chem.SDMolSupplier('test_3c.sdf')
>
> for mol in mols:
>     # UFF
>     ffu = AllChem.UFFGetMoleculeForceField(mol)
>     energy_value_U = ffu.CalcEnergy()
>     print(energy_value_U)
>     # MMFF
>     mol = rdmolops.AddHs(mol, addCoords=True)
>     mp = AllChem.MMFFGetMoleculeProperties(mol)
>     ffm = AllChem.MMFFGetMoleculeForceField(mol, mp)
>     energy_value_M = ffm.CalcEnergy()
>     print(energy_value_M)
>
>
> My problem is that I get weird energy values (not all the time I think,
> but too often), either too high or too low, and I don't get why.
> I would like to get the energy of my ligand as it is, without changing
> its conformation at all. That's why I don't use the minimization function,
> or optimization. At first I also didn't add the hydrogens, but thanks to one
> of Paolo Tosco's scripts (mmffEnergyTerms.py), I saw that H are not included
> even if they are in the sdf file. But I still get some weird values, I think.
>
> Here three examples:
> GNP_1CIP : -305.2896 kcal/mol (MMFF, with H added)
> -46.4812 kcal/mol (MMFF, without H add)
> 278.6606 kcal/mol (UFF)
>
> SAM_2IGT: 1675.9951 (MMFF, with H added)
> 151.6481 (MMFF, without H add)
> 217.1217 (UFF)
>
> ACP_3A1C: 222.9396 (MMFF, with H added)
> 269.151 (MMFF, without H add)
> 217.1217 (UFF)
>
> Regards,
> Méliné
>
> 


Re: [Rdkit-discuss] RDKit "cannot create mol from SMILE" error

2017-01-18 Thread Peter S. Shenkin
In addition to Brian's observation, there is also a "C1" early in the
SMILES, but no corresponding X1 to make a ring bond before or after it.

It appears that you might be reading the second half of a SMILES for some
reason. My guess is that the (C=C1) is associated with a preceding atom
that was not read.
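
If it helps with debugging, the strings can be checked outside the database
before they are sent to the query. A minimal sketch (the second SMILES is
just an arbitrary valid nitro compound for comparison, not a guess at what
your full string should be):

```python
from rdkit import Chem, RDLogger

# Silence the parser's error messages; we only want the pass/fail result.
RDLogger.DisableLog("rdApp.error")


def parses(smiles):
    """Return True if RDKit can build a molecule from the SMILES."""
    return Chem.MolFromSmiles(smiles) is not None


for smi in ["(C=C1)[N+]([O-])=O", "C=C[N+]([O-])=O"]:
    print(f"{smi!r}: {'ok' if parses(smi) else 'could not be parsed'}")
```

MolFromSmiles returns None rather than raising when the string is invalid,
so a check like this can filter bad input before it reaches the cartridge.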

-P

On Wed, Jan 18, 2017 at 6:32 PM, Brian Kelley  wrote:

> That doesn't look like a valid SMILES to me, I don't think a
> SMILES string can start with a parenthesis (branch).
>
> 
> Brian Kelley
>
> On Jan 18, 2017, at 6:18 PM, Larson Danes  wrote:
>
> Hi all,
>
> I'm using the following query in postgresql (with the rdkit extension
> installed):
>
> "select casrn from mols where m @> CAST(? AS mol)"
>
>
> This returns "ERROR: could not create molecule from SMILES '...' " on
> occasion. One such SMILES that causes this error regularly is
> '(C=C1)[N+]([O-])=O'. I'm curious if there's documentation on this specific
> error message anywhere. I've looked and haven't had luck finding any.
>
> Any information about this error message is much appreciated.
>
>
> Thanks,
>
>
> Larson
>
> 


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Peter S. Shenkin
You say "most stable", but I think you mean "most common." 2H is as stable
as 1H, but less common.
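
With "most common" as the criterion, a filter could look something like the
sketch below. The function name is_most_common_isotopes is mine, and I'm
assuming that an atom with no isotope label (GetIsotope() returning 0)
should be accepted.

```python
from rdkit import Chem

_PERIODIC_TABLE = Chem.GetPeriodicTable()


def is_most_common_isotopes(mol):
    """True if no atom carries a label other than its most common isotope."""
    for atom in mol.GetAtoms():
        isotope = atom.GetIsotope()  # 0 means "no isotope specified"
        if isotope == 0:
            continue
        if isotope != _PERIODIC_TABLE.GetMostCommonIsotope(atom.GetAtomicNum()):
            return False
    return True


for smi in ["CCO", "[12CH4]", "[13CH4]", "[2H]O[2H]"]:
    print(smi, is_most_common_isotopes(Chem.MolFromSmiles(smi)))
```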

-P.

On Wed, Jan 18, 2017 at 5:01 PM, Milinda Samaraweera <
milindaatw...@gmail.com> wrote:

> Hi Bob,
>
> I am trying to filter out any compound that does not have the most stable
> isotopic form;  (anything other than: 12C,1H,14N,16O, 31P, 32S) or to
> contain only MonoIsotopic compounds.
>
> Thanks,
> Milinda
>
>
> 


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Peter S. Shenkin
How about a regex filter on the all-atom SMILES?
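
Something along these lines, as a sketch. The pattern is my assumption of
what "isotope label" means in SMILES: a bracket atom that starts with a mass
number. Note that it flags any explicit label, including ones like [12C].

```python
import re

# A bracket atom whose first token is a mass number, e.g. [13CH3] or [2H].
ISOTOPE_LABEL = re.compile(r"\[\d+[A-Za-z]")


def has_isotope_label(smiles):
    """Return True if any bracket atom in the SMILES starts with digits."""
    return bool(ISOTOPE_LABEL.search(smiles))


for smi in ["CCO", "[13CH3]O", "[2H]OC", "C[N+](C)(C)C", "[CH3:1]O"]:
    print(f"{smi:>14}  {has_isotope_label(smi)}")
```

Charges, ring-closure digits and atom-map numbers are not matched, because
in none of those cases does a digit directly follow the opening bracket.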

-P.

On Wed, Jan 18, 2017 at 9:56 AM, Milinda Samaraweera <
milindaatw...@gmail.com> wrote:

> Dear Experts,
>
> I am trying to figure out a way to exclude entries which contain heavy
> atoms (13C, 2H, 3H, etc), from a SD file (which has close to two thousand
> entries) and write an updated file with the remaining entries.
>
> I do understand how to read/write SD files using rdkit.
>
> What I don't understand is how to detect entries with heavy isotopes: Is
> there an efficient and correct way of achieving this using rdkit?
>
> thanks,
> --
> Milinda Samaraweera
>
> 


Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Peter S. Shenkin
Dimitri,

Just for the record, you responded directly to a quote of mine.

Now you say that your objections were to using numbers that appeared in a
different quote by somebody else.

Personally, I think those numbers are indeed applicable. Nobody would think
of doing this on a single CPU, so I'm not sure why you think somebody was
suggesting that.

But either way, I move that we end this thread. The issues and possible
solutions are out on the table, and all of us can now, as they say, "pay
our money and take our choice."

Best,
-P.

On Dec 29, 2016 5:06 PM, "Dimitri Maziuk" <dmaz...@bmrb.wisc.edu> wrote:

> On 12/29/2016 02:35 PM, Peter S. Shenkin wrote:
> > Dimitri,
> >
> > You were the one who suggested that all the structural depictions be
> > generated.
> >
> > I, in contrast, suggested that only the ones users need to look at need be
> > generated. I further suggested that these would only constitute a small
> > fraction of those in a large DB.
>
> My objection was to using numbers like
>
> > ... for 92877507
> > structures (current size PubChem Compound):
> > 1s per structure = 1074 days (~3 years)
> > 100 ms per structure = 107 days
> > 1ms per structure = 25 hours
>
> as if they actually mean something.
>
> I responded that *if* the requirement is to generate all 100M
> depictions, making the code faster on a single CPU core is rarely the
> cost-effective solution. That was a purely academic "if" because I don't
> believe that regenerating all the depictions at once on a regular basis
> is a realistic use case, either.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>
> 


Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Peter S. Shenkin
Dimitri,

You were the one who suggested that all the structural depictions be
generated.

I, in contrast, suggested that only the ones users need to look at need be
generated. I further suggested that these would only constitute a small
fraction of those in a large DB.

-P.

On Thu, Dec 29, 2016 at 2:49 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
wrote:

> On 12/29/2016 12:43 PM, Peter S. Shenkin wrote:
>
> > Of the
> > billion structures, only a fraction will ever be visualized, so a
> > memoization strategy sounds reasonable, which in turn implies that you want
> > rapid response when an unstored structure has to be generated.
>
> :)
>
> Now I have a mental picture of a phd student tied to a chair with his
> eyes taped open, forced to look at a billion depictions for 10ms each.
>
> Pictures are only useful if you have a human looking at them. Looking is
> only useful if you do it long enough for the brain to process it. The
> whole "what if we need a billion depictions all at once" implies that
> you have a billion users looking at them all at once. If you don't, then
> rapid response is a very interesting academic exercise but its practical
> usefulness might be somewhat questionable.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>
> 


Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Peter S. Shenkin
Look, it all boils down to (CPU) time, and time is money. Generating a
billion depictions on the cloud will cost you the use of the machines.
Increasing the depiction speed by a factor of 10 decreases the cost by a
factor of 10, to a pretty good approximation. Storage is also money, so it
doesn't always make sense to store all N structures up front, if N is
large. In some contexts, it makes more sense to generate the 2d reps as
needed, rather than store them all in advance. One size doesn't fit all.
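
The linear scaling is easy to check against the figures John quoted in this
thread. A small sketch (the function name total_days is mine; the structure
count is the one from his message, and a purely sequential run is assumed):

```python
SECONDS_PER_DAY = 86_400


def total_days(n_structures, seconds_per_structure):
    """Wall-clock days for a purely sequential, single-core run."""
    return n_structures * seconds_per_structure / SECONDS_PER_DAY


N_PUBCHEM = 92_877_507  # structure count quoted in the thread

for label, secs in [("1 s", 1.0), ("100 ms", 0.1), ("1 ms", 0.001)]:
    days = total_days(N_PUBCHEM, secs)
    print(f"{label:>6} per structure: {days:8.1f} days ({days * 24:9.1f} hours)")
```

Each tenfold speedup divides the total by ten, and the same factor carries
over to machine cost whether the work runs on one core or is spread over
many.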

An intermediate strategy would be to generate the depictions on the fly and
memoize them for some time or up to some maximum storage limit. Of the
billion structures, only a fraction will ever be visualized, so a
memoization strategy sounds reasonable, which in turn implies that you want
rapid response when an unstored structure has to be generated.
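
A minimal sketch of the memoization idea, assuming SVG output and an
in-process cache. The function name depict_svg, the cache size and the image
dimensions are arbitrary choices of mine; a real service would more likely
use a shared cache with an expiry policy.

```python
from functools import lru_cache

from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D


@lru_cache(maxsize=10_000)  # evicts least recently used depictions when full
def depict_svg(smiles, width=250, height=200):
    """Generate (or fetch from the cache) an SVG depiction for a SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    prepared = rdMolDraw2D.PrepareMolForDrawing(mol)  # computes 2D coords
    drawer = rdMolDraw2D.MolDraw2DSVG(width, height)
    drawer.DrawMolecule(prepared)
    drawer.FinishDrawing()
    return drawer.GetDrawingText()


first = depict_svg("c1ccccc1O")   # generated on demand
second = depict_svg("c1ccccc1O")  # served from the cache
print(depict_svg.cache_info())
```

Only the first request for a structure pays the generation cost, which is
why the speed of that first generation still matters.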

-P.

On Thu, Dec 29, 2016 at 12:04 PM, Dimitri Maziuk 
wrote:

> On 2016-12-29 07:19, John M wrote:
>
> > For why you need sub-second depiction consider these times for 92877507
> > structures (current size PubChem Compound):
> >
> > 1s per structure = 1074 days (~3 years)
> > 100 ms per structure = 107 days
> > 1ms per structure = 25 hours
>
> The Dilbert answer is buy a better computer. The serious answer is if
> you run millions of jobs sequentially on a single core, your problem is
> not how long a single job takes: no matter how fast you can make it, it
> will only scale linearly. There will be 1B compounds in PubChem two
> years from now and your painstakingly crafted 1ms/structure code will
> still take 3 years, the only difference is you get garbage depictions.
>
> Condor can be persuaded to fire up 92877507 EC2 VMs and run all of those in
> parallel -- provided you're willing to pay Amazon for it of course. If
> you can code the algorithm into GPGPU/SIMD parallel flow, you can
> probably push it into an FPGA and then get that baked into ASICs in
> China -- they'll give you a discount if you order more than ten thousand.
> That gets you a $20 USB dongle that will run them at umpteen K/second.
> And so on.
>
> If you don't want quality depictions because bad ones will work just
> fine for your needs, that's a perfectly good argument. If you don't want
> them because generating 10M sequentially on a single core will take a
> long time, that's a BS argument.
>
> Dima
>
>
> 


Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Peter S. Shenkin
As a thought, it might make sense to consider a distinction between
publication-quality images and "pretty good" images. The latter require
speed and clarity, whereas a number of additional niceties (I hate to use
the word "elegance") would be highly desirable for the former, even at the
expense of speed. For example, for publication-quality images, one might
try to adhere more closely to the IUPAC recommendations for 2D depictions.

-P.



On Thu, Dec 29, 2016 at 8:53 AM, Brian Kelley <fustiga...@gmail.com> wrote:

> Perhaps we could train a ML algorithm to know which algorithm to use when
> :)
>
> Cheers,
>  Brian
>
> On Thu, Dec 29, 2016 at 8:19 AM, John M <john.wilkinson...@gmail.com>
> wrote:
>
>> Hi Peter,
>>
>> I uploaded the benchmark set here:
>> https://github.com/johnmay/layout-benchmark
>> and have tested on their web service a few weeks ago. IIRC it did seem quite
>> slow, maybe fine for ahead of time generation
>> but not usable for on demand depiction. It does produce very nice
>> depictions but I think the right way to go is described by Alex Clark (2006
>> I think?) and used by MOE. Essentially use optimisation for certain
>> parts/classes of structure but not everything.
>>
>> Unfortunately no comparison to MOE/ChemDraw in the paper.
>>
>> For why you need sub-second depiction consider these times for 92877507
>> structures (current size PubChem Compound):
>>
>> 1s per structure = 1074 days (~3 years)
>> 100 ms per structure = 107 days
>> 1ms per structure = 25 hours
>>
>> John
>>
>> On 15 December 2016 at 23:12, Peter S. Shenkin <shen...@gmail.com> wrote:
>>
>>> Yes, of course, storing the images is an alternative.
>>>
>>> -P.
>>>
>>> On Thu, Dec 15, 2016 at 5:46 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
>>> wrote:
>>>
>>>> On 12/15/2016 04:23 PM, Peter S. Shenkin wrote:
>>>>
>>>> > Obviously, it doesn't matter if you're rendering just a few structures, but
>>>> > in a scenario where you might be downloading a hundred SMILES from a DB and
>>>> > displaying them on a grid in a browser, computing the 2D depictions on the
>>>> > fly, waiting 5 sec for a page refresh wouldn't be great.
>>>>
>>>> Maybe not, but depending how the browser lays out the grid, it may take
>>>> 5 seconds anyway.
>>>>
>>>> My recommendation for that use case would be to pre-generate the images
>>>> and store the URLs in that database. Which is what we do here.
>>>>
>>>> ;)
>>>> --
>>>> Dimitri Maziuk
>>>> Programmer/sysadmin
>>>> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>>>>
>>>>
>>>
>>> 


Re: [Rdkit-discuss] Bug in AllChem.EmbedMultipleConfs pruning?

2016-12-22 Thread Peter S. Shenkin
Tri-anything groups can be considered one by one after the remaining heavy
atoms have been aligned. This turns a combinatorial explosion into a linear
algorithm for these groups. (Well, it would be linear in the number of
tri-anything groups, but it gets more complicated if the anythings are more
than monatomic.)
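
To put numbers on the explosion, here is a sketch that counts the
symmetry-equivalent atom mappings of 1,1,1-trifluoroethane with and without
hydrogens, using a self-substructure match as a stand-in for whatever the
RMSD code enumerates (the helper name count_self_mappings is mine):

```python
from rdkit import Chem


def count_self_mappings(mol, max_matches=100_000):
    """Count all atom-to-atom mappings of a molecule onto itself."""
    matches = mol.GetSubstructMatches(mol, uniquify=False, maxMatches=max_matches)
    return len(matches)


heavy_only = Chem.MolFromSmiles("CC(F)(F)F")
with_hydrogens = Chem.AddHs(heavy_only)

print(count_self_mappings(heavy_only))      # 6  = 3! for the CF3 group
print(count_self_mappings(with_hydrogens))  # 36 = 3! * 3! once CH3 is included
```

Every additional tri-anything group multiplies the count by another factor
of 6, whereas handling each group separately after the heavy-atom alignment
only adds 6 more cases per group.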

This would matter from the point of view of RMSD if binding conformations
were being compared to each other or to free molecules, or when free
molecules were being compared to each other in a situation where steric
hindrance affects some tri-something group differently among different
conformations. Considering the tri-anything groups would factor in
significant deviations from the local equilibrium geometry.

It would also matter in atomic mappings if you really wanted to know which
hydrogen (or anything else) in a conformer aligns to a particular
tri-anything bond in a reference structure. For example, you might have a
methyl group in the reference structure where one H points into a pocket in
an active site and, in a series of analogs, you want to try substituting
the corresponding H with some R group or groups.

-P.

On Dec 22, 2016 11:38 AM, "Brian Cole"  wrote:

> RMSD with auto-morph symmetries with hydrogens are crazy expensive to
> calculate. Symmetry should be on by default, but without hydrogens. Would
> even love to see the RMSD auto-morph symmetry code ignore trifluoro type of
> groups too as they dramatically increase the cost of the computation with
> little added value.
>
> On Thu, Dec 22, 2016 at 10:27 AM, Greg Landrum 
> wrote:
>
>>
>> On Thu, Dec 22, 2016 at 4:06 PM, JW Feng  wrote:
>>
>>>
>>> Thanks for confirming the bug.  I also vote for changing the code to use
>>> only heavy atoms.  Is symmetry taken into consideration when calculating
>>> RMS during the pruning step?
>>>
>>
>> Symmetry is not taken into account. Once the code to do that is available
>> in C++ (Peter Gedeck is working on this), we'll add that option too.
>>
>> -greg
>>


Re: [Rdkit-discuss] drawing code take 3

2016-12-15 Thread Peter S. Shenkin
Yes, of course, storing the images is an alternative.
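
For what it's worth, a minimal sketch of that alternative: render each depiction once and reuse the stored file afterwards. Everything here is an assumption for illustration: cached_depiction, the cache directory name, and the render callable (any function that turns a SMILES string into SVG text) are hypothetical, and a real setup would probably key on a canonical SMILES and store the URL in the database as Dimitri describes.

```python
import hashlib
from pathlib import Path


def cached_depiction(smiles, render, cache_dir="depictions"):
    """Return the path of an SVG for `smiles`, rendering only on a cache miss.

    `render` is any callable that maps a SMILES string to SVG text.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Hash the SMILES so the file name is safe on any file system.
    name = hashlib.sha1(smiles.encode("utf-8")).hexdigest() + ".svg"
    path = cache / name
    if not path.exists():
        path.write_text(render(smiles), encoding="utf-8")
    return path
```

The expensive layout step then runs once per structure, and a page refresh only has to read files.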

-P.

On Thu, Dec 15, 2016 at 5:46 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
wrote:

> On 12/15/2016 04:23 PM, Peter S. Shenkin wrote:
>
> > Obviously, it doesn't matter if you're rendering just a few structures, but
> > in a scenario where you might be downloading a hundred SMILES from a DB
> and
> > displaying them on a grid in a browser, computing the 2D depictions on
> the
> > fly, waiting 5 sec for a page refresh wouldn't be great.
>
> Maybe not, but depending how the browser lays out the grid, it may take
> 5 seconds anyway.
>
> My recommendation for that use case would be to pre-generate the images
> and store the URLs in that database. Which is what we do here.
>
> ;)
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>


Re: [Rdkit-discuss] drawing code take 3

2016-12-15 Thread Peter S. Shenkin
Well, Figure 10 shows that a molecule with about 25 heavy atoms takes about
50 ms to optimize.

In John Mayfield's UGM talk, it looks like CDK is taking an average of 1 ms
for "easy" structures and 56 ms for the hard ones, some of which are
depicted and have far more than 25 heavy atoms.

We don't know the details of the two data sets, so a head-to-head
comparison is tough, but intuitively, 20 structures/sec sounds slow.

Having said that, it's reasonable to pay a price in speed for additional
quality and robustness.

Obviously, it doesn't matter if you're rendering just a few structures, but
in a scenario where you might be downloading a hundred SMILES from a DB and
displaying them on a grid in a browser, computing the 2D depictions on the
fly, waiting 5 sec for a page refresh wouldn't be great.
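
The arithmetic behind those two figures, spelled out as a tiny sketch. The 50 ms value is the reading of Figure 10 quoted above and the page size is the hypothetical hundred SMILES; the variable names are only for illustration.

```python
ms_per_structure = 50        # ~25 heavy atoms, as read off Figure 10
structures_per_page = 100    # hypothetical grid of SMILES pulled from a DB

# Structures rendered per second at that per-structure cost.
throughput = 1000 / ms_per_structure
# Wall-clock time to lay out one full page on the fly.
page_seconds = structures_per_page * ms_per_structure / 1000

print(throughput, page_seconds)
```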

-P.

On Thu, Dec 15, 2016 at 4:22 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
wrote:

> On 12/15/2016 02:53 PM, Peter S. Shenkin wrote:
> > Looks good, but maybe too slow for production use... (?)
>
> I wonder what kind of production use would require sub-second wall clock
> time for this.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>


Re: [Rdkit-discuss] drawing code take 3

2016-12-15 Thread Peter S. Shenkin
Looks good, but maybe too slow for production use... (?)

-P.

On Thu, Dec 15, 2016 at 3:38 PM, Chris Swain  wrote:

> At first glance this looks an interesting approach.
>
> Simulation-Based Algorithm for Two-Dimensional Chemical Structure Diagram
> Generation of Complex Molecules and Ligand–Protein Interactions
> DOI: http://dx.doi.org/10.1021/acs.jcim.6b00391
>
> On 27 Sep 2016, at 05:38, rdkit-discuss-requ...@lists.sourceforge.net
> wrote:
>
> 2D drawing code is tough. The 90/10 rule applies: the last 10% of
> correctness takes 90% of the effort.
>
> I like Dmitri Agrafiotis's method, but IIRC it's patented; also, though
> it's good for rough work, it doesn't produce "beautiful" structural
> diagrams.
>
> Some of the 2D drawing methods that do produce "pretty" pictures have a
> large number of templates built in that match the most common (and even
> somewhat uncommon) motifs, and they fall down when they hit something they
> can't get a close enough match for. And then, the IUPAC has a whole list of
> "desirable" features in 2D diagrams (as in, "Don't show it this way, but
> rather show it that way."). So even if you produce what might appear to be
> an acceptable drawing, it might not match the IUPAC list of desirables.
>
> I think for the present purposes what we need is something correct, robust
> and legible, and of course the example shown does not exhibit that. (But I
> don't know what the starting SMILES is, so I don't know whether the
> 7-bonded C is due to a bad SMILES, in which case all bets are off.)
>
> In addition, I think some discussion earlier indicated that the RDKit 2D
> structures look much worse when H's are included.
>
> I actually wrote a code one time (while at Schrödinger) to give a "badness"
> score to 2D structures. When our 2D depiction development was in progress,
> we created 2D SD files for many thousands of structures. I could put these
> through the program and sort with the worst on top. That allowed the most
> severe problems to be identified more quickly than, say, looking at
> thousands of 2D diagrams. The program looked at three things: Number of
> bonds that crossed, Number of atoms that were too close together, and Large
> disparity of bond lengths within the same molecule. (The checking code
> didn't deal with labels.)
>
> Writing the checker was a fun project, but I'm glad I didn't have to write
> the 2D depiction code. As Mark Twain said, "Improving oneself is good.
> Improving others is better -- and easier."
>
> -P.
>
>
>
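
The three checks in the "badness" score described in the quoted message above can be sketched in a few lines. This is not the original checker, just a guess at its general shape under stated assumptions: depiction_badness and its helpers are hypothetical names, coordinates are plain (x, y) tuples, bonds are pairs of atom indices, and the 0.5 distance cutoff is arbitrary.

```python
import math
from itertools import combinations


def _ccw(a, b, c):
    """Signed area test: which side of line a-b the point c lies on."""
    return (c[1] - a[1]) * (b[0] - a[0]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    d1, d2 = _ccw(p3, p4, p1), _ccw(p3, p4, p2)
    d3, d4 = _ccw(p1, p2, p3), _ccw(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def depiction_badness(coords, bonds, min_dist=0.5):
    """Return (bond crossings, close atom pairs, longest/shortest bond ratio)."""
    crossings = sum(
        1
        for (a, b), (c, d) in combinations(bonds, 2)
        if len({a, b, c, d}) == 4  # bonds sharing an atom cannot cross
        and _segments_cross(coords[a], coords[b], coords[c], coords[d])
    )
    bonded = {frozenset(b) for b in bonds}
    clashes = sum(
        1
        for i, j in combinations(range(len(coords)), 2)
        if frozenset((i, j)) not in bonded
        and math.dist(coords[i], coords[j]) < min_dist
    )
    lengths = [math.dist(coords[i], coords[j]) for i, j in bonds]
    ratio = max(lengths) / min(lengths) if lengths and min(lengths) > 0 else 1.0
    return crossings, clashes, ratio
```

Because the result is a tuple, sorting a list of scored structures in descending order puts the depictions with the most crossings on top, which is the triage workflow described above.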


Re: [Rdkit-discuss] GenerateDepictionMatching[23]DStructure (a bit off-topic)

2016-11-18 Thread Peter S. Shenkin
That's a really nice presentation.

-P.

On Fri, Nov 18, 2016 at 3:16 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> This is a very big topic, and one where I would very much like to improve
> the RDKit. John Mayfield gave a great talk on the issues (and some ideas
> about fixing them based on his work with the CDK) at the UGM that some of
> you may find interesting :
> https://github.com/rdkit/UGM_2016/blob/master/Presentations/JohnMayfield_Depiction.pdf
>
> Fixing the larger problems is a *lot* of work and not something that is
> likely to happen quickly, but there is some low-hanging fruit (like cutting
> crossed bonds) that I ought to be able to do something about.[1]
>
> -greg
> [1] the trick is to avoid, as much as possible, creating drawings that
> look like Möbius strips.
> _____
> From: Peter S. Shenkin <shen...@gmail.com>
> Sent: Thursday, November 17, 2016 11:23 PM
> Subject: Re: [Rdkit-discuss] GenerateDepictionMatching[23]DStructure (a
> bit off-topic)
> To: <rdkit-discuss@lists.sourceforge.net>
>
>
>
>
> On 17 Nov 2016, at 4:12 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu> wrote:
>
> Philosophically speaking, there must exist molecules for which a legible
> 2D projection is simply not possible.
>
>
> Hi,
>
> I don't think that 2D projection of a 3D structure is an appropriate
> paradigm for 2D depiction, in general. I think of it as being more about 2D
> construction. I don't think camphor is a particularly difficult example,
> though, and I think that the hidden-line elimination (for lack of a better
> term) that Marvin does gives it a leg up on RDKit's representation.
>
> By the way, I do not think that Marvin is the best there is out there;
> it's just what I happen to have available for comparison.
>
> Stereochemistry adds complications, because 3D information has to be
> encoded in some way. Camphor (your suggestion) has a little of this. I gave
> Marvin a non-stereo SMILES and it picked an enantiomer. I drew the same
> enantiomer. I did not specify stereochemistry to RDKit, so, despite the
> visual confusion of the bond crossings, I suppose it's good that it didn't
> depict an explicit enantiomer.
>
> And labels add further complications. The two approaches I've seen for
> labels are using them as the atomic vertices, as RDKit does, and adding
> them adjacent to the vertices. I personally prefer the latter, because to
> my eye, it's easier to see the connectivity without being distracted by the
> labels.
>
> But my philosophical point was that different forms of 2D depiction work
> better for different purposes. Stéphane wants to see sugars drawn as
> carbohydrate chemists are used to seeing them. I would like to see the 2D
> connectivity as clearly as possible and would sacrifice some conventions
> for that purpose. And so on.
>
> -P.
>
>
>
>


Re: [Rdkit-discuss] GenerateDepictionMatching[23]DStructure (a bit off-topic)

2016-11-17 Thread Peter S. Shenkin

> On 17 Nov 2016, at 4:12 PM, Dimitri Maziuk  wrote:
> 
> Philosophically speaking, there must exist molecules for which a legible
> 2D projection is simply not possible.

Hi,

I don't think that 2D projection of a 3D structure is an appropriate paradigm 
for 2D depiction, in general. I think of it as being more about 2D 
construction. I don't think camphor is a particularly difficult example, 
though, and I think that the hidden-line elimination (for lack of a better 
term) that Marvin does gives it a leg up on RDKit's representation. 

By the way, I do not think that Marvin is the best there is out there; it's 
just what I happen to have available for comparison.

Stereochemistry adds complications, because 3D information has to be encoded in 
some way. Camphor (your suggestion) has a little of this. I gave Marvin a 
non-stereo SMILES and it picked an enantiomer. I drew the same enantiomer. I 
did not specify stereochemistry to RDKit, so, despite the visual confusion of 
the bond crossings, I suppose it's good that it didn't depict an explicit 
enantiomer.

And labels add further complications. The two approaches I've seen for labels 
are using them as the atomic vertices, as RDKit does, and adding them adjacent 
to the vertices. I personally prefer the latter, because to my eye, it's easier 
to see the connectivity without being distracted by the labels.

But my philosophical point was that different forms of 2D depiction work better 
for different purposes. Stéphane wants to see sugars drawn as carbohydrate 
chemists are used to seeing them. I would like to see the 2D connectivity as 
clearly as possible and would sacrifice some conventions for that purpose. And 
so on.

-P.



Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-27 Thread Peter S. Shenkin
It would seem that a major issue with RDKit's multiconformer file is the
inability to associate structure-level and atom-level properties with
conformations. It's not quite orthogonal to the question of how to read,
say, a multiconformer SD file into RDKit's multiconformer format, because
the conformers in said SD file could contain such properties, and
information would be lost.
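
One possible workaround, sketched under loud assumptions: group the SD records by canonical SMILES, add each record's coordinates as a conformer, and keep the per-record SD properties in a side dictionary so they are not lost. load_conformers is a hypothetical helper, not an RDKit function, and it assumes every record of a given molecule lists its atoms in the same order. Atom-level properties are not handled here.

```python
from rdkit import Chem


def load_conformers(sdf_path):
    """Group SD records by canonical SMILES into multi-conformer Mols.

    Returns (mols, props): mols maps SMILES -> Mol holding all conformers,
    props maps (SMILES, conformer id) -> that record's SD properties.
    """
    mols, props = {}, {}
    for rec in Chem.SDMolSupplier(sdf_path, removeHs=False):
        if rec is None:  # record that could not be parsed
            continue
        key = Chem.MolToSmiles(rec)
        if key not in mols:
            base = Chem.Mol(rec)        # copy topology from the first record
            base.RemoveAllConformers()
            mols[key] = base
        conf = Chem.Conformer(rec.GetConformer())
        cid = mols[key].AddConformer(conf, assignId=True)
        props[(key, cid)] = rec.GetPropsAsDict()
    return mols, props
```

Different protonation states give different SMILES keys, so they end up in separate Mol objects, which is what Thomas asked for.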

-P.

On Thu, Oct 27, 2016 at 6:20 AM, Thomas Evangelidis 
wrote:

> Hello Greg,
>
> Is the canonical SMILES string always unique for every isomer and
> tautomerization state of a molecule? If yes, then I have already written a
> function to load multiple molecules and their conformers, which I can share
> here.
>
> best
> Thomas
>
> PS: thanks to David for pointing this out.
>
>
>
> On 27 October 2016 at 05:20, Greg Landrum  wrote:
>
>> Hi Thomas,
>>
>> You're right, reading multiple conformations out of an SDF does seem like
>> one of those common operations. Unfortunately the RDKit does not currently
>> support it in an easy way.
>>
>> A python implementation of this would be a good topic for Friday's UGM
>> hackathon, we can see if anyone finds it interesting enough to work on.
>>
>> -greg
>>
>>
>> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis 
>> wrote:
>>
>>> Hello everyone,
>>>
>>> I am a new user of RDkit and I was looking in the documentation for an
>>> easy way to load multiple conformers from a structure file like .sdf. The
>>> code must 1) distinguish between different protonation states of the same
>>> molecule,  2) create a new Mol() object for each protonation state and load
>>> into it the respective conformers.
>>>
>>> Apparently I can work out a solution for 1)
>>> using mol.GetProp('_Name'), mol.GetNumAtoms, mol.GetNumBonds and other
>>> properties, but I was wondering if there is any more straight forward way
>>> to do it.
>>> For 2) I guess I must iterate over all molecules in the input file,
>>> create new Mol() objects (one for each protonation state of each ligand)
>>> and add conformers to these new Mol() objects. Again this sounds easily
>>> programmable, but sounds like a very common operation, thus I was wondering
>>> if it has been implemented in a function.
>>>
>>> thanks in advance
>>> Thomas
>>>
>>>
>>> --
>>>
>>> ==
>>>
>>> Thomas Evangelidis
>>>
>>> Research Specialist
>>> CEITEC - Central European Institute of Technology
>>> Masaryk University
>>> Kamenice 5/A35/1S081,
>>> 62500 Brno, Czech Republic
>>>
>>> email: tev...@pharm.uoa.gr
>>>
>>>   teva...@gmail.com
>>>
>>>
>>> website: https://sites.google.com/site/thomasevangelidishomepage/
>>>
>>>
>>>
>>
>
>
> --
>
> ==
>
> Thomas Evangelidis
>
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081,
> 62500 Brno, Czech Republic
>
> email: tev...@pharm.uoa.gr
>
>   teva...@gmail.com
>
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>


Re: [Rdkit-discuss] SVG BUG (Re: Fwd: 2D drawing with atoms labeled by index)

2016-10-26 Thread Peter S. Shenkin
Hey, by the way, my agenda is trying to understand all this. I'm ignorant
about the general area and have learned something. But don't worry -- not
enough to be dangerous. :-) If something comes out of the discussion that's
generally useful, great!

By the way, when you post your UGM Jupyter notebook on github, could you
post the URL to the list? As I mentioned at the Cambridge UGM, that talk
was the best introduction to RDKit that I've seen, and I think many will
find it useful.

-P.


Re: [Rdkit-discuss] SVG BUG (Re: Fwd: 2D drawing with atoms labeled by index)

2016-10-25 Thread Peter S. Shenkin
Indeed, when the file under discussion most recently named "svg2.html" is
modified so that "xmlns:svg=" is replaced with "xmlns=", and the file is
renamed "svg2.svg", double-clicking it opens it and correctly
displays the image in the browser.

But trying this in the Jupyter notebook fails. The original code had the
lines:

svg = drawer.GetDrawingText().replace('svg:','')
display(SVG(svg))

This succeeded. If I add Dimitri's latest suggestion:

svg = drawer.GetDrawingText().replace('svg:','').replace('xmlns:svg=','xmlns=')
display(SVG(svg))

this also succeeds. If I only carry out the second replacement, this fails
with an error several levels down.

So apparently, SVG() can create an svg object out of the contents of a
correctly formed svg file, but is insensitive to some constructs that make
such a file invalid for direct use in a browser.

I'm still not sure why GetDrawingText() doesn't return a properly formatted
svg string. Is there some use its output can be put to without these
replacements?
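
For the record, here is a small self-contained sketch of the two replacements together, checked with the standard-library XML parser rather than a browser. The sample string is a made-up stand-in for what GetDrawingText() returned at the time, and to_standalone_svg is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def to_standalone_svg(text):
    """Bind SVG as the default namespace and drop the 'svg:' tag prefixes."""
    text = text.replace("xmlns:svg=", "xmlns=")
    return text.replace("svg:", "")


# Made-up stand-in for the drawer output, using the prefixed form.
sample = (
    "<svg:svg xmlns:svg='http://www.w3.org/2000/svg' width='20px' height='20px'>"
    "<svg:rect x='0' y='0' width='20' height='20'/>"
    "</svg:svg>"
)

fixed = to_standalone_svg(sample)
root = ET.fromstring(fixed)   # raises ParseError if the XML is malformed
print(root.tag)               # root element, qualified by the SVG namespace
```

Doing only the namespace replacement leaves the 'svg:' prefix on the tags with nothing bound to it, which would be consistent with the failure I saw when carrying out the second replacement alone. A blanket replace of 'svg:' is crude, of course; it would also touch any text content that happened to contain that string.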

-P

On Tue, Oct 25, 2016 at 1:35 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
wrote:

> On 10/25/2016 11:21 AM, Peter S. Shenkin wrote:
> > Hi, Hongbin,
> >
> > Thanks. Indeed. svg2.svg, when renamed to svg2.html, shows the correct
> > image in Chrome. svg.html shows garbage.
> >
> > Still, it would be good to be able to create a real .svg file from RDKit.
>
> OK, you made me look and I learned something today.
>
> Mozilla claims valid SVG must include the namespace declarations
> (https://developer.mozilla.org/en-US/docs/Web/SVG/FAQ) citing this
> document: https://jwatt.org/svg/authoring/#namespace-binding
>
> There it states
> """
> http://www.w3.org/2000/svg;
> ...
> Be careful not to type xmlns:svg instead of just xmlns when you bind the
> SVG namespace. This is an easy mistake to make, but one that can break
> everything. Instead of making SVG the default namespace, it binds it to
> the namespace prefix 'svg', and this is almost certainly not what you
> want to do in an SVG file. A standards compliant browser will then fail
> to recognise any tags and attributes that don't have an explicit
> namespace prefix (probably most if not all of them) and fail to render
> your document as SVG.
> """
>
> Sure enough, rdkit's files start with
> """
>xmlns:svg='http://www.w3.org/2000/svg'
> ...
> """
>
> With that declaration any standards-compliant viewer should only
> recognize tags with "svg:" prefix, and removing svg:'s results in a
> technically invalid file. Anything that displays it as an image is what
> we "it professionals" call b0rk3d.
>
> According to this, what RDKit writes out is wrong: you actually *want
> to* remove :svg from the root tag's "xmlns" attribute, then you *may*
> remove the svg: prefixes from all tags (including the root one).
>
> Of course, that was last edited in 2007, maybe something changed in the
> 10 years since.
>
> HTH,
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>


Re: [Rdkit-discuss] Fwd: 2D drawing with atoms labeled by index

2016-10-25 Thread Peter S. Shenkin
Hi, Hongbin,

Thanks. Indeed. svg2.svg, when renamed to svg2.html, shows the correct
image in Chrome. svg.html shows garbage.

Still, it would be good to be able to create a real .svg file from RDKit.

Best,
-P.

On Tue, Oct 25, 2016 at 2:35 AM, 杨弘宾 <yanyangh...@163.com> wrote:

> Hi, Peter,
> I don't know whether this will help, since I did not rerun your code,
> but it actually works on my computer:
> change the file extension from .svg to .html and open it in
> Chrome.
> It should work with svg2.svg (where the svg namespace prefixes were removed).
>
> --
> Hongbin Yang
>
>
> *From:* Peter S. Shenkin <shen...@gmail.com>
> *Date:* 2016-10-25 13:27
> *To:* Dmitri Maziuk <dmitri.maz...@gmail.com>
> *CC:* RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> *Subject:* Re: [Rdkit-discuss] Fwd: 2D drawing with atoms labeled by index
> Hi,
>
> Dima wrote:
>>
>> Try saving the text (svg/svg2) to a file and opening it in chrome (if you
>> can actually open a file in chrome) or some other application.
>
>
> I actually did that, and in a second email I reported:
>
>>
>>- Chrome thinks svg.svg is empty
>>
>>
>>- When I load svg2.svg, Chrome complains, "This XML file does not
>>appear to have any style information associated with it. The document tree
>>is shown below"
>>
>>  -P.
>
> On Tue, Oct 25, 2016 at 12:28 AM, Dmitri Maziuk <dmitri.maz...@gmail.com>
> wrote:
>
>> OK, my turn: that went out too soon.
>>
>> It seems to me that Jupyter, IPython, or whatever, has no idea how to render
>> MIME type image/svg+xml. It can display an "SVG" object, but the bit that
>> turned image/svg+xml into "SVG" does not understand XML namespaces (that's
>> been around since at least 2009).
>>
>> Try saving the text (svg/svg2) to a file and opening it in chrome (if you
>> can actually open a file in chrome) or some other application.
>>
>> Dima
>>
>>
>


Re: [Rdkit-discuss] Fwd: 2D drawing with atoms labeled by index

2016-10-24 Thread Peter S. Shenkin
Hi,

Dima wrote:
>
> Try saving the text (svg/svg2) to a file and opening it in chrome (if you
> can actually open a file in chrome) or some other application.


I actually did that, and in a second email I reported:

>
>- Chrome thinks svg.svg is empty
>
>
>- When I load svg2.svg, Chrome complains, "This XML file does not
>appear to have any style information associated with it. The document tree
>is shown below"
>
>  -P.

On Tue, Oct 25, 2016 at 12:28 AM, Dmitri Maziuk 
wrote:

> OK, my turn: that went out too soon.
>
> It seems to me that Jupyter, IPython, or whatever, has no idea how to render
> MIME type image/svg+xml. It can display an "SVG" object, but the bit that
> turned image/svg+xml into "SVG" does not understand XML namespaces (that's
> been around since at least 2009).
>
> Try saving the text (svg/svg2) to a file and opening it in chrome (if you
> can actually open a file in chrome) or some other application.
>
> Dima
>
>


Re: [Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-24 Thread Peter S. Shenkin
Consider the following excerpt:

svg = drawer.GetDrawingText()
svg2 = svg.replace('svg:','')
svg3 = SVG(svg2)
print('displaying svg:')
display(svg)
print('displaying svg2:')
display(svg2)
print('displaying svg3:')
display(svg3)

svg and svg2 display as xml text. svg3 displays as the image, in a Jupyter
notebook in Chrome.

On Mon, Oct 24, 2016 at 6:44 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
wrote:

> On 10/24/2016 04:39 PM, Peter S. Shenkin wrote:
>
> > Or is it
> > rather because chemists in your target audience will be thinking of the
> > first atom in, say, a structure from an sd file as atom #1?
>
> That
>
> > 2. Regarding the last line, most of the RDKit code I've seen in past
> > examples displays the molecule using code like the following. When is it
> > necessary/not necessary to remove the "svg" string from the results of
> > GetDrawingText()?
>
> Not sure: it's a namespace, I'm assuming ipython can't deal with xml
> namespaces. Properly written programs should show it either way,
> unfortunately my target viewer is firefox (it's a web application and
> the user's default browser is firefox) and firefox isn't one of them.
> Without svg:'s it'll show the file as xml text instead of the image.
>
> HTH
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>


Re: [Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-24 Thread Peter S. Shenkin
Hi, Dimitri,

I have two questions about your code.

1. Why are you incrementing the atom index by 1? Are there functions in
RDKit, for example, that use atom indices using index-origin 1? Or is it
rather because chemists in your target audience will be thinking of the
first atom in, say, a structure from an sd file as atom #1?

2. Regarding the last line, most of the RDKit code I've seen in past
examples displays the molecule using code like the following. When is it
necessary/not necessary to remove the "svg" string from the results of
GetDrawingText()?

svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)

Thanks,
-P.

On Mon, Oct 24, 2016 at 2:31 PM, Dimitri Maziuk 
wrote:

> Since you already got your answer I'll just post this for posterity:
>
>
> import sys
> import rdkit
> import rdkit.Chem
> import rdkit.Chem.AllChem
> import rdkit.Chem.Draw
> import rdkit.Chem.Draw.rdMolDraw2D
>
> mol=next(rdkit.Chem.SupplierFromFilename(sys.argv[1],removeHs=False))
> dr=rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DSVG(800,800)
> dr.SetFontSize(0.3)
> op = dr.drawOptions()
> for i in range(mol.GetNumAtoms()) :
>     op.atomLabels[i]=mol.GetAtomWithIdx(i).GetSymbol() + str((i+1))
> rdkit.Chem.AllChem.Compute2DCoords(mol)
> dr.DrawMolecule(mol)
> dr.FinishDrawing()
> svg=dr.GetDrawingText()
>
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>


Re: [Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-23 Thread Peter S. Shenkin
Hi, Hongbin,

Thank you very much for your help. That worked! That blog entry is a great
tutorial, in general.

So it turned out that in the following test code, I had correctly figured
out how to set atomLabels, but instead of the last two lines in my .ipynb I
had just asserted "m". (How quickly we forget.. :-} )

Best,
-P.

rdDepictor.Compute2DCoords(m)
drawer = rdMolDraw2D.MolDraw2DSVG(400, 200)
drawer.drawOptions().atomLabels[0] = '0'
drawer.DrawMolecule(m)
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)



On Sun, Oct 23, 2016 at 11:38 PM, 杨弘宾 <yanyangh...@163.com> wrote:

> Hi, Peter S. Shenkin,
> I think this blog post may help you draw molecules with labels; it explains
> more about drawing with rdMolDraw2D.
> http://rdkit.blogspot.com/2015/02/new-drawing-code.html
>
> --
> Hongbin Yang
>
> *From:* Peter S. Shenkin <shen...@gmail.com>
> *Date:* 2016-10-24 10:18
> *To:* Dimitri Maziuk <dmaz...@bmrb.wisc.edu>; RDKit Discuss
> <rdkit-discuss@lists.sourceforge.net>
> *Subject:* [Rdkit-discuss] 2D drawing with atoms labeled by index
> Hi,
>
> How do you get RDKit to label the atoms in a 2D drawing with their indices?
>
> There was some discussion of this that included Dimitri Maziuk in
> September, but it wasn't clear to me whether he actually had to modify the
> underlying drawing code to get this behavior.
>
> -P.
>
>


[Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-23 Thread Peter S. Shenkin
Hi,

How do you get RDKit to label the atoms in a 2D drawing with their indices?

There was some discussion of this that included Dimitri Maziuk in
September, but it wasn't clear to me whether he actually had to modify the
underlying drawing code to get this behavior.

-P.


[Rdkit-discuss] Solved! (was, Re: Fwd: Jupyter renders only from the outermost level?)

2016-10-14 Thread Peter S. Shenkin
In an earlier thread, I reported that I could not get Jupyter to render
except from the outermost level of the notebook. For instance, the
following code would not render Benzene:

--
from rdkit import Chem
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Chem.Draw import IPythonConsole
from IPython.display import SVG, display

m = Chem.MolFromSmiles("c1ccccc1")

def render_mol(m):
    rdDepictor.Compute2DCoords(m)
    drawer = rdMolDraw2D.MolDraw2DSVG(400, 200)
    drawer.DrawMolecule(m)
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText().replace('svg:','')
    SVG(svg)  # result is discarded here, so nothing is rendered

render_mol(m)
--

Googling around for something else, I accidentally found out how to make
this work. Namely, replace the last line in render_mol() with:

display(SVG(svg))

In other words, use the IPython.display.display function. This function
will render to the screen from anywhere in your notebook code. You'll note
I have imported it in the above excerpt.
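
A rough, stdlib-only model of why the original version shows nothing, assuming the notebook displays only the value of a cell's final bare expression. The run_cell helper below is hypothetical and merely mimics that behaviour; it is not IPython's actual implementation, and the strings stand in for real SVG objects.

```python
import ast

def run_cell(source, namespace):
    """Hypothetical stand-in for a notebook cell runner.

    Executes every statement in `source`; if the final statement is a bare
    expression, its value is returned (this is what a notebook would show).
    """
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        return eval(compile(last, "<cell>", "eval"), namespace)
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None

# The expression sits inside a function: it is evaluated and thrown away,
# and the function itself returns None.
inside_function = '''
def render(text):
    "<svg>" + text + "</svg>"

render("benzene")
'''

# The same expression as the last line of the cell is what gets displayed.
top_level = '"<svg>" + "benzene" + "</svg>"'

shown_from_function = run_cell(inside_function, {})
shown_from_top_level = run_cell(top_level, {})
```

Here shown_from_function is None while shown_from_top_level holds the markup. In a real notebook, calling display() inside the function, or returning the SVG object so that the call becomes the cell's final expression, avoids the problem.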

The suggestion at the time was to use MolsToGridImage(), which is a great
facility, but it might not always be what you want. So I was happy to find
this.

-P.


[Rdkit-discuss] Jupyter: "I forgot to remember to forget you"

2016-10-11 Thread Peter S. Shenkin
This is a Jupyter quirk that cost me some hours and caused me some grief.
So beware!

Please see the attached image and Jupyter notebook.

Note the following block of code, which has been succeeding for me for
days, even when "Run All Cells" is carried out:

my_mols = [mol.RDK_mol for mol in mols]
my_descriptions = [mol.description for mol in mols]
type(gridImage)
gridImage = MolsToGridImage(my_mols, molsPerRow=2, subImgSize=(350, 250),
                            legends=my_descriptions, useSVG=True, kekulize=False)
gridImage

The last line successfully displays the grid. But read on -- it won't
succeed for you.

When I made a copy of the notebook, it failed on the type() statement,
because gridImage had not yet been defined. When I removed the type()
statement, it failed on the last line, because gridImage must actually be
displayed via SVG(gridImage). But when I try the latter in the original,
SVG(gridImage) evokes an error message.

In the original, when "Cell > Run All Cells" is selected, type()
succeeds, because Jupyter still remembers the gridImage variable that had
been created in the previous invocation.

In early versions of the notebook, gridImage was a different structure that
did not require SVG() to display. Now the generated gridImage is supposed
to require SVG(), and yet it (1) fails with SVG() and (2) succeeds without
it as if SVG were being used. This strikes me as bizarre. I have a hard
time understanding how this can be.

However you see this, it's a cautionary tale.  Running a Jupyter notebook
is not the same as using a pretty interface to run a Python script.

The only thing I have found to avoid this is "Kernel > Restart and Run
All".

The web suggests "%reset" to clear user variables in IPython, but, if
executed at the top of the first In[] in Jupyter, it results in no output
whatsoever, so I don't know what it's doing. If someone knows of a
directive that can be used in-line, please speak up. I always want my
user-defined variables to be expunged if I execute "Cell > Run All".
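
The stale-variable effect described above can be reproduced without Jupyter. The sketch below is only an illustration: the run_all helper and the dictionaries standing in for kernel state are hypothetical, and the strings stand in for the real grid image.

```python
def run_all(cells, namespace):
    """Execute each cell's source in order against one shared namespace."""
    for source in cells:
        exec(source, namespace)

# Same ordering problem as in the post: the name is used before it is defined.
cells = [
    "kind = type(grid_image).__name__",
    "grid_image = '<svg/>'",
]

# A long-lived kernel still holds grid_image from an earlier run.
old_kernel = {"grid_image": "left over from a previous run"}
run_all(cells, old_kernel)  # succeeds only thanks to the stale variable

# A restarted kernel starts empty, so the very same cells fail.
fresh_kernel = {}
try:
    run_all(cells, fresh_kernel)
    fresh_error = None
except NameError as err:
    fresh_error = err
```

The first run completes and the second raises NameError, which matches what happened when the notebook was copied. For an in-line directive, IPython's "%reset -f" skips the confirmation prompt; it clears user variables silently, so seeing no output from it is expected.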

-P.


Several-molecules.ipynb
Description: Binary data


Re: [Rdkit-discuss] Rendering of aromatics

2016-10-11 Thread Peter S. Shenkin
Thanks, Greg. Indeed, passing "kekulize=False" to MolsToGridImage works.

-P.

On Tue, Oct 11, 2016 at 1:56 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> HI Peter,
>
> On Tue, Oct 11, 2016 at 12:31 AM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>>
>> Please see the attached image for (1) and (2).
>>
>> 1. If I render a molecule via 'SVG(svg)', I get the dotted aromatic
>> representation.
>>
>
> A bit of clarification here, to help with the later answer.
> The key piece is where the SVG itself is generated: the call to
> drawer.DrawMolecule() in your render_mol() function.
>
>
>> 2. If I render it just by displaying 'MolFromSmiles(smi)', I get the
>> kekulized representation.
>>
>
> Here's the Python code that generates that SVG (if you've
> configured SVG generation with IPythonConsole.ipython_useSVG):
> https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/
> Draw/IPythonConsole.py#L132
> or the PNG  (the default):
> https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/
> Draw/IPythonConsole.py#L103
> Notice the call to rdMolDraw2D.PrepareMolForDrawing() in each function.
> This does the kekulization.
>
>
> So I guessed that (for some reason) when SVG is used, RDKit automatically
>> uses the dotted representation.
>>
>> 3. However, if I display MolsToGridImage(mol, useSVG=True), I get the
>> kekulized form. (This method does not return an SVG, and therefore I cannot
>> display it using 'SVG(result)' ).
>>
>> So I have several questions:
>>
>> a. Is there a way to force MolsToGridImage to return the dotted aromatic
>> representation? Or to postprocess the result to achieve this?
>>
>>
> Try passing the "kekulize=False" argument to MolsToGridImage(). That
> should do it.
>
>
>> b. What is the underlying paradigm which dictates which representation
>> will be shown, and how can it be controlled?
>>
>
> By default the code kekulizes things before rendering them. You can change
> this default for normal molecule rendering in the notebook by setting
> IPythonConsole.kekulizeStructures to False. This doesn't currently impact
> what MolsToGridImage does, though it should.
>
>
>> c. What I would really like is the "circle" representation (i.e., benzene
>> as a hexagon with a circle inside). Can this be achieved, and if so, how?
>>
>
> Nope, not currently possible.
>
> -greg
>
>
>
>
>>
>> Thank you,
>> -P.
>>


[Rdkit-discuss] Rendering of aromatics

2016-10-10 Thread Peter S. Shenkin
Hi,

Please see the attached image for (1) and (2).

1. If I render a molecule via 'SVG(svg)', I get the dotted aromatic
representation.

2. If I render it just by displaying 'MolFromSmiles(smi)', I get the
kekulized representation.

So I guessed that (for some reason) when SVG is used, RDKit automatically
uses the dotted representation.

3. However, if I display MolsToGridImage(mol, useSVG=True), I get the
kekulized form. (This method does not return an SVG, and therefore I cannot
display it using 'SVG(result)' ).

So I have several questions:

a. Is there a way to force MolsToGridImage to return the dotted aromatic
representation? Or to postprocess the result to achieve this?

b. What is the underlying paradigm which dictates which representation will
be shown, and how can it be controlled?

c. What I would really like is the "circle" representation (i.e., benzene
as a hexagon with a circle inside). Can this be achieved, and if so, how?

Thank you,
-P.


Re: [Rdkit-discuss] Fwd: Jupyter renders only from the outermost level?

2016-10-10 Thread Peter S. Shenkin
Thank you, Brian. That worked. MolsToGridImage(mols, useSVG=True) is indeed
included in my recent conda install of RDKit. See attached image.

(I actually expected "SVG(gridImage)" to work on In[130], but it didn't.)

"Slowly gettin' the hang of it.",
-P.

On Mon, Oct 10, 2016 at 8:30 AM, Brian Kelley  wrote:

> Jupyter is quite clever about how it renders objects.  Essentially the
> last object in scope is examined for special properties, like __svg__ in
> this case.  In a loop, this doesn't happen the way you might expect.
>
> You might want to use MolsToGridImage in this case.  Greg also recently
> made an svg version of this recently, but I don't think it is officially
> released yet.
>


Re: [Rdkit-discuss] I'm having trouble with 2D structure depiction in a jupyter notebook

2016-10-09 Thread Peter S. Shenkin
Greg,

Thanks! That worked.

May I ask what search you did? After finding that it worked, I tried
seeing if I could find it on the web, without success.

-P.

On Mon, Oct 10, 2016 at 12:09 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi Peter,
>
> that's odd behavior.
> The internet seems to think that this may be solvable by trusting the
> notebook.
> Can you please check under the notebook's File menu to see if it shows up
> as a "trusted notebook". If not, then trust it.
>
> -greg
>
>
> On Mon, Oct 10, 2016 at 4:46 AM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>> I actually have never used jupyter before; I've always been a vi &
>> cmdline user. But I thought this would be a good time to start learning to
>> use some of the newer tools, and I really like the ability to render images
>> right there in the notebook.
>>
>> There's a RDKit jupyter notebook at nbviewer.jupyter.org that includes a
>> 2D rendering; however, the same notebook does not produce a rendering on my
>> local machine.
>>
>> The two screen shots are attached. Advice would be appreciated.
>>
>> I should say that I'm on Mac OS El Capitan, running from an Anaconda
>> environment with Python 2.7.
>>
>> By the way, when (in a separate notebook) I simply render benzene (code
>> as follows), this succeeds:
>>
>> from rdkit import Chem
>> from rdkit.Chem.Draw import IPythonConsole
>> Chem.MolFromSmiles("c1ccccc1")
>> 
>>
>>
>>  -P.
>>


[Rdkit-discuss] NYC "RDKit Users and Learners" meetup Monday, Oct. 3

2016-09-29 Thread Peter S. Shenkin
Hi,

As a reminder to anyone in the NYC area who might be interested, I am
trying to get a few RDKit "users and learners" together via a meetup on
Monday evening, October 3. *More information here
*.
If it makes sense to do so, we can make this a regular thing in the future.

Best,
-P.


Re: [Rdkit-discuss] The RDKit and modern C++

2016-09-28 Thread Peter S. Shenkin
Hi,

Thanks... so it sounds like the main effort (aside from what you delicately
called "professional development" ;-) ) will be to introduce features that
improve robustness or performance when writing new code and possibly when
maintaining (fixing, extending) existing code.

-P.

On Thu, Sep 29, 2016 at 12:42 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi Peter,
>
> On Sat, Sep 24, 2016 at 7:55 PM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>> Hi, I read your posting on Medium, and would be curious to hear which of
>> the many language features in c++11/14 you find most appealing. Is it that
>> you hope to rewrite things using these features, or, at the other extreme,
>> just want to make sure that the code remains compatible with new language
>> standards?
>>
> The standards committee has been very careful and the changes they made do
> not, to the best of my knowledge, break backwards compatibility (note: I'm
> just talking about being able to compile code and have it work, binary
> compatibility could be a different story, but that's less important).
>
> A big component of this is just being able to learn and use the new
> features in the language. It's a professional development thing for anyone
> working with the RDKit C++ code.
>
> Some of the changes (auto variables, range-based for loops, non-member
> begin() and end()) will help simplify the code, which is a big win.
> Others (unique pointers) will help with making things more explicit and, I
> hope, result in some speed improvements.
> And, the great unknown, move semantics could result in a nice performance
> boost. But that we'll have to see.
>
> -greg
>
>
>


Re: [Rdkit-discuss] drawing code take 3

2016-09-26 Thread Peter S. Shenkin
2D drawing code is tough. The 90/10 rule applies: the last 10% of
correctness takes 90% of the effort.

I like Dmitri Agrafiotis's method, but IIRC it's patented; also, though
it's good for rough work, it doesn't produce "beautiful" structural
diagrams.

Some of the 2D drawing methods that do produce "pretty" pictures have a
large number of templates built in that match the most common (and even
somewhat uncommon) motifs, and they fall down when they hit something they
can't get a close enough match for. And then, the IUPAC has a whole list of
"desirable" features in 2D diagrams (as in, "Don't show it this way, but
rather show it that way."). So even if you produce what might appear to be
an acceptable drawing, it might not match the IUPAC list of desirables.

I think for the present purposes what we need is something correct, robust
and legible, and of course the example shown does not exhibit that. (But I
don't know what the starting SMILES is, so I don't know whether the
7-bonded C is due to a bad SMILES, in which case all bets are off.)

In addition, I think some discussion earlier indicated that the RDKit 2D
structures look much worse when H's are included.

I actually wrote a code one time (while at Schrödinger) to give a "badness"
score to 2D structures. When our 2D depiction development was in progress,
we created 2D SD files for many thousands of structures. I could put these
through the program and sort with the worst on top. That allowed the most
severe problems to be identified more quickly than, say, looking at
thousands of 2D diagrams. The program looked at three things: Number of
bonds that crossed, Number of atoms that were too close together, and Large
disparity of bond lengths within the same molecule. (The checking code
didn't deal with labels.)
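
For concreteness, here is a minimal sketch of the three checks just described. It is not the Schrödinger code: the function names, the min_sep threshold, and the choice to report the three values as a tuple rather than one combined score are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist

def _orient(p, q, r):
    """Signed area test: >0 if p->q->r turns left, <0 if right, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def _segments_cross(a, b, c, d):
    """True if segment ab and segment cd cross at an interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def badness(coords, bonds, min_sep=0.5):
    """Return (bond crossings, too-close atom pairs, longest/shortest bond ratio).

    coords: list of (x, y) tuples; bonds: list of (i, j) index pairs.
    min_sep is an arbitrary illustrative threshold in the same units as coords.
    """
    crossings = sum(
        1
        for (i, j), (k, l) in combinations(bonds, 2)
        if len({i, j, k, l}) == 4  # bonds sharing an atom are not counted
        and _segments_cross(coords[i], coords[j], coords[k], coords[l])
    )
    too_close = sum(1 for p, q in combinations(coords, 2) if dist(p, q) < min_sep)
    lengths = [dist(coords[i], coords[j]) for i, j in bonds]
    if not lengths:
        spread = 1.0
    elif min(lengths) == 0:
        spread = float("inf")
    else:
        spread = max(lengths) / min(lengths)
    return crossings, too_close, spread

ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 0)]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # clean 4-ring
twisted = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # two atoms swapped

square_score = badness(square, ring_bonds)
twisted_score = badness(twisted, ring_bonds)
```

Because the result is a tuple, sorting a list of layouts with key=badness and reverse=True puts the worst ones on top, crossings first, which is roughly the workflow described above.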

Writing the checker was a fun project, but I'm glad I didn't have to write
the 2D depiction code. As Mark Twain said, "Improving oneself is good.
Improving others is better – and easier."

-P.

On Mon, Sep 26, 2016 at 5:54 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
wrote:

> On 09/26/2016 04:42 PM, Peter S. Shenkin wrote:
> > Also, the C attached to H44 has an extra H (its own or someone else's?)
> > superimposed upon it.
>
> I wonder if 2D drawing code should really work the same way as the 3D
> conformer generation: generate a bunch of candidate layouts and pick the
> one(s) with least clashes/overlaps.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>


Re: [Rdkit-discuss] drawing code take 3

2016-09-26 Thread Peter S. Shenkin
Also, the C attached to H44 has an extra H (its own or someone else's?)
superimposed upon it.

-P.

On Mon, Sep 26, 2016 at 5:38 PM, Dimitri Maziuk 
wrote:

>
> On the plus side, when drawing PubChem CID 5057 from a 3D SDF before and
> after our canonicalization, RDKit draws a mirror image, but otherwise
> the same 2D structure. OB's "after" version is attached: enjoy the
> 7-bond carbon in the ring.
>
> ;)
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>


Re: [Rdkit-discuss] The RDKit and modern C++

2016-09-24 Thread Peter S. Shenkin
Hi, I read your posting on Medium, and would be curious to hear which of
the many language features in c++11/14 you find most appealing. Is it that
you hope to rewrite things using these features, or, at the other extreme,
just want to make sure that the code remains compatible with new language
standards?

-P.
Sent from a cell phone. Please forgive brvty and m1St@kes.

On Sep 24, 2016 2:26 AM, "Greg Landrum"  wrote:

> Dear all,
>
> I just did a blog post describing a proposal for some upcoming changes to
> the RDKit code base:
> https://medium.com/@greg.landrum_t5/the-rdkit-and-
> modern-c-48206b966218?source=linkShare-d698b3fa9f7-1474698147
>
> This is a big and important change and I'd love to hear whatever feedback
> members of the community may have. Please comment either on the blog post
> or here.
>
> Best Regards,
> -greg
>
>
>
>


Re: [Rdkit-discuss] AddHs()

2016-09-09 Thread Peter S. Shenkin
How about "explicit",  rather than "physical", hydrogens?

-P.
Sent from a cell phone. Please forgive brvty and m1St@kes.

On Sep 9, 2016 1:57 AM, "Greg Landrum"  wrote:

>
>
> On Thu, Sep 8, 2016 at 10:35 PM, Dimitri Maziuk 
> wrote:
>
>> On 09/08/2016 02:26 PM, Brian Kelley wrote:
>> > Dimitri, Hs are removed.
>> >
>> > There is a removeHs argument in MolFromMolBlock (python) that defaults
>> > to True.
>> >
>> > There is a corollary in SDMolSupplier if you are using that.
>> >
>> > supplier = SDMolSupplier(filename, removeHs=False)
>> >
>> > if this helps.
>>
>> Thank you, it does:
>> rdkit.Chem.SupplierFromFilename(sys.argv[1], removeHs = False ).next()
>> returns a molecule with -- presumably "physical" -- hydrogens.
>>
>
> "Physical hydrogens" (I'm not sure that I like that term, but since I
> don't have a ready alternative other than "hydrogens that are in the graph"
> I will use it for now) are actually present in the graph, if you do
> mol.Debug(), Chem.MolToMolBlock() or Chem.MolToSmiles() you will see them.
>
>
>> The reason I ask is if they're removed and re-added, I'd worry about
>> their indexes matching what's in the source file. Which might matter in
>> the case of e.g. stereospecifically assigned methylene protons. (Or so
>> they tell me ;)
>
>
> This is absolutely correct: if you remove the Hs and then later re-add
> them it is extremely unlikely that you will end up with the same H indices
> before and after the change. It makes much more sense to just use
> removeHs=False
>
> -greg
>
>


Re: [Rdkit-discuss] Novartis paper that used rdkit and included ca. 50 common reactions

2016-08-31 Thread Peter S. Shenkin
Thanks, Paulo and Greg.

Yup, that was the reference. Thanks!

Best,
-P.

On Wed, Aug 31, 2016 at 2:05 PM, Greg Landrum <greg.land...@gmail.com>
wrote:

>
>
> On Wed, Aug 31, 2016 at 7:48 PM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>>
>> This is a bit off-topic, but I can't think of where else to ask. I
>> couldn't find it in a quick google search.
>>
>> Not that many years ago (4?) there was a paper from Novartis that used
>> rdkit to predict chem. rxn. products. The appendix listed ca. 50 reactions
>> in common use for drug discovery. It was in J. Med. Chem.
>>
>> Can someone supply the reference?
>>
>
> I think you mean this one: http://pubs.acs.org/doi/abs/10.1021/ci200379p
>
>
>> P.S. Is there a bibliography of published papers that used or referenced
>> RDKit?
>>
>
> Not that I'm aware of. One can always do this: https://scholar.google.
> ch/scholar?q=RDKit, but a curated (and annotated) version would be much
> more useful.
> It would be a good thing to have on an RDKit wiki, if there ever were to
> be such a thing again.[1] :-)
>
> -greg
> [1] this would just require me to set it up, but there hasn't been a big
> call from the community for such a thing
>
>


[Rdkit-discuss] Conda install of postgres cartridge on Mac OS; was: Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-30 Thread Peter S. Shenkin
Hi,

Here's the resolution of the difficulties I was having installing
rdkit-postgresql95 on Mac OS X.

The problem turned out to be that the package originally posted used Py
3.5, and I'm still using 2.7. I may change to 3.5 at some point, but Greg
was kind enough to add a 2.7 version of the package.

So, the following invocations work to set up rdkit with the cartridge in a
new env on Mac OS X. I'm on El Capitan, by the way, and for clarity, I've
not tested the installation, but only checked that it completed
successfully.

conda create -n rdk1 -c rdkit rdkit
. activate rdk1
conda install -c greglandrum rdkit-postgresql95

(The last command also installs postgresql  9.5.4-0.)

Thanks, Greg, for your help on this.

-P.

On Tue, Aug 30, 2016 at 7:05 AM, Peter S. Shenkin <shen...@gmail.com> wrote:

> Hi, Greg,
>
> Thanks. That worked and gave me 3.4.
>
> But when I then tried to install rdkit-postgresql95, that failed:
>
> The command "conda install -c greglandrum rdkit-postgresql95" gave:
>
> The following specifications were found to be in conflict:
>   - rdkit-postgresql95
> Use "conda info " to see the dependencies for each package.
>
>
> Then the command "conda info rdkit-postgresql95" gave:
>
> Error: Package missing in current osx-64 channels:
>   - rdkit-postgresql95
>
>
> So it seems the problem is that there is no mac build of that package. I
> don't need the cartridge immediately, so there's no need to put this
> together right away for my sake. OTOH, if this does get together, I'm happy
> to try out the installation.
>
> Best,
> -P.
>
> On Tue, Aug 30, 2016 at 6:38 AM, Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> conda install -c rdkit rdkit
>>
>>
>> On Tue, Aug 30, 2016 at 12:27 PM, Peter S. Shenkin <shen...@gmail.com>
>> wrote:
>>
>>> Hi, Greg,
>>>
>>> What invocation should I use?
>>>
>>> "conda install -c greglandrum rdkit-postgresql95" still complains that I
>>> need rdkit 3.4.
>>>
>>> "conda update rdkit" fails to update rdkit from 3.1, whether or not I
>>> specify "-c greglandrum".
>>>
>>> Thanks,
>>> -P.
>>>
>>> On Tue, Aug 30, 2016 at 12:30 AM, Greg Landrum <greg.land...@gmail.com>
>>> wrote:
>>>
>>>> grrr, I uploaded the rdkit binaries to the wrong place.
>>>> That's taken care of now too.
>>>>
>>>> please try again.
>>>> -greg
>>>>
>>>>
>>>> On Tue, Aug 30, 2016 at 5:59 AM, Peter S. Shenkin <shen...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, Greg,
>>>>>
>>>>> That fails for me because it requires rdkit 3.4 at minimum.
>>>>>
>>>>> Earlier, I installed 3.1 using conda. There doesn't appear to be a way
>>>>> to install 3.4 using conda. (I tried the default anaconda and also "-c
>>>>> grelandrum".)
>>>>>
>>>>> -P.
>>>>>
>>>>> On Mon, Aug 29, 2016 at 10:55 PM, Greg Landrum <greg.land...@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> Sorry about that. The command I sent was wrong, and I had made a
>>>>>> mistake when I uploaded the file.
>>>>>>
>>>>>> Please try this one:
>>>>>> conda install -c greglandrum rdkit-postgresql95
>>>>>>
>>>>>> -greg
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 29, 2016 at 10:47 PM, Peter S. Shenkin <shen...@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi, Greg,
>>>>>>>
>>>>>>> Thanks. I tried your invocation, but conda cannot find the package:
>>>>>>>
>>>>>>> (rdk0) > conda install -c https://conda.anaconda.org/greglandrum
>>>>>>> rdkit-postgresql95 2>&1 | tee install_rdkit-postresql_fr_gre
>>>>>>> glandrum.log
>>>>>>>
>>>>>>> Fetching package metadata .
>>>>>>> Solving package specifications: .
>>>>>>> Error: Package missing in current osx-64 channels:
>>>>>>>   - rdkit-postgresql95
>>>>>>>
>>>>>>> Close matches found; did you mean one of these?
>>>>>>> rdkit-postgresql95: postgresql
>>>>>>>
>>>>>>> You can sea

Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-30 Thread Peter S. Shenkin
Hi, Greg,

Thanks. That worked and gave me 3.4.

But when I then tried to install rdkit-postgresql95, that failed:

The command "conda install -c greglandrum rdkit-postgresql95" gave:

The following specifications were found to be in conflict:
  - rdkit-postgresql95
Use "conda info " to see the dependencies for each package.


Then the command "conda info rdkit-postgresql95" gave:

Error: Package missing in current osx-64 channels:
  - rdkit-postgresql95


So it seems the problem is that there is no mac build of that package. I
don't need the cartridge immediately, so there's no need to put this
together right away for my sake. OTOH, if this does get together, I'm happy
to try out the installation.

Best,
-P.

On Tue, Aug 30, 2016 at 6:38 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> conda install -c rdkit rdkit
>
>
> On Tue, Aug 30, 2016 at 12:27 PM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>> Hi, Greg,
>>
>> What invocation should I use?
>>
>> "conda install -c greglandrum rdkit-postgresql95" still complains that I
>> need rdkit 3.4.
>>
>> "conda update rdkit" fails to update rdkit from 3.1, whether or not I
>> specify "-c greglandrum".
>>
>> Thanks,
>> -P.
>>
>> On Tue, Aug 30, 2016 at 12:30 AM, Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> grrr, I uploaded the rdkit binaries to the wrong place.
>>> That's taken care of now too.
>>>
>>> please try again.
>>> -greg
>>>
>>>
>>> On Tue, Aug 30, 2016 at 5:59 AM, Peter S. Shenkin <shen...@gmail.com>
>>> wrote:
>>>
>>>> Hi, Greg,
>>>>
>>>> That fails for me because it requires rdkit 3.4 at minimum.
>>>>
>>>> Earlier, I installed 3.1 using conda. There doesn't appear to be a way
>>>> to install 3.4 using conda. (I tried the default anaconda and also "-c
>>>> grelandrum".)
>>>>
>>>> -P.
>>>>
>>>> On Mon, Aug 29, 2016 at 10:55 PM, Greg Landrum <greg.land...@gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry about that. The command I sent was wrong, and I had made a
>>>>> mistake when I uploaded the file.
>>>>>
>>>>> Please try this one:
>>>>> conda install -c greglandrum rdkit-postgresql95
>>>>>
>>>>> -greg
>>>>>
>>>>>
>>>>> On Mon, Aug 29, 2016 at 10:47 PM, Peter S. Shenkin <shen...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, Greg,
>>>>>>
>>>>>> Thanks. I tried your invocation, but conda cannot find the package:
>>>>>>
>>>>>> (rdk0) > conda install -c https://conda.anaconda.org/greglandrum
>>>>>> rdkit-postgresql95 2>&1 | tee install_rdkit-postresql_fr_gre
>>>>>> glandrum.log
>>>>>>
>>>>>> Fetching package metadata .
>>>>>> Solving package specifications: .
>>>>>> Error: Package missing in current osx-64 channels:
>>>>>>   - rdkit-postgresql95
>>>>>>
>>>>>> Close matches found; did you mean one of these?
>>>>>> rdkit-postgresql95: postgresql
>>>>>>
>>>>>> You can search for packages on anaconda.org with
>>>>>> anaconda search -t conda rdkit-postgresql95
>>>>>>
>>>>>> On Mon, Aug 29, 2016 at 2:52 AM, Greg Landrum <greg.land...@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> Thanks for pointing out the conda postgresql build; it's great that
>>>>>>> they are providing this now (though it's a bit irritating that it 
>>>>>>> doesn't
>>>>>>> have readline support). I was able to do a Mac build of the RDKit 
>>>>>>> cartridge
>>>>>>> against that postgresql install this morning.
>>>>>>>
>>>>>>> Since this uses a different set of dependencies, I'm not comfortable
>>>>>>> pushing this to the main rdkit channel until after Riccardo and I have 
>>>>>>> had
>>>>>>> a chance to talk about how we want to handle PostgreSQL going forward. 
>>>>>>> In
>>>>>>> the meantime ther

Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-30 Thread Peter S. Shenkin
Hi, Greg,

What invocation should I use?

"conda install -c greglandrum rdkit-postgresql95" still complains that I
need rdkit 3.4.

"conda update rdkit" fails to update rdkit from 3.1, whether or not I
specify "-c greglandrum".

Thanks,
-P.

On Tue, Aug 30, 2016 at 12:30 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> grrr, I uploaded the rdkit binaries to the wrong place.
> That's taken care of now too.
>
> please try again.
> -greg
>
>
> On Tue, Aug 30, 2016 at 5:59 AM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>> Hi, Greg,
>>
>> That fails for me because it requires rdkit 3.4 at minimum.
>>
>> Earlier, I installed 3.1 using conda. There doesn't appear to be a way to
>> install 3.4 using conda. (I tried the default anaconda and also "-c
>> grelandrum".)
>>
>> -P.
>>
>> On Mon, Aug 29, 2016 at 10:55 PM, Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> Sorry about that. The command I sent was wrong, and I had made a mistake
>>> when I uploaded the file.
>>>
>>> Please try this one:
>>> conda install -c greglandrum rdkit-postgresql95
>>>
>>> -greg
>>>
>>>
>>> On Mon, Aug 29, 2016 at 10:47 PM, Peter S. Shenkin <shen...@gmail.com>
>>> wrote:
>>>
>>>> Hi, Greg,
>>>>
>>>> Thanks. I tried your invocation, but conda cannot find the package:
>>>>
>>>> (rdk0) > conda install -c https://conda.anaconda.org/greglandrum
>>>> rdkit-postgresql95 2>&1 | tee install_rdkit-postresql_fr_gre
>>>> glandrum.log
>>>>
>>>> Fetching package metadata .
>>>> Solving package specifications: .
>>>> Error: Package missing in current osx-64 channels:
>>>>   - rdkit-postgresql95
>>>>
>>>> Close matches found; did you mean one of these?
>>>> rdkit-postgresql95: postgresql
>>>>
>>>> You can search for packages on anaconda.org with
>>>> anaconda search -t conda rdkit-postgresql95
>>>>
>>>> On Mon, Aug 29, 2016 at 2:52 AM, Greg Landrum <greg.land...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Peter,
>>>>>
>>>>> Thanks for pointing out the conda postgresql build; it's great that
>>>>> they are providing this now (though it's a bit irritating that it doesn't
>>>>> have readline support). I was able to do a Mac build of the RDKit cartridge
>>>>> against that postgresql install this morning.
>>>>>
>>>>> Since this uses a different set of dependencies, I'm not comfortable
>>>>> pushing this to the main rdkit channel until after Riccardo and I have had
>>>>> a chance to talk about how we want to handle PostgreSQL going forward. In
>>>>> the meantime there is a Mac build in my channel at anaconda.org. It
>>>>> would be great if you could try this out and let me know if it works for
>>>>> you:
>>>>> conda install -c https://conda.anaconda.org/greglandrum
>>>>> rdkit-postgresql95
>>>>>
>>>>> Best,
>>>>> -greg
>>>>>
>>>>>
>>>>> On Sat, Aug 27, 2016 at 7:49 AM, Peter S. Shenkin <shen...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks, Greg. I assumed that because the installation manager gave a
>>>>>> conda invocation for rdkit-postgresql, it must exist. Perhaps a bad
>>>>>> assumption, but that's why I guessed that it's just not been ported to OS X.
>>>>>>
>>>>>> FWIW, there is a conda build of PostgreSQL, called "postgresql".
>>>>>>
>>>>>> Thanks again,
>>>>>> -P.
>>>>>>
>>>>>> On Sat, Aug 27, 2016 at 12:56 AM, Greg Landrum <
>>>>>> greg.land...@gmail.com> wrote:
>>>>>>
>>>>>>> There just isn't a conda build (yet) for the cartridge (or, I
>>>>>>> believe, for PostgreSQL itself).
>>>>>>>
>>>>>>> I did a bit of looking this morning and it is going to require some
>>>>>>> work to get things working.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Aug 27, 2016 at 6:25 AM +0200, "Peter S. Shenkin" <
>>>>>>> shen...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Well, actually, thought I'd try the conda install of
>>>>>>>> rdkit-postgresql, using:
>>>>>>>>
>>>>>>>> conda install -c https://conda.binstar.org/rdkit rdkit-postgresql
>>>>>>>>
>>>>>>>> I get the message:
>>>>>>>>
>>>>>>>> Error: Package missing in current osx-64 channels:
>>>>>>>>   - rdkit-postgresql
>>>>>>>> Close matches found; did you mean one of these?
>>>>>>>> rdkit-postgresql: postgresql
>>>>>>>>
>>>>>>>>
>>>>>>>> Is the cartridge not supported on OS X?
>>>>>>>>
>>>>>>>> -P.
>>>>>>>>
>>>>>>>> On Sat, Aug 27, 2016 at 12:10 AM, Greg Landrum <
>>>>>>>> greg.land...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Aug 27, 2016 at 5:23 AM, Peter S. Shenkin <
>>>>>>>>> shen...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Lazy execution rules.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> :-)
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-29 Thread Peter S. Shenkin
Hi, Greg,

That fails for me because it requires rdkit 3.4 at minimum.

Earlier, I installed 3.1 using conda. There doesn't appear to be a way to
install 3.4 using conda. (I tried the default anaconda and also "-c
greglandrum".)
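
For anyone scripting this kind of check, here is a minimal sketch of the "installed version vs. required minimum" comparison described above. The function names are hypothetical, and the "3.1" and "3.4" strings are just the shorthand used in this thread. Real conda version strings can carry non-numeric parts, which this sketch does not handle.

```python
def parse_version(text):
    """Turn a dotted, purely numeric version string such as '3.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    # Tuples compare element by element, so (3, 10) > (3, 4), unlike string comparison.
    return parse_version(installed) >= parse_version(required)


# Assumed values taken from the wording of this message, not from conda itself.
print(meets_minimum("3.1", "3.4"))
```

Comparing integer tuples rather than raw strings avoids the usual trap where "3.10" sorts before "3.4".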

-P.

On Mon, Aug 29, 2016 at 10:55 PM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Sorry about that. The command I sent was wrong, and I had made a mistake
> when I uploaded the file.
>
> Please try this one:
> conda install -c greglandrum rdkit-postgresql95
>
> -greg
>
>
> On Mon, Aug 29, 2016 at 10:47 PM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>> Hi, Greg,
>>
>> Thanks. I tried your invocation, but conda cannot find the package:
>>
>> (rdk0) > conda install -c https://conda.anaconda.org/greglandrum
>> rdkit-postgresql95 2>&1 | tee install_rdkit-postresql_fr_greglandrum.log
>>
>> Fetching package metadata .
>> Solving package specifications: .
>> Error: Package missing in current osx-64 channels:
>>   - rdkit-postgresql95
>>
>> Close matches found; did you mean one of these?
>> rdkit-postgresql95: postgresql
>>
>> You can search for packages on anaconda.org with
>> anaconda search -t conda rdkit-postgresql95
>>
>> On Mon, Aug 29, 2016 at 2:52 AM, Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> Hi Peter,
>>>
>>> Thanks for pointing out the conda postgresql build; it's great that they
>>> are providing this now (though it's a bit irritating that it doesn't have
>>> readline support). I was able to do a Mac build of the RDKit cartridge
>>> against that postgresql install this morning.
>>>
>>> Since this uses a different set of dependencies, I'm not comfortable
>>> pushing this to the main rdkit channel until after Riccardo and I have had
>>> a chance to talk about how we want to handle PostgreSQL going forward. In
>>> the meantime there is a Mac build in my channel at anaconda.org. It
>>> would be great if you could try this out and let me know if it works for
>>> you:
>>> conda install -c https://conda.anaconda.org/greglandrum
>>> rdkit-postgresql95
>>>
>>> Best,
>>> -greg
>>>
>>>
>>> On Sat, Aug 27, 2016 at 7:49 AM, Peter S. Shenkin <shen...@gmail.com>
>>> wrote:
>>>
>>>> Thanks, Greg. I assumed that because the installation manager gave a
>>>> conda invocation for rdkit-postgresql, it must exist. Perhaps a bad
>>>> assumption, but that's why I guessed that it's just not been ported to OS X.
>>>>
>>>> FWIW, there is a conda build of PostgreSQL, called "postgresql".
>>>>
>>>> Thanks again,
>>>> -P.
>>>>
>>>> On Sat, Aug 27, 2016 at 12:56 AM, Greg Landrum <greg.land...@gmail.com>
>>>> wrote:
>>>>
>>>>> There just isn't a conda build (yet) for the cartridge (or, I believe,
>>>>> for PostgreSQL itself).
>>>>>
>>>>> I did a bit of looking this morning and it is going to require some
>>>>> work to get things working.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Aug 27, 2016 at 6:25 AM +0200, "Peter S. Shenkin" <
>>>>> shen...@gmail.com> wrote:
>>>>>
>>>>>> Well, actually, thought I'd try the conda install of rdkit-postgresql,
>>>>>> using:
>>>>>>
>>>>>> conda install -c https://conda.binstar.org/rdkit rdkit-postgresql
>>>>>>
>>>>>> I get the message:
>>>>>>
>>>>>> Error: Package missing in current osx-64 channels:
>>>>>>   - rdkit-postgresql
>>>>>> Close matches found; did you mean one of these?
>>>>>> rdkit-postgresql: postgresql
>>>>>>
>>>>>>
>>>>>> Is the cartridge not supported on OS X?
>>>>>>
>>>>>> -P.
>>>>>>
>>>>>> On Sat, Aug 27, 2016 at 12:10 AM, Greg Landrum <
>>>>>> greg.land...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Aug 27, 2016 at 5:23 AM, Peter S. Shenkin <
>>>>>>> shen...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Lazy execution rules.
>>>>>>>>
>>>>>>>
>>>>>>> :-)
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>


Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-29 Thread Peter S. Shenkin
Hi, Greg,

Thanks. I tried your invocation, but conda cannot find the package:

(rdk0) > conda install -c https://conda.anaconda.org/greglandrum
rdkit-postgresql95 2>&1 | tee install_rdkit-postresql_fr_greglandrum.log

Fetching package metadata .
Solving package specifications: .
Error: Package missing in current osx-64 channels:
  - rdkit-postgresql95

Close matches found; did you mean one of these?
rdkit-postgresql95: postgresql

You can search for packages on anaconda.org with
anaconda search -t conda rdkit-postgresql95
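
In case it helps anyone who saves install logs the way I did with tee, here is a minimal sketch that pulls the missing package names out of output like the above. The function name is hypothetical, and the error format is assumed only from the log pasted in this message. Other conda versions may word it differently.

```python
import re


def missing_packages(conda_output):
    """Return package names listed under conda's 'Package missing' error."""
    names = []
    in_block = False
    for line in conda_output.splitlines():
        # The header line names the platform, e.g. osx-64.
        if re.match(r"Error: Package missing in current \S+ channels:", line):
            in_block = True
            continue
        if in_block:
            # Missing packages follow as indented "  - name" entries.
            match = re.match(r"\s+-\s+(\S+)", line)
            if match:
                names.append(match.group(1))
            else:
                in_block = False  # first non-entry line ends the list
    return names


# Sample text copied from the log in this message.
SAMPLE = """Fetching package metadata .
Solving package specifications: .
Error: Package missing in current osx-64 channels:
  - rdkit-postgresql95

Close matches found; did you mean one of these?
rdkit-postgresql95: postgresql
"""

print(missing_packages(SAMPLE))
```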

On Mon, Aug 29, 2016 at 2:52 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi Peter,
>
> Thanks for pointing out the conda postgresql build; it's great that they
> are providing this now (though it's a bit irritating that it doesn't have
> readline support). I was able to do a Mac build of the RDKit cartridge
> against that postgresql install this morning.
>
> Since this uses a different set of dependencies, I'm not comfortable
> pushing this to the main rdkit channel until after Riccardo and I have had
> a chance to talk about how we want to handle PostgreSQL going forward. In
> the meantime there is a Mac build in my channel at anaconda.org. It would
> be great if you could try this out and let me know if it works for you:
> conda install -c https://conda.anaconda.org/greglandrum
> rdkit-postgresql95
>
> Best,
> -greg
>
>
> On Sat, Aug 27, 2016 at 7:49 AM, Peter S. Shenkin <shen...@gmail.com>
> wrote:
>
>> Thanks, Greg. I assumed that because the installation manager gave a
>> conda invocation for rdkit-postgresql, it must exist. Perhaps a bad
>> assumption, but that's why I guessed that it's just not been ported to OS X.
>>
>> FWIW, there is a conda build of PostgreSQL, called "postgresql".
>>
>> Thanks again,
>> -P.
>>
>> On Sat, Aug 27, 2016 at 12:56 AM, Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> There just isn't a conda build (yet) for the cartridge (or, I believe,
>>> for PostgreSQL itself).
>>>
>>> I did a bit of looking this morning and it is going to require some work
>>> to get things working.
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Aug 27, 2016 at 6:25 AM +0200, "Peter S. Shenkin" <
>>> shen...@gmail.com> wrote:
>>>
>>>> Well, actually, thought I'd try the conda install of rdkit-postgresql,
>>>> using:
>>>>
>>>> conda install -c https://conda.binstar.org/rdkit rdkit-postgresql
>>>>
>>>> I get the message:
>>>>
>>>> Error: Package missing in current osx-64 channels:
>>>>   - rdkit-postgresql
>>>> Close matches found; did you mean one of these?
>>>> rdkit-postgresql: postgresql
>>>>
>>>>
>>>> Is the cartridge not supported on OS X?
>>>>
>>>> -P.
>>>>
>>>> On Sat, Aug 27, 2016 at 12:10 AM, Greg Landrum <greg.land...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Sat, Aug 27, 2016 at 5:23 AM, Peter S. Shenkin <shen...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Lazy execution rules.
>>>>>>
>>>>>
>>>>> :-)
>>>>>
>>>>
>>>>
>>
>


Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-26 Thread Peter S. Shenkin
Thanks, Greg. I assumed that because the installation manager gave a conda
invocation for rdkit-postgresql, it must exist. Perhaps a bad assumption,
but that's why I guessed that it's just not been ported to OS X.

FWIW, there is a conda build of PostgreSQL, called "postgresql".

Thanks again,
-P.

On Sat, Aug 27, 2016 at 12:56 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> There just isn't a conda build (yet) for the cartridge (or, I believe, for
> PostgreSQL itself).
>
> I did a bit of looking this morning and it is going to require some work
> to get things working.
>
>
>
>
>
> On Sat, Aug 27, 2016 at 6:25 AM +0200, "Peter S. Shenkin" <
> shen...@gmail.com> wrote:
>
>> Well, actually, thought I'd try the conda install of rdkit-postgresql,
>> using:
>>
>> conda install -c https://conda.binstar.org/rdkit rdkit-postgresql
>>
>> I get the message:
>>
>> Error: Package missing in current osx-64 channels:
>>   - rdkit-postgresql
>> Close matches found; did you mean one of these?
>> rdkit-postgresql: postgresql
>>
>>
>> Is the cartridge not supported on OS X?
>>
>> -P.
>>
>> On Sat, Aug 27, 2016 at 12:10 AM, Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sat, Aug 27, 2016 at 5:23 AM, Peter S. Shenkin <shen...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Lazy execution rules.
>>>>
>>>
>>> :-)
>>>
>>
>>

