Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread Peter S. Shenkin
Well, if I'm recalling correctly, a highly symmetric structure like buckminsterfullerene takes a long time to canonicalize. I don't know what the formal definition of a planar graph is, but I would guess it's not what chemists mean when they say a molecule is planar. -P. On Thu, Jun 15, 2023 at

Re: [Rdkit-discuss] hybridization of nitrogen in beta-lactam

2021-02-13 Thread Peter S. Shenkin
Amide Ns are usually viewed as sp2 because of the resonance RC(=O)-NR2 <-> RC([O-])=[N+]R2, where R can be H. Unlike sp3 Ns (amines), amides are not strong H-bond acceptors, though both amides and amines are strong donors. This observation is consistent with sp2 character. -P. On Sat, Feb 13,

Re: [Rdkit-discuss] sanitization converts "I(=O)(=O)[O-]" into "[O-][I+2]([O-])[O-]"

2021-01-22 Thread Peter S. Shenkin
> Carboxylates are different in that the popular representation (C(=O)[O-]) doesn't break the octet rule. But another interesting case is nitro groups: > In [11]: mol = Chem.MolFromSmiles('CN(=O)=O') > In [12]: Chem.MolToSmiles(mol) > Out[12]: 'C[N+](=O)[O-]'

Re: [Rdkit-discuss] sanitization converts "I(=O)(=O)[O-]" into "[O-][I+2]([O-])[O-]"

2021-01-21 Thread Peter S. Shenkin
It seems to me offhand that RDKit's choice is analogous to the way carboxylates are generally notated: R-C(=O)O- rather than R-C+(O-)O-. Both are legitimate and in fact equivalent upon application of chemical knowledge, but do you prefer the second representation for carboxylates? -P. On Thu, Jan

Re: [Rdkit-discuss] From MW to structure

2021-01-07 Thread Peter S. Shenkin
Are you starting with an integral molecular weight or an experimentally determined value, perhaps even a set of values from mass spec? If it's an integral value then, if you are willing to settle for known compounds, it might not be too hard. You could derive a bunch of empirical formulas
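If the starting point is an integral (nominal) molecular weight, the first step Peter describes, deriving candidate empirical formulas, can be sketched in a few lines. This is a minimal sketch under stated assumptions: only C, H, N and O are considered, nominal masses C=12, H=1, N=14, O=16 are used, the caps on N and O counts are arbitrary, and formulas that would need a negative or fractional number of rings plus double bonds are discarded. Looking the surviving formulas up among known compounds would be a separate step, not shown.

```python
def hill_formula(c, h, n, o):
    """Format element counts in Hill order (C, H, then alphabetical)."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count:
            parts.append(symbol + (str(count) if count > 1 else ""))
    return "".join(parts)


def candidate_formulas(nominal_mass, max_n=4, max_o=6):
    """Enumerate CHNO formulas whose nominal mass equals nominal_mass."""
    formulas = []
    for c in range(1, nominal_mass // 12 + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                h = nominal_mass - 12 * c - 14 * n - 16 * o
                if h < 0:
                    continue
                # Twice the number of rings plus double bonds; must be a
                # non-negative even number for a neutral closed-shell formula.
                twice_dbe = 2 * c + 2 + n - h
                if twice_dbe < 0 or twice_dbe % 2:
                    continue
                formulas.append(hill_formula(c, h, n, o))
    return formulas
```

For a nominal mass of 46 this yields CH2O2, CH6N2 and C2H6O (formic acid, methylhydrazine and ethanol/dimethyl ether are known compounds with these formulas).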

Re: [Rdkit-discuss] canonicalization of two aromatic molecules returning two different forms (kekule and aromatic)

2020-11-27 Thread Peter S. Shenkin
Yes, I've seen the same phenomenon in multiple SMILES generators, even Daylight's (when they had it up on a public web site). From a chemical perspective, it isn't sensible that the pyridone-like ring in molecule 1 should not be seen as aromatic in the canonical SMILES, especially since the

Re: [Rdkit-discuss] [EXTERNAL] Re: Morgan FP atom numbering

2020-10-28 Thread Peter S. Shenkin
I found that on the NY Public Library web site, the book is available, chapter by chapter, as a digital download, if you have a library card. The host site is at Johns Hopkins, so check your local library system, which might also supply access. -P. On Wed, Oct 28, 2020 at 12:08 PM Cyrus Maher

Re: [Rdkit-discuss] Nitrogen sp2 isomers get the same InChI Key

2020-10-25 Thread Peter S. Shenkin
Canonical SMILES is probably the way to go, but you might also be able to use the InChIKey and the InChI auxiliary information together as a compound hash key. -P. On Sun, Oct 25, 2020 at 10:53 AM Adelene LAI wrote: > Hi Gustavo, > > > (Sorry, forgot to reply all before...) > > > Your

Re: [Rdkit-discuss] A question of molecule structure

2020-09-20 Thread Peter S. Shenkin
It could involve either a tautomeric solution or a zwitterionic solution. But it is not clear to me why the current structure needs to be altered. After all, pyridones are most commonly written as shown. -P. On Sun, Sep 20, 2020 at 12:19 AM Markus Metz wrote: > Dear Gao: > Your question is a

Re: [Rdkit-discuss] Molecular weight function

2020-04-08 Thread Peter S. Shenkin
It is probably best to say that it is the sum of atomic weights for the atoms in the molecule, where each element gets an atomic weight computed by summing the products of its isotope atomic weights with the natural fractional abundance of the isotope. For some elements, this is not terribly well
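The definition above (abundance-weighted isotope masses per element, summed over the atoms) can be written out directly. This is a minimal sketch, not how RDKit implements its molecular-weight descriptor; the ISOTOPES table is a hypothetical two-element excerpt with approximate literature values, and a real table would need every element of interest.

```python
# Approximate isotope masses (u) and natural fractional abundances.
# Hypothetical excerpt: only H and Cl are included.
ISOTOPES = {
    "H": [(1.00782503, 0.999885), (2.01410178, 0.000115)],
    "Cl": [(34.96885268, 0.7576), (36.96590259, 0.2424)],
}


def atomic_weight(element):
    """Sum of isotope masses weighted by natural fractional abundance."""
    return sum(mass * abundance for mass, abundance in ISOTOPES[element])


def molecular_weight(composition):
    """Sum of atomic weights, e.g. composition={"H": 1, "Cl": 1} for HCl."""
    return sum(count * atomic_weight(el) for el, count in composition.items())
```

For HCl this gives roughly 36.46, in line with the usual tabulated value.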

Re: [Rdkit-discuss] The RDKit and GSoC 2020

2020-03-06 Thread Peter S. Shenkin
"Cells in columns named SMILES, or have SMILES as a substring in the header, will be depicted in 2D using RDKit" Sounds like a great project, but I think the above can be improved upon as a specification. In many or even most situations, users will want to be able to view the SMILES as a string

Re: [Rdkit-discuss] acepentalene aromaticity perception

2020-01-22 Thread Peter S. Shenkin
Hi, I still believe that Acepentalene should not be recognized by RDKit as aromatic, because there is no ring that contains 4n+2 electrons. The fact that counting bonds not in the outer ring gives 10 electrons should not make the outer ring aromatic. Moreover, RDKit seems to perceive aromaticity

Re: [Rdkit-discuss] acepentalene aromaticity perception

2020-01-22 Thread Peter S. Shenkin
Hi, For aromaticity, I believe a ring has to have 4n+2 electrons along its periphery. I would be curious to know what other SMILES generators make of this system. -P. On Wed, Jan 22, 2020 at 8:14 AM Greg Landrum wrote: > Hi Andrew, > > There's a bug here. > > Here's what I believe is
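The 4n+2 test itself is simple arithmetic; the contentious part, as this thread shows, is deciding how many π electrons each atom contributes and which ring path to count. In this sketch the per-atom contributions are supplied by hand (an assumption), and it is not RDKit's aromaticity model. Applied to Peter's rule, the list would run around the ring periphery only.

```python
def is_huckel_count(pi_electrons):
    """True when the count has the form 4n + 2 with n >= 0 (2, 6, 10, ...)."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


def ring_is_huckel(contributions):
    """contributions: pi electrons donated by each atom along one ring path."""
    return is_huckel_count(sum(contributions))
```

For example, benzene ([1] * 6) passes, cyclobutadiene ([1] * 4) fails, and pyrrole ([2, 1, 1, 1, 1], with the NH lone pair counted as two) passes.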

Re: [Rdkit-discuss] acepentalene aromaticity perception

2020-01-09 Thread Peter S. Shenkin
Since the entire system is antiaromatic, why are any carbons at all shown as aromatic in the SMILES? -P. On Thu, Jan 9, 2020 at 3:56 PM Andrew Dalke wrote: > Hi all, > > Could someone explain the following, which uses the SMILES from > https://en.wikipedia.org/wiki/Acepentalene : > > >>> from

Re: [Rdkit-discuss] AlignMol and GetBestRMS

2019-10-17 Thread Peter S. Shenkin
(I meant an RMSD of about 1 Angstrom. ) On Thu, Oct 17, 2019 at 5:00 PM Peter S. Shenkin wrote: > A large RMSD could come about from a large number of small interatomic > deviations or a small number of large ones. In the latter case, the > difference in conformation could

Re: [Rdkit-discuss] AlignMol and GetBestRMS

2019-10-17 Thread Peter S. Shenkin
A large RMSD could come about from a large number of small interatomic deviations or a small number of large ones. In the latter case, the difference in conformation could be large. It is useful to also obtain the largest interatomic deviation following superimposition in order to determine which
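To illustrate the point, here is a minimal sketch that reports both the RMSD and the largest per-atom deviation for two conformations that are assumed to be already superimposed and to list atoms in the same order (the function name is hypothetical). In the usage below, two comparisons have identical RMSDs but very different worst-case deviations.

```python
import math


def deviation_stats(coords_a, coords_b):
    """RMSD and largest per-atom deviation for two pre-superimposed structures."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    dists = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    rmsd = math.sqrt(sum(d * d for d in dists) / len(dists))
    return rmsd, max(dists)


ref = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
spread = [(x, 1, z) for x, _, z in ref]   # every atom off by 1 Å
one_big = ref[:3] + [(3, 2, 0)]           # one atom off by 2 Å
```

Both deviation_stats(ref, spread) and deviation_stats(ref, one_big) give an RMSD of 1.0, but the maximum deviation is 1.0 in the first case and 2.0 in the second.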

Re: [Rdkit-discuss] Problems with SMILES using MolFromSmiles

2019-09-24 Thread Peter S. Shenkin
A carboxylate has to be represented as C(=O)[O-]. Use ...[OH] for an uncharged carboxyl. Similarly, a tetravalent aliphatic N has to be given a + charge. -P. On Tue, Sep 24, 2019 at 9:15 PM Scalfani, Vincent wrote: > Dear Navid, > > > > RDKit rejects tetravalent Nitrogen by default. This
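The valence reasoning behind these rules can be made concrete with a toy checker. The ALLOWED_VALENCE table below is a hypothetical, deliberately tiny subset and not RDKit's valence model. The check only flags atoms whose explicit bond orders already exceed the largest allowed valence, since implicit hydrogens can fill an atom up to its valence but never reduce it.

```python
# Hypothetical toy table: allowed total bond order per (element, formal charge).
ALLOWED_VALENCE = {
    ("C", 0): {4},
    ("N", 0): {3},
    ("N", 1): {4},
    ("O", 0): {2},
    ("O", -1): {1},
}


def overvalent_atoms(atoms, bonds):
    """atoms: list of (element, charge); bonds: list of (i, j, order).

    Returns indices of atoms whose explicit bond-order sum exceeds the
    largest allowed valence for that element and charge.
    """
    totals = [0] * len(atoms)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order
    return [idx for idx, (atom, total) in enumerate(zip(atoms, totals))
            if total > max(ALLOWED_VALENCE[atom])]
```

With this table, the neutral tetravalent N of C[N](C)(C)C is flagged while C[N+](C)(C)C is not, a carboxylate written as CC(=O)[O-] passes, and the central N of N=N(C)(C)C (bond-order sum 5) is flagged.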

Re: [Rdkit-discuss] Which method to prefer for computing 2D coordinates

2019-04-09 Thread Peter S. Shenkin
When I was at Schrödinger, I wrote a simple program to find bad 2D structures. I no longer have access to the code, but I computed two things: 1. Number of bond lengths deviating from the median bond length (MBL) by 50% or more (i.e., <0.5*MBL or >2*MBL) 2. Number of bond crossings The overall
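The two measures described above are easy to sketch for a 2D depiction given as coordinates plus a bond list. This is a reconstruction from the description, not the original Schrödinger code (which the post says is no longer available). How the two counts were combined into an overall score is cut off in the archive, so the sketch returns them separately. Bonds that share an atom are never counted as crossing.

```python
import math
import statistics


def bad_bond_lengths(coords, bonds):
    """Count bonds shorter than 0.5x or longer than 2x the median bond length."""
    lengths = [math.dist(coords[i], coords[j]) for i, j in bonds]
    mbl = statistics.median(lengths)
    return sum(1 for length in lengths if length < 0.5 * mbl or length > 2 * mbl)


def _orient(p, q, r):
    """Sign of the z component of (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(p1, p2, p3, p4):
    """Proper crossing test; touching or collinear segments do not count."""
    return (_orient(p3, p4, p1) * _orient(p3, p4, p2) < 0
            and _orient(p1, p2, p3) * _orient(p1, p2, p4) < 0)


def bond_crossings(coords, bonds):
    """Count pairs of bonds whose 2D line segments cross."""
    count = 0
    for a in range(len(bonds)):
        for b in range(a + 1, len(bonds)):
            (i, j), (k, l) = bonds[a], bonds[b]
            if {i, j} & {k, l}:
                continue  # bonds sharing an atom cannot cross
            if segments_cross(coords[i], coords[j], coords[k], coords[l]):
                count += 1
    return count
```

For a unit square with both diagonals drawn as bonds, bond_crossings reports 1 (the diagonals), and neither count flags anything for the plain square.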

Re: [Rdkit-discuss] Bug with Calculation of aromatic rings?

2019-03-06 Thread Peter S. Shenkin
Atom 20 appears to be an NH. Shouldn’t it be a pyridine N? On Wed, Mar 6, 2019 at 5:04 AM Colin Bournez wrote: > Hi Greg, > > Indeed it seems one bond is not tagged as aromatic. > > Here are the aromatics bond (begin atom, end atom) : > > 0 1 > 1 19 > 19 16 > 11 14 > 14 12 > 12 7 > 7 20 > 11 0

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
This is a cute example. The left ring is one in which every atom and every bond is aromatic, and yet the ring is not aromatic, unlike azulene, in which neither ring, alone, is aromatic. On Tue, Oct 23, 2018 at 12:36 PM Greg Landrum wrote: > > I'll try later (likely tomorrow) to explain what I

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
On Tue, Oct 23, 2018 at 1:08 PM Chris Earnshaw wrote: > Interesting - I do hope your idea works out! > > This prompted me to see what happens with azulene, which is another case > where the envelope is aromatic but neither of the individual rings are > based on a simple neutral representation.

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
ributes two electrons to whatever ring system it's > in. > > That certainly handles the things we've discussed so far, as well as easy > cases like pyridine and quinone. Now I need to try and find some stuff that > breaks it. > > -greg > > > On Tue, Oct 23, 2018 at

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
This is just to note that pyridones are considered aromatic by all SMILES kits I've seen (though I've certainly not seen them all!), and pyridone itself is cited in the Daylight Theory Manual as an example of an exocyclic double bond which does not break aromaticity. -P. On Tue, Oct 23, 2018 at

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
: > > > On Tue, Oct 23, 2018 at 3:00 PM Peter S. Shenkin > wrote: > >> >> It's difficult to fault RDKit for making the same mistake that everybody >> else blithely accepts; but it would be great, IMO, if it could do better >> than everyone else in this regard. >

Re: [Rdkit-discuss] Aromaticity question

2018-10-23 Thread Peter S. Shenkin
Hi, I raised the same issue that Francis raised on the RDKit Slack channel on Jan 14, 2017, with a different example (c1c[nH]c2nccc-2c1), and got the same response. Of course, breaking the non-aromatic ring causes the remaining aromatic ring to be perceived as aromatic, as Greg's response would

Re: [Rdkit-discuss] Fingerprint collision and machine learning

2018-10-10 Thread Peter S. Shenkin
It is very far from a solved problem, since it depends strongly on the interactions within the crystal. And it’s not terribly uncommon for a drug-like compound to exhibit different crystal forms, each with its own melting point and solubility. This has been an issue for drug formulation, where you

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-26 Thread Peter S. Shenkin
> quite easily. At the very least, it’s worth keeping track of the initial > number of neighbours within the cluster cutoff that each fingerprint had so > as to distinguish real singletons from these artefactual ones. > Dave > > > On Tue, 25 Sep 2018 at 19:56, Peter S. S
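The bookkeeping suggested in the quoted reply, recording how many neighbours each fingerprint had within the cutoff before clustering, could look like this. A minimal sketch with fingerprints as Python sets of on-bit indices and a hand-written Tanimoto function (not RDKit's DataStructs API). A fingerprint that ends up alone but had a non-zero count is an artefactual singleton, whereas a count of zero marks a real one.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def neighbour_counts(fps, distance_cutoff):
    """For each fingerprint, count the others within the Tanimoto-distance cutoff."""
    counts = [0] * len(fps)
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            if 1.0 - tanimoto(fps[i], fps[j]) <= distance_cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts
```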

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Peter S. Shenkin
e clusters really are at least roughly representative, by comparing them with viewable random subsets of structures from the clusters. -P. On Tue, Sep 25, 2018 at 2:36 PM, Andrew Dalke wrote: > On Sep 25, 2018, at 17:13, Peter S. Shenkin wrote: > > FWIW, in work on conformational cluste

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Peter S. Shenkin
(I see that I accidentally responded to Andrew, only, earlier; I'm copying to the group this time.) FWIW, in work on conformational clustering, I used the “most representative” molecule; that is, the real molecule closest to the mathematical centroid. This would probably be the best way of
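Picking the "most representative" member, the real molecule closest to the mathematical centroid, takes only a few lines when each member is a numeric vector (for bit fingerprints, the centroid is a vector of per-bit fractions). A minimal sketch using squared Euclidean distance; another metric could be substituted.

```python
def most_representative(points):
    """Index of the real member closest to the (possibly unrealisable) centroid."""
    dim = len(points[0])
    centroid = [sum(p[k] for p in points) / len(points) for k in range(dim)]

    def sq_dist(p):
        return sum((p[k] - centroid[k]) ** 2 for k in range(dim))

    return min(range(len(points)), key=lambda i: sq_dist(points[i]))
```

For the points (0, 0), (4, 0) and (1, 1), the centroid is (5/3, 1/3) and the member closest to it is (1, 1), index 2.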

Re: [Rdkit-discuss] enumeration of smiles question

2018-08-06 Thread Peter S. Shenkin
Just curious, Guillaume, why do you want to do this? On Mon, Aug 6, 2018 at 5:58 AM Guillaume GODIN < guillaume.go...@firmenich.com> wrote: > Dear Greg, > > > > Fantastic, thank you to give both explanation and solution to this “simple > question”, I know this is not so simple & it’s fundamental

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-20 Thread Peter S. Shenkin
n.wikipedia.org/wiki/Hash_table#Collision_resolution ). All > this by way of saying that to go from fingerprint to the molecular > structure which produced it is traditionally impossible unless the > fingerprint no longer amounts to a hash(ing) function. > -- > j > > > On Fri, Apr 2

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-20 Thread Peter S. Shenkin
Isn't it the case that more than one molecule can share an identical fingerprint? (Depending on the specific fingerprint.) Think p-biphenyl, extended to triphenyl, tetraphenyl, etc. Still, a GA or SA method could keep going and come up with multiple matches, plus multiple near-misses. -P. On
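The polyphenyl argument can be illustrated with a deliberately oversimplified toy, not a real chemical fingerprint: treat each phenyl ring as one unit of a linear chain and take the set of all contiguous sub-paths up to a maximum length. Because the set records presence, not counts, every chain at least as long as that maximum yields the same fingerprint. Real fingerprints distinguish more (end versus interior rings, for instance), so the chain lengths at which they collide will differ.

```python
def path_fingerprint(n_units, max_len):
    """Set of contiguous sub-paths (up to max_len units) of a chain of
    n identical units, standing in for the rings of a para-polyphenyl."""
    chain = "P" * n_units
    return {chain[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(chain) - k + 1)}
```

With max_len=3, the two-unit chain differs from the three-unit chain, but chains of three, four and six units give identical sets.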

[Rdkit-discuss] OFFLINE... RDKit and Mathematica

2018-01-12 Thread Peter S. Shenkin
So, do you work with Bob Nachbar? If so, please tell him I said hello. -P. (ex-Schrödinger) On Fri, Jan 12, 2018 at 10:06 PM, Jason Biggs wrote: > To the developers of RDKit - this is a great package you've made and the > level of support and responsiveness to bugs is

Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread Peter S. Shenkin
I think you probably used a slightly different SMILES than the one you showed. The one you showed should have given ((0,1,3,4),(2,1,3,4)). The proper merge rule would then be to consider all matches equivalent if the 2nd and 3rd atoms in the match agree, in either order; i.e., the two carbons, indices
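The merge rule described here, treating matches as equivalent when their 2nd and 3rd atoms agree in either order, can be applied to a tuple of match tuples like the one GetSubstructMatches returns. A minimal sketch: the key positions are a parameter, and returning the union of atom indices for each group is one possible reading of "merge" (an assumption).

```python
def merge_matches(matches, key_positions=(1, 2)):
    """Group match tuples whose atoms at key_positions agree in any order,
    and return the sorted union of atom indices for each group."""
    groups = {}
    for match in matches:
        key = frozenset(match[p] for p in key_positions)
        groups.setdefault(key, set()).update(match)
    return [tuple(sorted(atoms)) for atoms in groups.values()]
```

For the matches ((0,1,3,4),(2,1,3,4)) from the post, both share the key {1, 3}, so they merge into a single group covering atoms 0 through 4.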

Re: [Rdkit-discuss] Fwd: Re: HasSubstructMatch doesn't work as expected

2017-09-13 Thread Peter S. Shenkin
> > -P. > > Sent from a cell phone. Please forgive brvty and m1St@kes. > > -- Forwarded message -- > > From: "Peter S. Shenkin" <shen...@gmail.com> > > Date: Sep 13, 2017 3:15 PM > > Subject: Re: [Rdkit-discuss] HasSubstructMatch doesn't work

[Rdkit-discuss] Fwd: Re: HasSubstructMatch doesn't work as expected

2017-09-13 Thread Peter S. Shenkin
I neglected to cc Rdkit on this earlier. If he can get the matching atom list from their other program, he won't have to mess w. SMARTS matching in Rdkit. -P. Sent from a cell phone. Please forgive brvty and m1St@kes. -- Forwarded message -- From: "Peter S. Shenkin

Re: [Rdkit-discuss] HasSubstructMatch doesn't work as expected

2017-09-13 Thread Peter S. Shenkin
Your course of action depends upon just what you are really trying to do. If it's only aspirin, then why wouldn't you just do it manually? If it goes beyond aspirin, you have to start by defining in general terms exactly what you want to match to what. For example, given a query molecule (aspirin

Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread Peter S. Shenkin
Hi, In SMARTS, 'a' matches an aromatic atom. So you would match your molecule with the pattern 'aaa', or if you wanted to restrict yourself to carbons, 'ccc'. This would match whether you created the molecule from a Kekulized or an aromatic SMILES. Remember that it's the molecular recognition

Re: [Rdkit-discuss] ETKDG conformation generation algorithm and fullerene-like structures.

2017-09-07 Thread Peter S. Shenkin
Too much symmetry for conformational comparison? Many or most conformation generators will test new conformations for a match with previously generated conformations, and will bail out if they can't exhaust all possibilities. (I don't know if this is the case with RDKit's facilities.) -P. On Thu, Sep

Re: [Rdkit-discuss] list of failed chembl ids

2017-08-08 Thread Peter S. Shenkin
I looked up a bunch of these. The ones I saw are ChEMBL activity records, not molecule records, so they do not contain structural data. But I would be curious to see the 51 ChEMBL SMILES that RDKit could not parse. -P. On Tue, Aug 8, 2017 at 3:00 PM, Bennion, Brian

Re: [Rdkit-discuss] . Re: using rdkit to read in chembl23 1.7 million compounds

2017-08-07 Thread Peter S. Shenkin
That molecule's SMILES is correctly rendered by RDKit, or at least by the version of RDKit behind Slack: [image: Inline image 1] -P. On Mon, Aug 7, 2017 at 3:54 PM, Bennion, Brian wrote: > The carbocations are in small heterocyclic molecules. see CHEMBL3815233 > > Brian >

[Rdkit-discuss] Peter S. Shenkin and Leila Tai Shenkin have moved!

2017-07-10 Thread Peter S. Shenkin
528 5352 Leila: 646 331 2210 Email: Peter: shen...@gmail.com Leila: leila_shen...@mindspring.com Leila (work): le...@leilataidesign.com Peter's Stories by Peter S. Shenkin http://tinyletter.com/shenkin 325 W. 52nd St New York, NY 10019 USA Sent to rdkit-dis

[Rdkit-discuss] Back When Gas Was 30¢ A Gallon

2017-07-09 Thread Peter S. Shenkin
You may have received this story previously. If so, please excuse the duplication. -P Back When Gas Was 30¢ A Gallon Peter S. Shenkin Back when gas was 30¢ a gallon, And love was only 60¢ away Thus sang Tom T. Hall. I can’t say this story is exactly about that, but it took place exactly

Re: [Rdkit-discuss] Clustering

2017-06-12 Thread Peter S. Shenkin
" A clustering algorithm, that does not require specifying the number of classes upfront (so not K-means)." A general approach to O(N) hierarchical clustering is: 1. Pick a random sqrt(N) structures. 2. Do full hierarchical O(N^2) clustering on these. 3. Select your favored clustering level to

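Steps 1 and 2 of the sampling scheme in the clustering post above can be sketched in plain Python. The remaining steps are cut off in the archive, so two choices here are assumptions: the clustering level is a fixed distance cutoff (cutting a single-linkage hierarchy at that height, computed with union-find), and every structure is then assigned to the nearest cluster centroid. Step 2 costs about (√N)² = N distance evaluations, and the assignment costs N times the number of clusters. Points are plain coordinate tuples standing in for descriptors.

```python
import math
import random


def single_linkage(points, cutoff):
    """Clusters from cutting a single-linkage hierarchy at `cutoff`.

    Equivalent to connected components of the graph joining points no
    farther apart than `cutoff`; O(k^2) distance checks for k points.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def sampled_clustering(points, cutoff, seed=0):
    """Cluster a random sqrt(N) sample, then label every point by nearest centroid."""
    rng = random.Random(seed)
    k = max(1, math.isqrt(len(points)))
    sample = rng.sample(points, k)             # step 1: random sqrt(N) subset
    clusters = single_linkage(sample, cutoff)  # step 2, cut at an assumed level
    centroids = [tuple(sum(axis) / len(members) for axis in zip(*members))
                 for members in clusters]
    # Assumed final step: assign every structure to its nearest centroid.
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]
```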
Re: [Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Peter S. Shenkin
not for N=N. But I may just > have a very limited knowledge of RDkit. > > This is how it looks like in ChemDraw: > [image: Inline image 1] > > > Thanks, > Yuran > > On Thu, May 11, 2017 at 1:33 PM, Peter S. Shenkin <shen...@gmail.com> > wrote: > >>

Re: [Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Peter S. Shenkin
The problematic part is just the beginning of your would-be SMILES: N=N(C)(C)C. The rest is correctly parsed. But this makes no sense. Perhaps you mean one of the substructures illustrated in the attached (which at least satisfy normal valence rules). If not, perhaps you could attach a structural

Re: [Rdkit-discuss] RDKit-Py3DMol integration

2017-05-09 Thread Peter S. Shenkin
and future help as well. > > Notes to David Koes: > Dear sir, > I will come up with specific questions very soon (within one or two > weeks). I wish you don't mind getting my touch. Thanks much in advance for > your kind help. > > Sincerely, > -Malitha > > > > O

Re: [Rdkit-discuss] Another Can't kekulize mol observation

2017-04-27 Thread Peter S. Shenkin
I would just replace 'n' with '[nH]' in your existing SMILES, for the N you want the H on. -P. On Thu, Apr 27, 2017 at 12:32 AM, Hongbin Yang wrote: > Hi Markus, > “c1ccc(cc1)-c1nnc(n1)-c1c1” is different from > "c1ccc(cc1)-c1nncn1-c1c1", > so you cannot remove
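As a string edit, this amounts to swapping one bare aromatic n for [nH]. A minimal sketch (the helper name is hypothetical) that replaces the k-th bare n outside bracket atoms; choosing which nitrogen carries the H selects the tautomer, and the function does not check that the result is a valid or kekulizable SMILES.

```python
def protonate_nth_aromatic_n(smiles, k):
    """Replace the k-th (0-based) bare aromatic 'n' outside brackets with '[nH]'."""
    out = []
    seen = 0
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif ch == "n" and not in_bracket:
            if seen == k:
                out.append("[nH]")
                seen += 1
                continue
            seen += 1
        out.append(ch)
    return "".join(out)
```

For example, protonating the third bare n of the 1,2,4-triazole SMILES c1nncn1 gives c1nnc[nH]1.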

Re: [Rdkit-discuss] Information contained in SMARTS and SMILES

2017-04-19 Thread Peter S. Shenkin
On Wed, Apr 19, 2017 at 7:25 PM, Andrew Dalke <da...@dalkescientific.com> wrote: > On Apr 19, 2017, at 23:59, Peter S. Shenkin <shen...@gmail.com> wrote: > > One more thing. The term "Mol" in RDKit and some other toolkits does not > really mean "molecule"

Re: [Rdkit-discuss] Information contained in SMARTS and SMILES

2017-04-19 Thread Peter S. Shenkin
One more thing. The term "Mol" in RDKit and some other toolkits does not really mean "molecule" in the sense that chemists use it. It is used to connote a data structure that can store a SMARTS or a SMILES. Only when a SMILES is used does it really correspond to a chemical "molecule", except, in

Re: [Rdkit-discuss] tautomers in rdkit

2017-04-18 Thread Peter S. Shenkin

Re: [Rdkit-discuss] Check If Atom Is in Two Small Rings

2017-04-11 Thread Peter S. Shenkin
But Brian's solution won't help Jonathan find atoms that are in two three-membered or two four-membered rings, which I thought Jonathan also wanted, based on the wording of the original query. -P. On Tue, Apr 11, 2017 at 4:12 PM, Curt Fischer wrote: > Brian's solution
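Finding atoms that sit in two small rings of any size up to a limit (three- and four-membered included) only needs a list of rings as atom-index tuples, such as the SSSR-based list returned by RDKit's mol.GetRingInfo().AtomRings(). The sketch below works on such a list in plain Python; note that an SSSR can omit some rings (cubane is the classic case), so results depend on the ring set supplied.

```python
from collections import Counter


def atoms_in_multiple_small_rings(atom_rings, max_size=4, min_count=2):
    """atom_rings: iterable of atom-index tuples, one per ring.

    Returns sorted indices of atoms that appear in at least `min_count`
    rings of size <= max_size.
    """
    counts = Counter(atom
                     for ring in atom_rings if len(ring) <= max_size
                     for atom in ring)
    return sorted(atom for atom, n in counts.items() if n >= min_count)
```

For bicyclo[1.1.0]butane, with rings (0, 1, 2) and (0, 2, 3), the two bridgehead atoms 0 and 2 are returned.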

Re: [Rdkit-discuss] tautomers in rdkit

2017-04-11 Thread Peter S. Shenkin
Just from the slides, it's not clear that Roger had a solution; the slides seem to just suggest an approach. Am I missing something here? That is, he defined the invariants that all tautomers of a compound have to share and expressed it as a SMARTS + constraints; but I didn't see that he provided

[Rdkit-discuss] NYC "RDKit Users and Learners" Meetup Monday, April 3, 7 PM at Hack Manhattan

2017-03-31 Thread Peter S. Shenkin
For more information, see: https://www.meetup.com/RDKit-Users-and-Learners/events/237963674/?rv=ce2&_af=event&_af_eid=237963674=on If you have RDKit-related work that you'd like to talk about or ask about, please let me know. -P.

Re: [Rdkit-discuss] looking for feedback on new python API documentation format

2017-03-28 Thread Peter S. Shenkin
Hi, Greg, Here are my comments. - Formatting - pdoc at a glance is certainly more handsome than epydoc - To my eye, there is a huge amount of wasted space in the pdoc documentation. - The line spacing is hugely disproportional to the font size - Maybe this

Re: [Rdkit-discuss] delete a substructure

2017-03-10 Thread Peter S. Shenkin
Sounds like Daylight's "depictmatch", unfortunately no longer available online. -P. On Fri, Mar 10, 2017 at 1:28 PM, David Cosgrove wrote: > Hi, > In the RDKit source, under the 2d drawing code in the c++ part there's the > full source code for a QT program that

Re: [Rdkit-discuss] Question about WedgeMolBonds

2017-02-26 Thread Peter S. Shenkin
(or by means of an optional argument :-) ) -P. Sent from a cell phone. Please forgive brvty and m1St@kes. On Feb 26, 2017 00:36, "Greg Landrum" wrote: > > > On Sat, Feb 25, 2017 at 7:23 PM, John Mayfield < > john.wilkinson...@gmail.com> wrote: > >> Is there

Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Peter S. Shenkin
With Glide, IIRC, this facility is designed for the use case where the coordinates of a docked ligand are known (typically from an X-ray structure) and the docked ligand shares a SMARTS with the ligands in an input file. The SMARTS-matching atoms of each incoming ligand are superposed upon the
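The superposition step described here, fitting the SMARTS-matched atoms of an incoming ligand onto the reference atoms, is commonly done with the Kabsch algorithm. This is a generic NumPy sketch, not Glide's implementation; the function name is hypothetical, and it assumes the match has already produced two coordinate lists in corresponding order. It returns a transform that can then be applied to every atom of the ligand, plus the RMSD over the matched atoms. The determinant check keeps the fit from returning a reflection instead of a rotation.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Least-squares rigid fit of matched `mobile` atoms onto `target` atoms."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    p_centroid, q_centroid = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_centroid).T @ (Q - q_centroid)   # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # proper rotation

    def transform(coords):
        """Apply the fitted rotation and translation to any coordinates."""
        return (np.asarray(coords, dtype=float) - p_centroid) @ R.T + q_centroid

    rmsd = float(np.sqrt(((transform(P) - Q) ** 2).sum(axis=1).mean()))
    return transform, rmsd
```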

Re: [Rdkit-discuss] UFF and MMFF conformers energy

2017-02-09 Thread Peter S. Shenkin
Small atomic displacements can cause large forcefield energy differences. Computing molecular-mechanics energies from exactly the same coordinates using two different force-fields is probably not a reasonable procedure. It would be better to do an energy minimization with the two force fields

Re: [Rdkit-discuss] RDKit "cannot create mol from SMILE" error

2017-01-18 Thread Peter S. Shenkin
In addition to Brian's observation, there is also a "C1" early in the SMILES, but no corresponding X1 to make a ring bond before or after it. It appears that you might be reading the second half of a SMILES for some reason. My guess is that the (C=C1) is associated with a preceding atom that was
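A quick way to spot the problem described here, a ring-bond label that is opened but never closed, is to scan the SMILES for ring-closure digits outside bracket atoms. This minimal sketch handles single digits and the two-digit %nn form only, and does not otherwise validate the string.

```python
def unclosed_ring_bonds(smiles):
    """Return ring-closure labels that are opened but never closed."""
    open_labels = set()
    in_bracket = False
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_bracket = True          # digits inside brackets are isotopes, H counts, charges
        elif ch == "]":
            in_bracket = False
        elif not in_bracket:
            if ch == "%":
                label = smiles[i + 1:i + 3]  # two-digit ring label, e.g. %10
                open_labels ^= {label}
                i += 3
                continue
            if ch.isdigit():
                open_labels ^= {ch}   # toggle: first use opens, second closes
        i += 1
    return open_labels
```

A fragment such as CC(C=C1)CC reports the dangling label {"1"}, while c1ccccc1 reports nothing.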

Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Peter S. Shenkin
You say "most stable", but I think you mean "most common." 2H is as stable as 1H, but less common. -P. On Wed, Jan 18, 2017 at 5:01 PM, Milinda Samaraweera < milindaatw...@gmail.com> wrote: > Hi Bob, > > I am trying to filter out any compound that does not have the most stable > isotopic form;

Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Peter S. Shenkin
How about a regex filter on the all-atom SMILES? -P. On Wed, Jan 18, 2017 at 9:56 AM, Milinda Samaraweera < milindaatw...@gmail.com> wrote: > Dear Experts, > > I am trying to figure out a way to exclude entries which contain heavy > atoms (13C, 2H, 3H, etc), from a SD file (which has close to
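In SMILES, an isotope is written as a mass number at the start of a bracket atom ([13C], [2H]), so a regular expression is enough for a first-pass filter. A minimal sketch; the MOST_COMMON_MASS table is a hypothetical excerpt, elements missing from it are flagged whenever a mass is given, and the pattern assumes well-formed SMILES.

```python
import re

# Hypothetical subset of most-abundant mass numbers; extend for other elements.
MOST_COMMON_MASS = {"H": 1, "C": 12, "N": 14, "O": 16, "S": 32, "Cl": 35}

# Mass number right after '[', then an element symbol (aromatic lowercase allowed).
ISOTOPE_ATOM = re.compile(r"\[(\d+)([A-Z][a-z]?|[a-z]{1,2})")


def uncommon_isotopes(smiles):
    """List isotope labels whose mass differs from the most common one."""
    hits = []
    for mass, symbol in ISOTOPE_ATOM.findall(smiles):
        element = symbol.capitalize()  # aromatic 'c' -> 'C'
        if MOST_COMMON_MASS.get(element) != int(mass):
            hits.append(f"{mass}{element}")
    return hits
```

Entries for which uncommon_isotopes returns a non-empty list would be the ones to exclude.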

Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Peter S. Shenkin
i Maziuk" <dmaz...@bmrb.wisc.edu> wrote: > On 12/29/2016 02:35 PM, Peter S. Shenkin wrote: > > Dimitri, > > > > You were the one who suggested that all the structural depictions be > > generated. > > > > I, in contrast, suggested that only the ones u

Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Peter S. Shenkin
at 2:49 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu> wrote: > On 12/29/2016 12:43 PM, Peter S. Shenkin wrote: > > > Of the > > billion structures, only a fraction will ever be visualized, so a > > memoization strategy sounds reasonable, which in turn implies that you

Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Peter S. Shenkin
Look, it all boils down to (CPU) time, and time is money. Generating a billion depictions on the cloud will cost you the use of the machines. Increasing the depiction speed by a factor of 10 decreases the cost by a factor of 10, to a pretty good approximation. Storage is also money, so it doesn't

Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Peter S. Shenkin
ructures (current size PubChem Compound): >> >> 1s per structure = 1074 days (~3 years) >> 100 ms per structure = 107 days >> 1ms per structure = 25 hours >> >> John >> >> On 15 December 2016 at 23:12, Peter S. Shenkin <shen...@gmail.com> wrote

Re: [Rdkit-discuss] Bug in AllChem.EmbedMultipleConfs pruning?

2016-12-22 Thread Peter S. Shenkin
Tri-anything groups can be considered one by one after the remaining heavy atoms have been aligned. This turns a combinatorial explosion into a linear algorithm for these groups. (Well, it would be linear in number of tri-anything groups, but it gets more complicated if the anythings are more than
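The per-group idea can be sketched as follows, assuming the heavy atoms have already been superimposed so that each group's coordinates are in a common frame (function names are hypothetical). Each group of three interchangeable atoms tries its 3! = 6 assignments independently, so k groups cost 6·k evaluations instead of 6^k for a joint enumeration.

```python
from itertools import permutations


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_group_match(ref, probe):
    """Smallest summed squared deviation over the 6 assignments of one
    group of three interchangeable atoms (e.g. methyl H's, CF3 F's)."""
    return min(
        sum(sq_dist(ref[i], probe[perm[i]]) for i in range(3))
        for perm in permutations(range(3))
    )


def symmetric_group_deviation(ref_groups, probe_groups):
    """Handle each group on its own: linear in the number of groups."""
    return sum(best_group_match(r, p) for r, p in zip(ref_groups, probe_groups))
```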

Re: [Rdkit-discuss] drawing code take 3

2016-12-15 Thread Peter S. Shenkin
Yes, of course, storing the images is an alternative. -P. On Thu, Dec 15, 2016 at 5:46 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu> wrote: > On 12/15/2016 04:23 PM, Peter S. Shenkin wrote: > > > Obviously, it doesn't matter if you're rendering just few structures, but > >

Re: [Rdkit-discuss] drawing code take 3

2016-12-15 Thread Peter S. Shenkin
sc.edu> wrote: > On 12/15/2016 02:53 PM, Peter S. Shenkin wrote: > > Looks good, but maybe too slow for production use... (?) > > I wonder what kind of production use would require sub-second wall clock > time for this. > > -- > Dimitri Maziuk > Programmer/sysadmin > BioM

Re: [Rdkit-discuss] drawing code take 3

2016-12-15 Thread Peter S. Shenkin
Looks good, but maybe too slow for production use... (?) -P. On Thu, Dec 15, 2016 at 3:38 PM, Chris Swain wrote: > At first glance this looks an interesting approach. > > Simulation-Based Algorithm for Two-Dimensional Chemical Structure Diagram > Generation of Complex Molecules

Re: [Rdkit-discuss] GenerateDepictionMatching[23]DStructure (a bit off-topic)

2016-11-18 Thread Peter S. Shenkin
en quickly, but there is some low-hanging fruit (like cutting > crossed bonds) that I ought to be able to do something about.[1] > > -greg > [1] the trick is to avoid, as much as possible, creating drawings that > look like Möbius strips. > _____ > From:

Re: [Rdkit-discuss] GenerateDepictionMatching[23]DStructure (a bit off-topic)

2016-11-17 Thread Peter S. Shenkin
> On 17 Nov 2016, at 4:12 PM, Dimitri Maziuk wrote: > > Philosophically speaking, there must exist molecules for which a legible > 2D projection is simply not possible. Hi, I don't think that 2D projection of a 3D structure is an appropriate paradigm for 2D depiction,

Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-27 Thread Peter S. Shenkin
It would seem that a major issue with RDKit's multiconformer file is the inability to associate structure-level and atom-level properties with conformations. It's not quite orthogonal to the question of how to read, say, a multiconformer SD file into RDKit's multiconformer format, because the

Re: [Rdkit-discuss] SVG BUG (Re: Fwd: 2D drawing with atoms labeled by index)

2016-10-26 Thread Peter S. Shenkin
Hey, by the way, my agenda is trying to understand all this. I'm ignorant about the general area and have learned something. But don't worry -- not enough to be dangerous. :-) If something comes out of the discussion that's generally useful, great! By the way, when you post your UGM Jupyter

Re: [Rdkit-discuss] SVG BUG (Re: Fwd: 2D drawing with atoms labeled by index)

2016-10-25 Thread Peter S. Shenkin
se in a browser. I'm still not sure why GetDrawingText() doesn't return a properly formatted svg string. Is there some use its output can be put to without these .replacements? -P On Tue, Oct 25, 2016 at 1:35 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu> wrote: > On 10/25/2016 11:21 AM, Peter
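The workaround discussed in this thread, removing the svg: namespace prefix from the drawing text, appears to correspond to the .replace('svg:', '') step seen in RDKit notebooks of that period. A minimal plain-Python sketch; the sample string used to exercise it is made up for illustration, and newer RDKit versions may not need this step at all.

```python
import xml.etree.ElementTree as ET


def strip_svg_prefix(svg_text):
    """Drop an 'svg:' element prefix so viewers see plain <svg>, <rect>, ...

    Naive string replacement: it would also alter any text content that
    happens to contain 'svg:'.
    """
    return svg_text.replace("svg:", "")


def root_tag(svg_text):
    """Parse the text to confirm it is well-formed XML; return the root tag."""
    return ET.fromstring(svg_text).tag
```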

Re: [Rdkit-discuss] Fwd: 2D drawing with atoms labeled by index

2016-10-25 Thread Peter S. Shenkin
pace of svg were removed). > > -- > Hongbin Yang > > > *From:* Peter S. Shenkin <shen...@gmail.com> > *Date:* 2016-10-25 13:27 > *To:* Dmitri Maziuk <dmitri.maz...@gmail.com> > *CC:* RDKit Discuss <rdkit-discuss@lists.sourceforge.net> > *Subject:* Re: [R

Re: [Rdkit-discuss] Fwd: 2D drawing with atoms labeled by index

2016-10-24 Thread Peter S. Shenkin
Hi, Dima wrote: > > Try saving the text (svg/svg2) to a file and opening it in chrome (if you > can actually open a file in chrome) or some other application. I actually did that, and in a second email I reported: > >- Chrome thinks svg.svg is empty > > >- When I load svg2.svg, Chrome

Re: [Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-24 Thread Peter S. Shenkin
, in a Jupyter notebook in Chrome. On Mon, Oct 24, 2016 at 6:44 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu> wrote: > On 10/24/2016 04:39 PM, Peter S. Shenkin wrote: > > > Or is it > > rather because chemists in your target audience will be thinking of the > > first atom in, sa

Re: [Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-24 Thread Peter S. Shenkin
Hi, Dimitri, I have two questions about your code. 1. Why are you incrementing the atom index by 1? Are there functions in RDKit, for example, that use atom indices using index-origin 1? Or is it rather because chemists in your target audience will be thinking of the first atom in, say, a

Re: [Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-23 Thread Peter S. Shenkin
<yanyangh...@163.com> wrote: > Hi,Peter S. Shenkin, > I think this blog may help you draw molecule with labels and it told > more about drawing with rdMolDraw2D. > http://rdkit.blogspot.com/2015/02/new-drawing-code.html > > -- > Hong

[Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-23 Thread Peter S. Shenkin
Hi, How do you get RDKit to label the atoms in a 2D drawing with their indices? There was some discussion of this that included Dimitri Maziuk in September, but it wasn't clear to me whether he actually had to modify the underlying drawing code to get this behavior. -P.

[Rdkit-discuss] Solved! (was, Re: Fwd: Jupyter renders only from the outermost level?)

2016-10-14 Thread Peter S. Shenkin
In an earlier thread, I reported that I could not get Jupyter to render except from the outermost level of the notebook. For instance, the following code would not render Benzene: -- from rdkit import Chem from rdkit.Chem import rdDepictor from rdkit.Chem.Draw import rdMolDraw2D from

[Rdkit-discuss] Jupyter: "I forgot to remember to forget you"

2016-10-11 Thread Peter S. Shenkin
This is a Jupyter quirk that cost me some hours and caused me some grief. So beware! Please see the attached image and Jupyter notebook. Note the following block of code, which has been succeeding for me for days, even when "Run All Cells" is carried out: my_mols = [mol.RDK_mol for mol in mols]

Re: [Rdkit-discuss] Rendering of aromatics

2016-10-11 Thread Peter S. Shenkin
Thanks, Greg. Indeed, passing "kekulize=False" to MolsToGridImage works. -P. On Tue, Oct 11, 2016 at 1:56 AM, Greg Landrum <greg.land...@gmail.com> wrote: > HI Peter, > > On Tue, Oct 11, 2016 at 12:31 AM, Peter S. Shenkin <shen...@gmail.com> > wrote: > >&

[Rdkit-discuss] Rendering of aromatics

2016-10-10 Thread Peter S. Shenkin
Hi, Please see the attached image for (1) and (2). 1. If I render a molecule via 'SVG(svg)', I get the dotted aromatic representation. 2. If I render it just by displaying 'MolFromSmiles(smi)', I get the kekulized representation. So I guessed that (for some reason) when SVG is used, RDKit

Re: [Rdkit-discuss] Fwd: Jupyter renders only from the outermost level?

2016-10-10 Thread Peter S. Shenkin
Thank you, Brian. That worked. MolsToGridImage(mols, useSVG=True) is indeed included in my recent conda install of RDKit. See attached image. (I actually expected "SVG(gridImage)" to work on In[130], but it didn't.) "Slowly gettin' the hang of it.", -P. On Mon, Oct 10, 2016 at 8:30 AM, Brian
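The snippet does not say why `SVG(gridImage)` failed. One plausible explanation, not confirmed in the thread, involves the return type of `MolsToGridImage(..., useSVG=True)`. It can be a plain SVG string, or, when RDKit's IPython integration is active, an object that is already an IPython `SVG`. Wrapping that object a second time does not work. The sketch below accepts either form. The helper `svg_text` is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import Draw


def svg_text(obj):
    """Hypothetical helper: raw SVG markup from a str or an IPython SVG object."""
    return obj if isinstance(obj, str) else obj.data


mols = [Chem.MolFromSmiles(s) for s in ("c1ccccc1", "CCO", "CC(=O)O")]
grid = Draw.MolsToGridImage(mols, molsPerRow=3, subImgSize=(200, 200), useSVG=True)
markup = svg_text(grid)  # safe to pass to IPython.display.SVG(markup) if needed
```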

Re: [Rdkit-discuss] I'm having trouble with 2D structure depiction in a jupyter notebook

2016-10-09 Thread Peter S. Shenkin
The internet seems to think that this may be solvable by trusting the > notebook. > Can you please check under the notebook's File menu to see if it shows up > as a "trusted notebook". If not, then trust it. > > -greg > > > On Mon, Oct 10, 2016 at 4:46 AM, Peter S.

[Rdkit-discuss] NYC "RDKit Users and Learners" meetup Monday, Oct. 3

2016-09-29 Thread Peter S. Shenkin
Hi, As a reminder to anyone in the NYC area who might be interested, I am trying to get a few RDKit "users and learners" together via a meetup on Monday evening, October 3. *More information here

Re: [Rdkit-discuss] The RDKit and modern C++

2016-09-28 Thread Peter S. Shenkin
hu, Sep 29, 2016 at 12:42 AM, Greg Landrum <greg.land...@gmail.com> wrote: > Hi Peter, > > On Sat, Sep 24, 2016 at 7:55 PM, Peter S. Shenkin <shen...@gmail.com> > wrote: > >> Hi, I read your posting on Medium, and would be curious to hear which of >> the

Re: [Rdkit-discuss] drawing code take 3

2016-09-26 Thread Peter S. Shenkin
iuk <dmaz...@bmrb.wisc.edu> wrote: > On 09/26/2016 04:42 PM, Peter S. Shenkin wrote: > > Also, the C attached to H44 has an extra H (its own or someone else's?) > > superimposed upon it. > > I wonder if 2D drawing code should really work the same way as the 3D > conformer

Re: [Rdkit-discuss] drawing code take 3

2016-09-26 Thread Peter S. Shenkin
Also, the C attached to H44 has an extra H (its own or someone else's?) superimposed upon it. -P. On Mon, Sep 26, 2016 at 5:38 PM, Dimitri Maziuk wrote: > > On the plus side, when drawing PubChem CID 5057 from a 3D SDF before and > after our canonicalization, RDKit draws
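The thread excerpt does not show which fix was adopted. A common way to avoid overlapping hydrogens when drawing a structure read from a 3D SDF is to draw a copy instead of the 3D record itself: strip the explicit H atoms and compute fresh 2D coordinates. This is a sketch under that assumption. An embedded ethanol stands in for the SDF record, and `to_2d_for_drawing` is a hypothetical name.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdDepictor


def to_2d_for_drawing(mol3d):
    """Hypothetical helper: copy of a 3D mol with implicit Hs and a 2D layout."""
    mol = Chem.RemoveHs(mol3d)       # returns a new mol without explicit H atoms
    rdDepictor.Compute2DCoords(mol)  # clears old conformers, adds one 2D conformer
    return mol


# Stand-in for a 3D SDF record: ethanol with explicit Hs, embedded in 3D.
mol3d = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(mol3d, randomSeed=42)
flat = to_2d_for_drawing(mol3d)  # draw this one; mol3d keeps its 3D coordinates
```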

Re: [Rdkit-discuss] The RDKit and modern C++

2016-09-24 Thread Peter S. Shenkin
Hi, I read your posting on Medium, and would be curious to hear which of the many language features in c++11/14 you find most appealing. Is it that you hope to rewrite things using these features, or, at the other extreme, just want to make sure that the code remains compatible with new language

Re: [Rdkit-discuss] AddHs()

2016-09-09 Thread Peter S. Shenkin
How about "explicit", rather than "physical", hydrogens? -P. Sent from a cell phone. Please forgive brvty and m1St@kes. On Sep 9, 2016 1:57 AM, "Greg Landrum" wrote: > > > On Thu, Sep 8, 2016 at 10:35 PM, Dimitri Maziuk > wrote: > >> On
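For context on the naming question: `AddHs()` turns hydrogens that RDKit otherwise tracks only as counts on heavy atoms into real atoms in the molecular graph. A minimal illustration with methane:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("C")   # methane: the four Hs are not graph atoms
molH = Chem.AddHs(mol)          # returns a NEW molecule; `mol` is unchanged

print(mol.GetNumAtoms())                       # 1: only the carbon
print(mol.GetAtomWithIdx(0).GetTotalNumHs())   # 4: stored as a count on C
print(molH.GetNumAtoms())                      # 5: four H atoms now in the graph
```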

Re: [Rdkit-discuss] Novartis paper that used rdkit and included ca. 50 common reactions

2016-08-31 Thread Peter S. Shenkin
Thanks, Paulo and Greg. Yup, that was the reference. Thanks! Best, -P. On Wed, Aug 31, 2016 at 2:05 PM, Greg Landrum <greg.land...@gmail.com> wrote: > > > On Wed, Aug 31, 2016 at 7:48 PM, Peter S. Shenkin <shen...@gmail.com> > wrote: > >> >> This is a bi

[Rdkit-discuss] Conda install of postgres cartrdige on Mac OS; was: Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-30 Thread Peter S. Shenkin
-c greglandrum rdkit-postgresql95 (The last command also installs postgresql 9.5.4-0.) Thanks, Greg, for your help on this. -P. On Tue, Aug 30, 2016 at 7:05 AM, Peter S. Shenkin <shen...@gmail.com> wrote: > Hi, Greg, > > Thanks. That worked and gave me 3.4. > > But when I

Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-30 Thread Peter S. Shenkin
ely, so there's no need to put this together right away for my sake. OTOH, if this does get together, I'm happy to try out the installation. Best, -P. On Tue, Aug 30, 2016 at 6:38 AM, Greg Landrum <greg.land...@gmail.com> wrote: > conda install -c rdkit rdkit > > > On Tue,

Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-30 Thread Peter S. Shenkin
2016 at 12:30 AM, Greg Landrum <greg.land...@gmail.com> wrote: > grrr, I uploaded the rdkit binaries to the wrong place. > That's taken care of now too. > > please try again. > -greg > > > On Tue, Aug 30, 2016 at 5:59 AM, Peter S. Shenkin <shen...@gmail.com> > wrote:

Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-29 Thread Peter S. Shenkin
<greg.land...@gmail.com> wrote: > Sorry about that. The command I sent was wrong, and I had made a mistake > when I uploaded the file. > > Please try this one: > conda install -c greglandrum rdkit-postgresql95 > > -greg > > > On Mon, Aug 29, 2016 at 10:47 PM, Peter S

Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-29 Thread Peter S. Shenkin
; the meantime there is a Mac build in my channel at anaconda.org. It would > be great if you could try this out and let me know if it works for you: > conda install -c https://conda.anaconda.org/greglandrum > rdkit-postgresql95 > > Best, > -greg > > > On Sat, Aug 27, 2016 at 7:49 AM

Re: [Rdkit-discuss] Latest Mac Conda installation advice; was: Boost 1.61 and the RDKit work together.

2016-08-26 Thread Peter S. Shenkin
work > to get things working. > > > > > > On Sat, Aug 27, 2016 at 6:25 AM +0200, "Peter S. Shenkin" < > shen...@gmail.com> wrote: > > Well, actually, thought I'd try the conda install of rdkit-postgresql, >> using: >> >> conda install -c https://co
