A quick comment on the cosine metric. Unlike Tanimoto it obeys the triangle
inequality, so in cases where it's used essentially as a distance metric
(e.g. some clustering applications) the results are probably more
mathematically correct. I used it a lot in that context. Whether it makes
any real
Sorry - tried to type this too early in the morning and introduced some
errors transcribing the SMARTS pattern!
It should have been "[CH](=O)O[$([CH3]),$([CH2]C)]", as in
pat1 = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")
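A quick way to sanity-check the corrected pattern is sketched below, assuming RDKit is installed. The three test SMILES (methyl formate, ethyl formate, methyl acetate) are illustrative picks and not taken from the original thread.

```python
from rdkit import Chem

# Corrected pattern from the message above: a formyl carbon (one H)
# whose ester oxygen carries either a CH3 or a CH2 bonded to carbon.
pat1 = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")

# Illustrative inputs (assumptions, not from the thread).
examples = {
    "methyl formate": "O=COC",
    "ethyl formate": "O=COCC",
    "methyl acetate": "CC(=O)OC",
}

results = {}
for name, smi in examples.items():
    mol = Chem.MolFromSmiles(smi)
    results[name] = mol.HasSubstructMatch(pat1)
    print(name, results[name])
```

Using H-count primitives such as [CH3] inside the recursive term avoids any dependence on explicit hydrogens in the target molecule.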
Best regards,
Chris
On Sun, 9 Feb 2020 at 0
Hi
I've always regarded it as dangerous to rely on the use of explicit
hydrogens in search queries and pattern matches. I think it's generally
safer to use H-count properties in your SMARTS. In your example case this
will require the use of recursive SMARTS to allow matching of the CH3 and
CH2Cn
Hi
Dot-disconnected fragments are not going to work for this, as you describe.
You need to use recursive SMARTS (see
https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html section
4.4). Something like:
Clc[$(cBr);$(ccBr);$(cccBr)]
should (I hope!) be a reasonable starting point.
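A minimal sketch, assuming RDKit, for trying the suggested pattern on a few simple molecules. The test molecules are illustrative picks, not from the thread. Because `;` in SMARTS is a low-precedence AND while `,` is OR, the sketch also runs a comma-separated variant for comparison; that variant is only an assumption about the intent and is not stated in the message.

```python
from rdkit import Chem

# Pattern exactly as posted (recursive terms joined with ';' = AND).
as_posted = Chem.MolFromSmarts("Clc[$(cBr);$(ccBr);$(cccBr)]")
# Hypothetical variant with ',' = OR (an assumption about the intent).
or_variant = Chem.MolFromSmarts("Clc[$(cBr),$(ccBr),$(cccBr)]")

# Illustrative test molecules (assumptions).
tests = {
    "ortho": "Clc1ccccc1Br",
    "meta": "Clc1cccc(Br)c1",
    "para": "Clc1ccc(Br)cc1",
    "chlorobenzene": "Clc1ccccc1",
}

hits_posted = {}
hits_or = {}
for name, smi in tests.items():
    mol = Chem.MolFromSmiles(smi)
    hits_posted[name] = mol.HasSubstructMatch(as_posted)
    hits_or[name] = mol.HasSubstructMatch(or_variant)
    print(name, hits_posted[name], hits_or[name])
```

Comparing the two columns of output shows which logical operator gives the behaviour needed for a particular query.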
Chris
irly easy).
>
> On Tue, Oct 23, 2018 at 6:13 PM Chris Earnshaw
> wrote:
>
>>
>> Following this analysis means you don't need to consider the resonance
>> form:
>> A carbonyl or imine (open chain or in a partially saturated ring) group's
>> carbon atom
Hi
I think my approach to this is - Is there a resonance form in which the
ring in question is unequivocally aromatic and the separated charge ends up
somewhere sensible? The 'electron stealing' concept is a sort of handy
shortcut for this.
For Greg's examples, I'd say:
[image: image.png]
I'm
Mea culpa - I hit Reply rather than Reply All and so only sent this to
Greg...
On Tue, 23 Oct 2018 at 13:53, Chris Earnshaw wrote:
> Hi Greg
>
> Apologies again, I'm not trying to stir things up here. As we can see from
> some of the other discussion there's no clear
optimistic...
Regards,
Chris Earnshaw
On Wed, 10 Oct 2018 at 13:16, Michal Krompiec
wrote:
> Hi Thomas,
> Radius 2, 2048 bits, 5200 data points.
>
> On Wed, 10 Oct 2018 at 13:13, Thomas Evangelidis
> wrote:
>
>> What's your bitvector length and radius? How many trainin
fast enough to be useful.
Regards,
Chris Earnshaw
On Thu, 27 Sep 2018 at 02:36, Francois Berenger wrote:
> On 21/09/2018 16:53, Chris Earnshaw wrote:
> > Hi
> >
> > I'm afraid I can't help with an RDKit solution to your question, but
> > there are a couple of issues w
of analysis. It's better to use an
alternative which does obey the triangle inequality - e.g. the Cosine
metric.
Regards,
Chris Earnshaw
On Thu, 20 Sep 2018 at 21:55, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:
> RDkit Discussion Group,
>
> I
Hi
The question 'what do you mean by ALL?' springs to mind. None of the
discussion includes dot-disconnected SMILES, which are also perfectly valid
representations. For example C(C1C2)C.C12 is yet another SMILES (of many
possible) for the example structure.
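To illustrate the point, here is a small sketch (assuming RDKit) showing that the dot-containing SMILES above still parses to a single connected molecule, because the ring-closure digits bond across the dot. The comparison SMILES `CCC1CC1` is my own reading of that string and is an assumption, not something quoted from the thread.

```python
from rdkit import Chem

# SMILES from the message: the '.' is followed by an atom that closes
# ring bonds 1 and 2, so everything ends up in one component.
dotted = Chem.MolFromSmiles("C(C1C2)C.C12")

# Conventional SMILES for what I read as the same structure (assumption).
conventional = Chem.MolFromSmiles("CCC1CC1")

n_frags = len(Chem.GetMolFrags(dotted))
same = Chem.MolToSmiles(dotted) == Chem.MolToSmiles(conventional)
print(n_frags, same)
```

Any enumeration of "all" SMILES would therefore need to decide whether forms like this one count.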
I've no idea whether this is of any
Hi
It looks to me like N5 [nH:5] also has a problem. This has 3 connections to
heavy atoms, is specified to have a hydrogen attached, but has no charge.
This may not have triggered an error but it looks wrong, especially in this
structure. Surely this atom should just be [n:5] ?
Best regards,
Hi
I'm no Python expert, but I think the problem is that Python doesn't (by
default) do shell-style tilde expansion. As a result it doesn't understand the
significance of the ~ character in your directory path and tries to
interpret it literally. The simple solution is to just give a path that can
be
I'd say that using RDKit to calculate the numbers of heavy atoms is
significantly more robust than a purely lexical approach - and it's easy to
implement.
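A minimal sketch of the RDKit route, assuming a multi-component SMILES as input. The helper name `fragment_heavy_atom_counts` and the sodium acetate example are hypothetical and only for illustration.

```python
from rdkit import Chem


def fragment_heavy_atom_counts(smiles):
    """Return (fragment SMILES, heavy-atom count) for each component."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # asMols=True returns each disconnected component as its own Mol.
    frags = Chem.GetMolFrags(mol, asMols=True)
    return [(Chem.MolToSmiles(f), f.GetNumHeavyAtoms()) for f in frags]


# Illustrative input (assumption): sodium acetate.
counts = fragment_heavy_atom_counts("CC(=O)[O-].[Na+]")
for frag_smiles, n_heavy in counts:
    print(frag_smiles, n_heavy)
```

The sketch only reports the counts; which fragment to keep is left as a separate decision.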
It's also dangerous to just discard the smallest fragment. Years ago I
worked on a project where the active molecule had only 11 heavy atoms
Hi Maria
I would say that the behaviour of RDKit with your MOL2 file is right.
SMILES notation doesn't have a way to represent the delocalised form of a
carboxylate anion, so the O=C[O-] form is the correct SMILES for this
structure. RDKit does a good job in recognising that it's the anion based
Hi Felipe
You're doing something similar to the problem Paolo addressed.
ConstrainedEmbed (see
http://www.rdkit.org/Python_Docs/rdkit.Chem.AllChem-module.html#ConstrainedEmbed)
requires a mol object as the first parameter, but you are passing it an
integer cid value, not a molecule. Your code
The minuses are right. These are the single bonds between the individual
aromatic rings and this representation is strictly correct. The OpenBabel
representation doesn't mark these bonds as explicitly single and, as
they're between two aromatic atoms, the bond type could be inferred to be
Hi Jan
Your code doesn't change the charges because the reaction SMARTS doesn't
tell it to. If you say -
rxn_smarts = ['[N+:1]=[*:2]-[O-:3]>>[N+0:1]-[*:2]=[O+0:3]']
- the charges in the product are explicitly defined and you should get the
result you expect.
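A short sketch of that reaction in use, assuming RDKit. The reactant SMILES (a charge-separated amide form) is an illustrative choice of mine, since the original input molecule isn't shown here.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Reaction SMARTS from the message, with product charges set explicitly.
rxn = AllChem.ReactionFromSmarts(
    "[N+:1]=[*:2]-[O-:3]>>[N+0:1]-[*:2]=[O+0:3]"
)

# Illustrative reactant (assumption): zwitterionic form of an amide.
reactant = Chem.MolFromSmiles("C[N+](C)=C(C)[O-]")

# Take the first product of the first product set and sanitize it.
product = rxn.RunReactants((reactant,))[0][0]
Chem.SanitizeMol(product)

charges = [atom.GetFormalCharge() for atom in product.GetAtoms()]
product_smiles = Chem.MolToSmiles(product)
print(product_smiles, charges)
```

Without the `+0` on the product side, the mapped atoms would keep the charges they had in the reactant.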
Best regards,
Chris
On 25 January
I don't think there's a way to do this using RDKit itself, but it appears
to be straightforward using Python with numpy and networkx, e.g.
import numpy as np
import networkx as nx
a = np.matrix([[0, 1, 0, 0, 0],[1, 0, 1, 1, 0],[0, 1, 0, 0, 0],[0, 1, 0, 0,
1],[0, 0, 0, 1, 0]])
b =
e Kim
>
> On Tue, Nov 21, 2017 at 8:22 AM, Chris Earnshaw <cgearns...@gmail.com>
> wrote:
>
>> Hi
>>
>> The entries for P and As in RDKit's atomic_data.cpp are -
>> 15 P 0.75 0.89 2.08 30.974 5 31
>> 30.97376163 3
checks on dative bond forms which
presumably now get converted)
- graphmolMolOpsTest (builds perchlorates etc. and expects the result to be
in dative bond form)
- pythonTestDirChem (not sure what's wrong with this one - I can't find
what it does!)
Apologies for the length of all this...
Chris
Note that this way of doing things disables all "unreasonable" valence
> checking.
>
> -greg
>
>
> On Tue, Nov 21, 2017 at 10:12 AM, Chris Earnshaw <cgearns...@gmail.com>
> wrote:
>>
>> Hi
>>
>> Sometime between 2014 and now there appears to
one you report
>
> chlorate [O-][Cl2+]([O+])[O-]
> perchlorate [O-][Cl3+]([O-])([O+])[O-]
>
> it looks wrong to me as there is an overall formal charge of +1. All O's
> should bear a -1 charge.
>
> Cheers,
> p.
>
>
>
> On 11/21/17 09:12, Chris Earnsh
this to happen!
Does anyone know a way to restore the old behaviour for chlorites,
bromates, periodates etc.?
Best regards,
Chris Earnshaw
values > 1, so by default it's not possible to construct e.g.
> chlorates, or bromates, and no perhalates are allowed.
>
> Regards,
> Chris Earnshaw
>
> On 20 November 2017 at 23:03, Yoolhee Kim <yoolh...@andrew.cmu.edu> wrote:
>> Hello,
>>
>> I'm trying to get
Trouble is, you're mixing chemical operations and lexical ones. It
might be handy if this 'just worked' but in practice it's not going to
produce valid SMILES without more work.
I've written code in the past to do this kind of thing for virtual
library building, using dummy atoms to mark link
-
[#8-:2]-[#7+:1]=[O:3]>>[O+0:2]=[N+0:1]=[O:3]
Chris Earnshaw
On 9 October 2017 at 15:57, Chris Murphy <chris.mur...@schrodinger.com> wrote:
> Hi,
>
> I am using rdChemReactions to perform substructure transformations as
> defined by configurable reaction smarts. When I cre
are only a dozen problem cases out of 1.5 million compounds, I
> just removed them from my main file and downloaded the mol files from
> chembl and double check the structures.
>
> Bran
>
> -Original Message-
> From: Chris Earnshaw [mailto:c
____
> From: Chris Earnshaw <cgearns...@gmail.com>
> Sent: Thursday, October 5, 2017 12:04:02 AM
> To: Bennion, Brian; RDKit Discuss (rdkit-discuss@lists.sourceforge.net)
> Subject: Re: [Rdkit-discuss] nitrogen valence issues
>
> Hi
>
> Be aware t
Hi
Be aware that there is a problem with one of the azide groups in
CHEMBL592333 - in SMILES it's '-N=[NH+]-[NH-]' rather than '-N=[N+]=[N-]'.
This doesn't render the structure chemically invalid but it's probably
wrong.
What's the provenance of your SD file? It isn't the same as a fresh
Hi
It amounts to the same thing - either do all tests on one atom, or one test
on all atoms.
The syntax is shorter for the latter if you can use the vector bindings but
may not be otherwise, especially if multiple exclusions are needed.
Regards,
Chris Earnshaw
On 24 Sep 2017 16:54, "
atoms have been matched - for example, do you want to match quinoline
because it contains a benzene ring, or exclude it because it contains
a pyridine? If the former you'll have to check that the atoms matched
by your two patterns are different.
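The quinoline case can be sketched as follows, assuming RDKit. The two SMARTS patterns (a benzene ring and a pyridine ring) are my own stand-ins for "your two patterns", which aren't shown in this message.

```python
from rdkit import Chem

quinoline = Chem.MolFromSmiles("c1ccc2ncccc2c1")

# Stand-in patterns (assumptions): all-carbon and one-nitrogen aromatic rings.
benzene = Chem.MolFromSmarts("c1ccccc1")
pyridine = Chem.MolFromSmarts("n1ccccc1")

benzene_hits = quinoline.GetSubstructMatches(benzene)
pyridine_hits = quinoline.GetSubstructMatches(pyridine)

# Compare the matched atom indices to see how far the two matches overlap.
shared = set(benzene_hits[0]) & set(pyridine_hits[0])
print(len(benzene_hits), len(pyridine_hits), sorted(shared))
```

Both patterns match here, and the two matches share the ring-fusion atoms, so a decision to accept or reject the molecule has to be made on the matched atom sets rather than on the match/no-match result alone.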
Hope this helps!
Chris Earnshaw
On 24 September 2017
-ring aromatic pattern a:1:a:a:a:a:a:1,
with recursive SMARTS applied to the first atom to ensure that this
can't match any of the 6 ring atoms in your undesired system.
Regards,
Chris Earnshaw
On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss
<rdkit-discuss@lists.sourceforge.net>
) as required, but if you have a specific need
for the 'single SMARTS' approach that's not much use. Sorry not to be more
helpful...
Chris Earnshaw
On 19 September 2017 at 16:50, James T. Metz <jamestm...@aol.com> wrote:
> Chris,
>
> Thank you for your interesting
Hi
Open Babel will convert a wide range of structure formats and can produce
at least a couple of different flavours of Z-matrix, including MOPAC and
Gaussian. I'm not aware of any way to get a Z-matrix directly from RDKit
(but would be happy to find out I'm wrong).
Regards,
Chris Earnshaw
Hi
Will the recursive SMARTS [$(C-C),$(N-N)] not do the job?
I'd parse this in English as 'an atom which is EITHER an aliphatic carbon
singly bonded to an aliphatic carbon OR an aliphatic nitrogen singly bonded
to an aliphatic nitrogen'.
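A minimal sketch for checking that reading, assuming RDKit. The test molecules are illustrative choices, not from the thread.

```python
from rdkit import Chem

# Recursive SMARTS from the message: one atom that is either
# C singly bonded to C, or N singly bonded to N (all aliphatic).
patt = Chem.MolFromSmarts("[$(C-C),$(N-N)]")

# Illustrative test molecules (assumptions).
tests = {
    "ethane": "CC",
    "hydrazine": "NN",
    "methylamine": "CN",
    "ethene": "C=C",
    "benzene": "c1ccccc1",
}

matches = {}
for name, smi in tests.items():
    mol = Chem.MolFromSmiles(smi)
    matches[name] = mol.HasSubstructMatch(patt)
    print(name, matches[name])
```

Because the whole expression sits inside one pair of square brackets, each hit is a single atom, which is what makes it usable as one position in a larger SMARTS.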
Regards,
Chris Earnshaw
On 19 September 2017 at 15:01
Hi
The problem is due to RDKit perceiving the embedded pyranone in
CHEMBL1999443 as an aromatic system, which is probably correct. However, in
the structure of aspirin the carboxyl carbon and singly bonded oxygen are
non-aromatic, so if you just use the SMILES of aspirin as a query it won't
match
Hi Akos
Very strange behaviour. I don't see anything wrong with your SQL syntax.
I've tried equivalent searches in my 2.6M compound database and they give
the expected results. I used iodine rather than gold, for which there are
19504 structures. Adding the qualifying SQL clauses singly and in
Hi Brian
I'm by no means an expert in RDKit with Python, but until someone else
comes along, here are a few thoughts.
Your reaction SMARTS specifically defines aromatic carbons joined by single
bonds which won't match an incoming benzene ring, and it's a bit redundant
to specify that aromatic
y but it does appear
> to match the moe PMI's.
>
>
>
> On Tue, Jan 17, 2017 at 4:55 AM, Chris Earnshaw <cgearns...@gmail.com>
> wrote:
>
>> The new version looks good to me as far as I can test it. PMI and NPR are
>> still fine, the radius of gyration is right (fo
confusion.
Chris
On 16 January 2017 at 09:30, Greg Landrum <greg.land...@gmail.com> wrote:
>
>
> On Mon, Jan 16, 2017 at 10:22 AM, Chris Earnshaw <ch...@cge-compchem.co.uk
> > wrote:
>
>>
>> Either way, it makes it rather hard to trust their derivations ge
is impossible to
say.
Either way, it makes it rather hard to trust their derivations generally -
especially as there appear to be other errors (e.g. the denominator in eq.
16 should be the square root of the given sum of squares, according to
their reference).
Best regards,
Chris
Dr Chris Earnshaw
f a planar molecule like benzene
>>> should be zero. The eigenvalues of the inertia matrix for benzene, however,
>>> are definitely not zero (and not close enough that it's likely to be
>>> round-off error).
>>> It would be very nice if you could run the three files I
e to look into this this weekend and I've found
> a bug and something I don't understand. Hopefully the community can help
> out here.
>
> On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw <cgearns...@gmail.com>
> wrote:
>
>> 4) The big one! The returned results look very odd. They app
any more information from me.
Chris Earnshaw
On 8 Jan 2017 18:17, "Brian Kelley" <fustiga...@gmail.com> wrote:
I think the relevant issue is that if you are using an existing build, we
don't yet have the capability for you to know what was built and what was
not. I.e. You need
tidied it up (having just looked at it to get the
> link above, I see there's a typo on the first sentence, for example!) and
> sent in an interim Pull Request as for people starting out it might already
> be of value.
>
> Cheers,
> Dave
>
> On Sun, 8 Jan 2017 at 10:19, Chris E
Hi
A while ago I had a project which needed PMI descriptors (specifically NPR1
and NPR2) which were not available in the main branch of RDKit at the time.
I used the fork by 'hahnda6', which provided the
calcPMIDescriptors() function, and this worked well. Now that PMI
descriptors are