
Re: [Rdkit-discuss] Redirect error messages to a file in Python?

2024-02-04 Thread David Cosgrove

Thanks Joel,
That's really helpful.  Armed with that information, I was able to dig
further into the source code and find some unit tests that do something very
similar to what you have set up, but within a context manager.  Hopefully the
mail client won't mangle the formatting.

Cheers,
Dave


#!/usr/bin/env python

import logging

from contextlib import contextmanager
from io import StringIO

from rdkit import rdBase, Chem


@contextmanager
def log_to_python(level=None):
  """
  Temporarily redirect RDKit logging to the Python logging module,
  optionally setting a specific log level.
  """
  rdBase.LogToPythonLogger()
  pylog = logging.getLogger("rdkit")
  original_level = pylog.level
  if level is not None:
    pylog.setLevel(level)
  try:
    yield pylog
  finally:
    # restore the previous level and the C++ streams, even if the body raised
    pylog.setLevel(original_level)
    rdBase.LogToCppStreams()


@contextmanager
def capture_logging(level=None):
  """
  Temporarily redirect RDKit logging to a Python StringIO, optionally
  setting a specific log level.
  """
  log_stream = StringIO()
  stream_handler = logging.StreamHandler(stream=log_stream)

  with log_to_python(level) as pylog:
    pylog.addHandler(stream_handler)
    try:
      yield log_stream
    finally:
      pylog.removeHandler(stream_handler)


with capture_logging(logging.WARNING) as log_stream:

  mol = Chem.MolFromSmiles('c1c1(C)(C)')
  print(f'and the message is {log_stream.getvalue()}')
  # This clears the log messages.  If you want to keep all the messages
  # as one long string, don't do this.  truncate(0) does not move the
  # stream position, so seek(0) is needed as well.
  log_stream.truncate(0)
  log_stream.seek(0)

  mol = Chem.MolFromSmiles('C(C)(C)(C)(C)C')
  print(f'and now the message is {log_stream.getvalue()}')
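
The original question was about sending these messages to a file rather than to a string. Assuming `rdBase.LogToPythonLogger()` routes messages to the `"rdkit"` Python logger, as in the code above, a `logging.FileHandler` should work in place of the `StreamHandler`. A minimal sketch; the helper name `rdkit_log_to_file` is hypothetical:

```python
import logging
from contextlib import contextmanager

from rdkit import Chem, rdBase


@contextmanager
def rdkit_log_to_file(path, level=logging.WARNING):
  """Hypothetical helper: append RDKit log messages to the file at `path`."""
  rdBase.LogToPythonLogger()
  logger = logging.getLogger("rdkit")
  handler = logging.FileHandler(path)  # appends by default
  handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
  original_level = logger.level
  logger.setLevel(level)
  logger.addHandler(handler)
  try:
    yield logger
  finally:
    # detach and close the file, then send messages back to the C++ streams
    logger.removeHandler(handler)
    handler.close()
    logger.setLevel(original_level)
    rdBase.LogToCppStreams()
```

Usage would be along the lines of `with rdkit_log_to_file('rdkit_messages.log'): mol = Chem.MolFromSmiles('C(C)(C)(C)(C)C')`. Because `FileHandler` appends, messages from successive runs accumulate in the same file.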




On Thu, Feb 1, 2024 at 9:20 PM Joel Duerksen  wrote:

> I've changed my strategy a few times. I'll share my most recent approach,
> quite possibly this is flawed, but for now it works for me.  Fixes and
> corrections welcome.
> I think these are all the relevant bits below...
>
> ...
> ... other stuff, import rdkit, etc...
> ...
> # fancy foot work to capture RDKIT messages
> from io import StringIO
> from rdkit import rdBase
> rdBase.LogToPythonLogger()
>
> import logging
> logger = logging.getLogger('rdkit')
> logger.setLevel(logging.WARNING)  # make explicit though this is the default, currently
>
> ... more code...
> ... I'll show some of the context of how I'm using it: reading an SDF
> file here, and want to trap every message
>
> f_in = open_sdf(INPUT_FILE, 'rb')
> # please don't change the molecule while reading it in
> suppl = Chem.ForwardSDMolSupplier(f_in, sanitize=False,
>                                   removeHs=False, strictParsing=False)
>
> with StringIO() as log_stream:
>     log_handler = logging.StreamHandler(log_stream)
>     myLogger = logging.getLogger('rdkit')
>     myLogger.addHandler(log_handler)
>
>     # usually don't pull from a generator this way, but we're
>     # trying to catch log messages for each molecule individually
>     while True:
>         try:
>             # Python 3
>             log_stream.truncate(0)
>             log_stream.seek(0)
>             curmol = next(suppl)
>         except StopIteration:
>             break
>
>         # ... capture everything in the stream from reading the molecule
>         # with log_stream.getvalue()
>         re.sub(r'\[[0-9]{2}:[0-9]{2}:[0-9]{2}[^]]*\]', '',
>                log_stream.getvalue())  # remove the timestamps [12:14:19] (maybe could turn them off?)
>
> ... after closing SDF file I'm removing the handler
> # remove our log handler
> myLogger.handlers.clear()
>
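
Joel strips the `[12:14:19]`-style timestamps after the fact with `re.sub`. An alternative, assuming the timestamps appear at the start of the message text as in his output, is to remove them as each record passes through the handler, using a `logging.Filter`. A sketch; the class name `StripTimestampFilter` is hypothetical:

```python
import logging
import re

# Same timestamp pattern as in the quoted re.sub call, plus trailing whitespace.
TIMESTAMP_RE = re.compile(r"\[[0-9]{2}:[0-9]{2}:[0-9]{2}[^]]*\]\s*")


class StripTimestampFilter(logging.Filter):
  """Hypothetical filter that deletes bracketed timestamps from log messages."""

  def filter(self, record):
    # Render the message first so %-style arguments are applied, then strip.
    record.msg = TIMESTAMP_RE.sub("", record.getMessage())
    record.args = ()
    return True  # never drop the record, only rewrite it
```

Attach it with `log_handler.addFilter(StripTimestampFilter())`. Because the filter edits the shared `LogRecord` in place, any handler that sees the record after this one also gets the stripped text.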

[Rdkit-discuss] Redirect error messages to a file in Python?

2024-02-01 Thread David Cosgrove

Hi,
I'd like to be able to redirect the various logging streams to files from
within the code.  I know that I can turn them off:

RDLogger.DisableLog('rdApp.*')

but that's more extreme than I want.
I have found the function RDLogger.AttachFileToLog() but can't work out how
to use it.  The naive

RDLogger.AttachFileToLog('rdApp.*', 'logging.file', 1)

didn't produce a file anywhere I could see.  I have not been able to find
an example of its use.

find . -name \*.py -exec grep AttachFileToLog {} \; -print

from the top of the source tree produces

from rdkit.rdBase import AttachFileToLog, DisableLog, EnableLog, LogMessage
./rdkit/RDLogger.py

but the functions don't seem to be used within that file.

Any pointers gratefully received.
Dave


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Varying ring size substructure match.

2023-08-20 Thread David Cosgrove

RDKit supports range queries in SMARTS strings, although they are not
necessarily supported by other cheminformatics toolkits such as OpenEye’s.
So Wim’s query below could be simplified by using [r{3-10}] to mean an atom
in a ring of 3 to 10 atoms.
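
One way to check the range query is to write Wim's ten-atom ring with `[r{3-10}]` in place of each enumerated atom and test it on naphthalene and azulene. This is a sketch: the SMILES are standard ones for naphthalene, azulene and benzene chosen for illustration, and `~` (any bond) is used between atoms so that bond order does not matter.

```python
from rdkit import Chem

ATOM = "[r{3-10}]"  # atom whose smallest SSSR ring has 3 to 10 members
# Ten such atoms joined by any-bonds (~) and closed into a ten-membered cycle.
TEN_RING = Chem.MolFromSmarts(ATOM + "1~" + "~".join([ATOM] * 9) + "~1")


def has_ten_ring(smiles):
  """True if the molecule contains a 10-atom cycle made of such ring atoms."""
  mol = Chem.MolFromSmiles(smiles)
  return mol.HasSubstructMatch(TEN_RING)


for name, smi in [("naphthalene", "c1ccc2ccccc2c1"),
                  ("azulene", "c1ccc2cccc2cc1"),
                  ("benzene", "c1ccccc1")]:
  print(name, has_ten_ring(smi))
```

Note that this pattern also matches any single ten-membered ring, such as cyclodecane, so further constraints (aromaticity, fusion) would be needed if that matters.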

On Sun, 20 Aug 2023 at 18:38, Wim Dehaen  wrote:

> Hi,
> i'm not sure if i understand the question perfectly, so apologies if the
> below is behind the point. i think in general, for analysis like this it is
> better to make use of rdkit's SSSR functionality and then use the ring
> information in the way required for your purpose. this tends to be much
> more flexible and natural.
>
> however, here is a smarts pattern that matches both naphthalene and
> azulene, as both of them are aromatic and are a ten-membered ring plus
> a single additional closure
> ```
> patt=Chem.MolFromSmarts("c1c1")
> n=Chem.MolFromSmiles("c12c12")
> a=Chem.MolFromSmiles("c121c2")
> print(n.HasSubstructMatch(patt),a.HasSubstructMatch(patt))
> ```
> this results in True True
>
> another way is to explicitly enumerate the possible ringsizes you are
> willing to consider in the SMARTS:
>
> [r3,r4,r5,r6,r7,r8,r9,r10]1[r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10]1
> as you can see this is much more ugly, but it's able to capture cases such
> as c1cc2c1c3c2c1c3cc1.
>
> best wishes
> wim
>
> On Sun, Aug 20, 2023 at 5:34 PM Eduardo Mayo 
> wrote:
>
>> Hello,
>>
>> I hope you are all doing well. I'm looking for a SMARTS pattern that can
>> match rings of different sizes at the same time. The intention is to match
>> something like naphthalene and azulene with the same pattern. Is that
>> possible?
>>
>> Best,
>> Eduardo
>
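
Wim's other suggestion, working from the ring information rather than from SMARTS, might look like the following sketch. The helper name and the criterion (two SSSR rings that share exactly two atoms, as in ortho-fused systems, and together span ten atoms) are illustrative assumptions, not code from the thread.

```python
from itertools import combinations

from rdkit import Chem


def has_fused_pair_of_size(mol, total_atoms=10):
  """Hypothetical check: do two SSSR rings share two atoms and span `total_atoms`?"""
  rings = [set(r) for r in mol.GetRingInfo().AtomRings()]
  for a, b in combinations(rings, 2):
    # ortho-fused rings share exactly two atoms (the fusion bond)
    if len(a & b) == 2 and len(a | b) == total_atoms:
      return True
  return False
```

Naphthalene (6 + 6) and azulene (5 + 7) both qualify, while indene (5 + 6) spans only nine atoms; changing `total_atoms` or the ring-size test is easier here than in a SMARTS string.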

Re: [Rdkit-discuss] Chirality wedge disappears in PNG depiction

2023-07-26 Thread David Cosgrove

Hi,
I’m away from my computer at the moment, so I can’t try anything, but I
wonder if it has anything to do with the ‘wedgeBonds=False’ option you gave
when preparing the drawing.
Dave


On Wed, 26 Jul 2023 at 20:45, Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

> Dear all,
>
> I use the following code to produce PNG drawings, with RDKit version
> 2023.03.1.
> The SMILES string describes a molecule with a single chiral center of
> defined configuration.
>
> from rdkit import Chem
> from rdkit.Chem import rdCoordGen
> from rdkit.Chem.Draw import rdMolDraw2D
> from PIL import Image
> from io import BytesIO
>
> smi  = "C=C[C@@](/C=C/c1ccc(cc1)O)(CCC=C(C)C)C"
> filenameOut = "img.png"
>
> mol = Chem.MolFromSmiles(smi)
> rdCoordGen.AddCoords(mol)
> print(Chem.MolToMolBlock(mol))
> d2d = rdMolDraw2D.MolDraw2DCairo(350, 300)
> dopts = d2d.drawOptions()
> dopts.baseFontSize = 0.6
> dopts.prepareMolsBeforeDrawing = False
> mol_draw = rdMolDraw2D.PrepareMolForDrawing(mol, addChiralHs=False,
> wedgeBonds=False)
> Chem.ReapplyMolBlockWedging(mol_draw)
> d2d.DrawMolecule(mol_draw, legend='', highlightAtoms=[])
> d2d.FinishDrawing()
> bio = BytesIO(d2d.GetDrawingText())
> draw_code = Image.open(bio)
> draw_code.save(filenameOut)
>
> The resulting image does not show the chirality wedge:
>
>
> The script prints the MolBlock that comes from the SMILES and the
> calculation of the 2D atomic coordinates:
> _
>
>  RDKit  2D
>
>  19 19  0  0  0  0  0  0  0  0999 V2000
>     2.0515    1.2242    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.0515    1.2214    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.5539    0.3540    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.4459    0.3512    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.9435   -0.5162    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.9435   -0.5190    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.4411   -1.3866    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -3.4411   -1.3892    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -3.9435   -0.5246    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -3.4459    0.3428    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.4459    0.3456    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -4.9435   -0.5274    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>     1.4215   -0.1436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     2.2861    0.3588    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     3.1535   -0.1388    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     4.0181    0.3636    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     4.0153    1.3636    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     4.8855   -0.1338    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.5567   -0.6460    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  2  0
>   2  3  1  0
>   3  4  1  0
>   4  5  2  0
>   5  6  1  0
>   6  7  2  0
>   7  8  1  0
>   8  9  2  0
>   9 10  1  0
>  10 11  2  0
>   9 12  1  0
>   3 13  1  0
>  13 14  1  0
>  14 15  1  0
>  15 16  2  0
>  16 17  1  0
>  16 18  1  0
>   3 19  1  1
>  11  6  1  0
> M  END
> _
>
> The wedge bond (3-19) is apparently here but is not drawn as such.
>
> Is there a remedy for that?
>
> Best regards,
>
> Jean-Marc
>
> --
> Jean-Marc Nuzillard
> Directeur de Recherches au CNRS
>
> Institut de Chimie Moléculaire de Reims
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> ORCID : -0002-5120-2556
> Tel : +33 (0)3 26 91 82 10
> http://www.univ-reims.fr/icmr
> https://nuzillard.github.io/PyLSD
>
>
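
Following Dave's hunch, here is a sketch of the same script with `wedgeBonds` left at its default (`True`) and without the `ReapplyMolBlockWedging` call. My understanding, which is not confirmed in the thread, is that `ReapplyMolBlockWedging` can only restore wedging recorded when a molecule is read from a molfile, and a molecule built from SMILES has none. The function name `draw_png` is illustrative.

```python
from rdkit import Chem
from rdkit.Chem import rdCoordGen
from rdkit.Chem.Draw import rdMolDraw2D


def draw_png(smi, width=350, height=300):
  """Return (PNG bytes, prepared molecule) for `smi`, with wedges added."""
  mol = Chem.MolFromSmiles(smi)
  rdCoordGen.AddCoords(mol)
  d2d = rdMolDraw2D.MolDraw2DCairo(width, height)
  dopts = d2d.drawOptions()
  dopts.baseFontSize = 0.6
  dopts.prepareMolsBeforeDrawing = False
  # wedgeBonds=True picks wedge/hash bonds from the chiral tags and the
  # 2D coordinates, so no stored molfile wedging is needed.
  mol_draw = rdMolDraw2D.PrepareMolForDrawing(mol, addChiralHs=False,
                                              wedgeBonds=True)
  d2d.DrawMolecule(mol_draw, legend='')
  d2d.FinishDrawing()
  return d2d.GetDrawingText(), mol_draw
```

The PNG bytes can then be written directly with `open("img.png", "wb").write(png)`, without going through PIL.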

Re: [Rdkit-discuss] Molfile from smiles

2023-05-04 Thread David Cosgrove

--
> *From:* Ling Chan
> *Sent:* Tuesday, 2 May 2023 4:15
> *To:* Santiago Fraga
> *Cc:* RDKit Discuss
> *Subject:* Re: [Rdkit-discuss] Molfile from smiles
>
> Hello Santiago,
>
> In case you are still looking for an answer, somewhere in my notes I wrote
> the following.
>
> to get a better depiction of complicated topology, do this before
> rendering.
> from rdkit.Chem import rdDepictor
> rdDepictor.SetPreferCoordGen(True)
>
> Sometimes it helps. Good luck.
>
> Ling
>
>
>
> On Fri, 21 Apr 2023 at 2:17 AM, Santiago Fraga wrote:
>
> Good morning
>
> I am trying to generate a molfile from SMILES, using the RDKit
> C++ implementation.
> But in some cases the resulting molfile looks like the one in the
> attached image.
>
> My code is something like this:
>
> string molecule =
> "C1=CC2=[N](C=C1)[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]4C=C1";
> RDKit::ROMol* mol = RDKit::SmilesToMol(molecule, 0, false, nullptr);
> mol->updatePropertyCache(false);
> RDDepict::preferCoordGen = true;
> RDDepict::compute2DCoords(*mol);
> string molfile = RDKit::MolToMolBlock(*mol, true, -1, false, true);
>
> How could I fix the molfile?
>
> Regards
> Santiago
>
>
>
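
For Python users, Ling's tip and the C++ snippet above map onto the Python API roughly as follows. This is a sketch: the helper name is hypothetical, the test molecule is just a stand-in, and whether CoordGen actually improves the layout of the iridium complex is not established in the thread.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor

rdDepictor.SetPreferCoordGen(True)  # Ling's tip; this is a global setting


def smiles_to_molblock(smiles, sanitize=True):
  """Hypothetical helper: parse, lay out in 2D and return a V3000 molblock."""
  mol = Chem.MolFromSmiles(smiles, sanitize=sanitize)
  if not sanitize:
    mol.UpdatePropertyCache(strict=False)  # as in the C++ snippet
  rdDepictor.Compute2DCoords(mol)
  # Same arguments as the C++ call: includeStereo=True, confId=-1,
  # kekulize=False, forceV3000=True
  return Chem.MolToMolBlock(mol, includeStereo=True, confId=-1,
                            kekulize=False, forceV3000=True)
```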



Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-26 Thread David Cosgrove

Thanks for the reference. That sort of bounds screening would probably work
well in the C++ layer for the bulk similarity functions.  My initial
experiments without bounds screening found that doing individual similarity
calculations in Python was a lot slower than the bulk function because
moving from the Python to the C++ layer has quite a large overhead. It
might be worth doing the screening in Python and I will try it but I
suspect that the overhead will cancel out any gain brought by the screening
process.
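
The simplest bound of this kind follows from the Tanimoto formula itself. If fingerprints A and B have a and b bits set and c bits in common, then c <= min(a, b), so Tanimoto = c/(a + b - c) <= min(a, b)/max(a, b). For a threshold t, only fingerprints with t*a <= b <= a/t can reach it. A minimal Python sketch of that pre-screen in front of `BulkTanimotoSimilarity`; the function name is hypothetical and, as noted above, whether it pays off in Python given the per-call overhead is an open question.

```python
from rdkit import DataStructs


def screened_threshold_search(query, fps, counts, threshold):
  """Return (index, similarity) pairs with Tanimoto >= threshold (0 < threshold <= 1).

  counts[i] must equal fps[i].GetNumOnBits(); compute it once up front.
  """
  a = query.GetNumOnBits()
  lo, hi = threshold * a, a / threshold
  # Only fingerprints whose bit count could reach the threshold are compared.
  candidates = [i for i, b in enumerate(counts) if lo <= b <= hi]
  if not candidates:
    return []
  sims = DataStructs.BulkTanimotoSimilarity(query, [fps[i] for i in candidates])
  return [(i, s) for i, s in zip(candidates, sims) if s >= threshold]
```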

On Wed, 26 Oct 2022 at 04:17, S Joshua Swamidass 
wrote:

>
> I wonder if there is a way to make use of PyTorch or TensorFlow to do this
> on a GPU. That’s where some big speed-ups might be found.
>
> Also, consider using these bounds. They do make a big difference in many
> cases.
>
> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2527184/
>

Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-26 Thread David Cosgrove

I would be very surprised if the speed of fingerprint similarity were the
limiting factor in a distance-matrix-based clustering method. Normally
they are constrained by memory requirements.  In this case I am using the
MaxMin picker in RDKit to generate the cluster “centroids” and am wanting
to fill the clusters with the neighbours to each “centroid”. That is
obviously an embarrassingly parallel calculation, but the size of my
dataset means that using standard Python multi-processing takes up too much
memory because each parallel process needs its own copy of the fingerprint
set. I was wondering if releasing the GIL would make it possible to do it
in multithreaded mode with a single shared fingerprint set but the answer
was that it made little improvement on a single-thread run.
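
The cluster-filling step described above can be sketched as follows: pick centroids with RDKit's MaxMin picker, then assign every fingerprint to its most similar centroid, splitting the work into chunks for a `ThreadPoolExecutor` so that all threads share one fingerprint list. This is an illustrative sketch, not Dave's code; the function names, chunk size and nearest-centroid rule are assumptions, and per this thread the threads will only run concurrently if `BulkTanimotoSimilarity` releases the GIL in the RDKit build being used.

```python
from concurrent.futures import ThreadPoolExecutor

from rdkit import DataStructs
from rdkit.SimDivFilters import rdSimDivPickers


def _assign_chunk(fps, centroid_fps, start, stop):
  """Index into centroid_fps of the most similar centroid, for fps[start:stop]."""
  nearest = []
  for i in range(start, stop):
    sims = DataStructs.BulkTanimotoSimilarity(fps[i], centroid_fps)
    nearest.append(max(range(len(sims)), key=sims.__getitem__))
  return nearest


def maxmin_clusters(fps, n_centroids, n_threads=4, chunk=1000, seed=42):
  """Return {centroid index: [member indices]} for a list of ExplicitBitVects."""
  picker = rdSimDivPickers.MaxMinPicker()
  centroids = list(picker.LazyBitVectorPick(fps, len(fps), n_centroids, seed=seed))
  centroid_fps = [fps[c] for c in centroids]
  bounds = [(s, min(s + chunk, len(fps))) for s in range(0, len(fps), chunk)]
  clusters = {c: [] for c in centroids}
  # All threads read the same fps list; no per-worker copies are made.
  with ThreadPoolExecutor(max_workers=n_threads) as pool:
    results = pool.map(lambda b: _assign_chunk(fps, centroid_fps, *b), bounds)
    for (start, _), nearest in zip(bounds, results):
      for offset, k in enumerate(nearest):
        clusters[centroids[k]].append(start + offset)
  return clusters
```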


On Wed, 26 Oct 2022 at 02:32, Francois Berenger  wrote:

> On 24/10/2022 19:47, David Cosgrove wrote:
> > For the record, I have attempted this, but got only a marginal
> > speed-up (130% of CPU used, with any number of threads above 2).  The
> > procedure I used was to extract the fingerprint pointers into a
> > std::vector, create a std::vector for the results, unlock the GIL to
> > do the bulk tanimoto calculation, then re-lock the GIL to copy the
> > results from the std::vector into the python:list for output.  I guess
> > the extra overhead to create and populate the additional std::vectors
> > destroyed any potential speedup.  This was on a vector of 200K
> > fingerprints, which suggests that the Tanimoto calculation is a small
> > part of the overall time.  It doesn't seem worth pursuing further.
>
> There is probably code on github doing this in parallel already.
> Think about it: any clustering algorithm using a distance matrix.
> I guess many people want to initialize the Gram matrix in parallel.
>
> I wouldn't be surprised if, for example, chemfp has such code.
>

Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-24 Thread David Cosgrove

For the record, I have attempted this, but got only a marginal speed-up
(130% of CPU used, with any number of threads above 2).  The procedure I
used was to extract the fingerprint pointers into a std::vector, create a
std::vector for the results, release the GIL to do the bulk Tanimoto
calculation, then re-acquire the GIL to copy the results from the
std::vector into the python::list for output.  I guess the extra overhead
to create and populate the additional std::vectors destroyed any potential
speedup.  This was on a vector of 200K fingerprints, which suggests that
the Tanimoto calculation is a small part of the overall time.  It doesn't
seem worth pursuing further.

Dave




Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-22 Thread David Cosgrove

Hi Greg,
Thanks for the pointer. I’ll take a look. If it could go in the next patch
release that would be really useful.
Dave


On Sat, 22 Oct 2022 at 10:52, Greg Landrum  wrote:

>
> Hi Dave,
>
> We have multiple examples of this in the code, here’s one:
>
> https://github.com/rdkit/rdkit/blob/b208da471f8edc88e07c77ed7d7868649ac75100/Code/GraphMol/ForceFieldHelpers/Wrap/rdForceFields.cpp#L40
>
> I’m not sure how this would interact with the call to Python::extract
> that’s in the bulk functions though
>
> It might be better to handle the multithreading on the C++ side by adding
> an optional nThreads argument to  the bulk similarity functions. (Though
> this would have to wait for the next release since it’s a feature addition…
> we can declare releasing the GIL as a bug fix)
>
> -greg
>

[Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-22 Thread David Cosgrove

Hi,

I'm doing a lot of Tanimoto similarity calculations on large datasets using
BulkTanimotoSimilarity.  It is an obvious candidate for parallelisation, so
I am using concurrent.futures to do so.  If I use ProcessPoolExecutor, I
get good speed-up but each process needs a copy of the fingerprint set and
for the sizes I'm dealing with that uses too much memory.  With
ThreadPoolExecutor I only need 1 copy of the fingerprints, but the GIL
means it only runs on 1 thread at a time so there's no gain.  Would it be
possible to amend the C++ BulkTanimotoSimilarity to free the GIL whilst
it's doing the calculation, and recapture it afterwards?  I understand
things like numpy do this for some of their functions.  I'm happy to
attempt it myself if someone who knows about these things can advise that
it could be done, it would help, and they could provide a few pointers.

Thanks,
Dave



Re: [Rdkit-discuss] Draw.MolToFile catch programm termination

2022-09-05 Thread David Cosgrove

Thanks. That’s more or less how it looks now. The characters are a bit big,
but I think that’s a consequence of the coordinates in the file.

On Mon, 5 Sep 2022 at 11:51, Rüdiger Lang  wrote:

> Hello David,
>
> First of all, thank you for your work on this great program. Attached is
> the drawing of how we think the mol files should look.
>
> Best Rüdiger
>
> *From:* David Cosgrove
> *Sent:* Monday, 5 September 2022 12:04
> *To:* Rüdiger Lang
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] Draw.MolToFile catch programm termination
>
> Thanks, Rüdiger.  On my Mac it crashes with a segmentation fault, due to
> it using a negative index into a vector.  That's also consistent with
> your exception, since it has taken an unsigned integer below zero which
> wraps round to a very large number.  Disappointingly, it is in my drawing
> code, so I will file it as a bug and attempt to fix it.  Do you have any
> representation of what the drawing is supposed to look like?
>
> Dave
>
> On Mon, Sep 5, 2022 at 8:29 AM Rüdiger Lang  wrote:
>
> Hello Dave,
>
> Thanks! I use version 2022.03.05, but you are of course right, I had
> attached the wrong mol file. Sorry for that. Here are two files where it
> actually does not work.
>
> Best Rüdiger
>
> *From:* David Cosgrove
> *Sent:* Saturday, 3 September 2022 09:12
> *To:* Rüdiger Lang
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] Draw.MolToFile catch programm termination
>
> Hi Rüdiger,
>
> That file works fine for me using 2022.03.5.  What version are you using?
>
> Also, I note that your script reads 191131.mol, but the one you have
> provided is 191133.mol.  Can you check that this is the file that is
> causing the problem?
>
> Dave
>
> On Fri, Sep 2, 2022 at 1:40 PM Rüdiger Lang  wrote:
>
> Hello everybody,
>
>
>
> I am trying to convert a large amount of mol files to png. Some mol files
> cause the following error with "Draw.MolToFile":
>
>
>
> <<
>
> Could not convert io integer: 3221225477. Path 'exitCode'.
>
> Der Wert für einen Int32 war zu groß oder klein.
>
> (The value for an Int32 was too large or small.)
>
> >>
>
>
>
> I would like to catch this error to simply omit these files but despite
> "try" the program exits with the above error.
>
> I have attached a problematic mol-file as an example.
>
> The program code looks like this:
>
>
>
> from rdkit import Chem
>
> from rdkit.Chem import Draw
>
> from try_parse.utils import ParseUtils
>
>
>
> Testmol = Chem.MolFromMolFile('C:\\Struct\\191131.mol')
>
>
>
> try:
>     print('now we test: ')
>     Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png')
>     # i also tried this:
>     #test=ParseUtils.try_parse_int(Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png'))
>     #print(test)
> except:
>     print('I never reach this except but the programm stopps')
>
>
>
> Many thanks!
>
>
>
> Rüdiger Lang
>
>
>
>
>
>
> Freundliche Grüße / Kind regards
>
> Rüdiger Lang
> Data Analyst
>
> *abcr GmbH*
> Im Schlehert 10
> 76187 Karlsruhe
> Germany
>
> r.l...@abcr.com
>
>
>
>
>
>
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>
>
> --
>
> David Cosgrove
>
> Freelance computational chemistry and chemoinformatics developer
>
> http://cozchemix.co.uk
>
>
>
>
> --
>
> David Cosgrove
>
> Freelance computational chemistry and chemoinformatics developer
>
> http://cozchemix.co.uk
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Draw.MolToFile catch programm termination

2022-09-05 Thread David Cosgrove

I have filed a fix that will hopefully appear in the next patch release.
In the meantime, I fear there is no workaround other than restarting the
script after the offending molecule, which will be a nuisance to automate.

Dave
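Because the crash happens in compiled code, it takes the whole Python process down with it, which is why the `try`/`except` in the original script is never reached. One way to automate "restarting after the offending molecule" is to draw each molecule in its own child interpreter, so that a crash only loses that one picture. This is a minimal sketch of that idea, not something proposed in the thread: `DRAW_SCRIPT`, `run_isolated` and `draw_all` are hypothetical names, and starting one interpreter per molecule is slow, so for a large collection you may prefer to give each child a batch of files.

```python
import subprocess
import sys

# Hypothetical worker script, run in a fresh interpreter for each molecule.
# With `python -c`, sys.argv[1] is the input mol file and sys.argv[2] the PNG.
DRAW_SCRIPT = """
import sys
from rdkit import Chem
from rdkit.Chem import Draw

mol = Chem.MolFromMolFile(sys.argv[1])
if mol is None:
    sys.exit(2)  # unreadable mol file
Draw.MolToFile(mol, sys.argv[2])
"""


def run_isolated(script, *args, timeout=60):
    """Run `script` with `python -c` in a child process.

    Returns True only if the child exited normally with status 0. A crash
    (segfault / access violation), a Python error or a timeout all give
    False, and none of them can take down the calling process.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, *args],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def draw_all(jobs):
    """jobs: iterable of (mol_file, png_file) pairs. Returns the mol files that failed."""
    failed = []
    for mol_file, png_file in jobs:
        if not run_isolated(DRAW_SCRIPT, mol_file, png_file):
            failed.append(mol_file)
    return failed
```

`draw_all([(mol_path, png_path)])` returns a list containing `mol_path` if drawing it crashed, timed out or raised an error, and the calling script carries on with the next file either way.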


On Mon, Sep 5, 2022 at 11:04 AM David Cosgrove 
wrote:

>
> Thanks, Rüdiger.  On my Mac it crashes with a segmentation fault, due to
> it using a negative index into a vector.  That's also consistent with
> your exception, since it has taken an unsigned integer below zero which
> wraps round to a very large number.  Disappointingly, it is in my drawing
> code, so I will file it as a bug and attempt to fix it.  Do you have any
> representation of what the drawing is supposed to look like?
>
> Dave
>
> On Mon, Sep 5, 2022 at 8:29 AM Rüdiger Lang  wrote:
>
>> Hello Dave,
>>
>>
>>
>> Thanks! I use version 2022.03.05, but you are of course right: I had
>> attached the wrong mol file. Sorry for that. Here are two files where it
>> actually does not work.
>>
>>
>>
>> Best Rüdiger
>>
>>
>>
>> *From:* David Cosgrove
>> *Sent:* Saturday, 3 September 2022 09:12
>> *To:* Rüdiger Lang
>> *Cc:* rdkit-discuss@lists.sourceforge.net
>> *Subject:* Re: [Rdkit-discuss] Draw.MolToFile catch programm termination
>>
>>
>>
>> Hi Rüdiger,
>>
>> That file works fine for me using 2022.03.5.  What version are you using?
>>
>>
>>
>> Also, I note that your script reads 191131.mol, but the one you have
>> provided is 191133.mol.  Can you check that this is the file that is
>> causing the problem?
>>
>>
>>
>> Dave
>>
>>
>>
>>
>>
>> On Fri, Sep 2, 2022 at 1:40 PM Rüdiger Lang  wrote:
>>
>> Hello everybody,
>>
>>
>>
>> I am trying to convert a large amount of mol files to png. Some mol files
>> cause the following error with "Draw.MolToFile":
>>
>>
>>
>> <<
>>
>> Could not convert to integer: 3221225477. Path 'exitCode'.
>>
>> Der Wert für einen Int32 war zu groß oder klein.
>>
>> (The value for an Int32 was too large or small.)
>>
>> >>
>>
>>
>>
>> I would like to catch this error to simply omit these files but despite
>> "try" the program exits with the above error.
>>
>> I have attached a problematic mol-file as an example.
>>
>> The program code looks like this:
>>
>>
>>
>> from rdkit import Chem
>>
>> from rdkit.Chem import Draw
>>
>> from try_parse.utils import ParseUtils
>>
>>
>>
>> Testmol = Chem.MolFromMolFile('C:\\Struct\\191131.mol')
>>
>>
>>
>> try:
>>     print('now we test: ')
>>     Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png')
>>     # i also tried this:
>>     #test=ParseUtils.try_parse_int(Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png'))
>>     #print(test)
>> except:
>>     print('I never reach this except but the programm stopps')
>>
>>
>>
>> Many thanks!
>>
>>
>>
>> Rüdiger Lang
>>
>>
>>
>>
>>
>>
>> Freundliche Grüße / Kind regards
>>
>> Rüdiger Lang
>> Data Analyst
>>
>> *abcr GmbH*
>> Im Schlehert 10
>> 76187 Karlsruhe
>> Germany
>>
>> r.l...@abcr.com
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> David Cosgrove
>>
>> Freelance computational chemistry and chemoinformatics developer
>>
>> http://cozchemix.co.uk
>>
>
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Draw.MolToFile catch programm termination

2022-09-05 Thread David Cosgrove

Thanks, Rüdiger.  On my Mac it crashes with a segmentation fault, due to it
using a negative index into a vector.  That's also consistent with
your exception, since it has taken an unsigned integer below zero which
wraps round to a very large number.  Disappointingly, it is in my drawing
code, so I will file it as a bug and attempt to fix it.  Do you have any
representation of what the drawing is supposed to look like?

Dave
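The exit code in the original error message can be decoded to check this. A quick sketch, assuming the error text comes from a .NET layer that read the Python process's `exitCode` into an Int32 (the German wording suggests .NET, but the thread does not say so): 3221225477 is 0xC0000005, the status Windows reports when a process dies with an access violation, the Windows counterpart of the segmentation fault seen on the Mac. It is also too large for a signed 32-bit integer, which explains the "too large or small" message.

```python
import struct

exit_code = 3221225477  # value quoted in the error message

print(hex(exit_code))         # 0xc0000005 (Windows access-violation status)
print(exit_code > 2**31 - 1)  # True: does not fit in a signed Int32

# The same 32 bits reinterpreted as a signed Int32
signed = struct.unpack("<i", struct.pack("<I", exit_code))[0]
print(signed)                 # -1073741819
```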

On Mon, Sep 5, 2022 at 8:29 AM Rüdiger Lang  wrote:

> Hello Dave,
>
>
>
> Thanks! I use version 2022.03.05, but you are of course right: I had
> attached the wrong mol file. Sorry for that. Here are two files where it
> actually does not work.
>
>
>
> Best Rüdiger
>
>
>
> *From:* David Cosgrove
> *Sent:* Saturday, 3 September 2022 09:12
> *To:* Rüdiger Lang
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] Draw.MolToFile catch programm termination
>
>
>
> Hi Rüdiger,
>
> That file works fine for me using 2022.03.5.  What version are you using?
>
>
>
> Also, I note that your script reads 191131.mol, but the one you have
> provided is 191133.mol.  Can you check that this is the file that is
> causing the problem?
>
>
>
> Dave
>
>
>
>
>
> On Fri, Sep 2, 2022 at 1:40 PM Rüdiger Lang  wrote:
>
> Hello everybody,
>
>
>
> I am trying to convert a large amount of mol files to png. Some mol files
> cause the following error with "Draw.MolToFile":
>
>
>
> <<
>
> Could not convert to integer: 3221225477. Path 'exitCode'.
>
> Der Wert für einen Int32 war zu groß oder klein.
>
> (The value for an Int32 was too large or small.)
>
> >>
>
>
>
> I would like to catch this error to simply omit these files but despite
> "try" the program exits with the above error.
>
> I have attached a problematic mol-file as an example.
>
> The program code looks like this:
>
>
>
> from rdkit import Chem
>
> from rdkit.Chem import Draw
>
> from try_parse.utils import ParseUtils
>
>
>
> Testmol = Chem.MolFromMolFile('C:\\Struct\\191131.mol')
>
>
>
> try:
>     print('now we test: ')
>     Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png')
>     # i also tried this:
>     #test=ParseUtils.try_parse_int(Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png'))
>     #print(test)
> except:
>     print('I never reach this except but the programm stopps')
>
>
>
> Many thanks!
>
>
>
> Rüdiger Lang
>
>
>
>
>
>
> Freundliche Grüße / Kind regards
>
> Rüdiger Lang
> Data Analyst
>
> *abcr GmbH*
> Im Schlehert 10
> 76187 Karlsruhe
> Germany
>
> r.l...@abcr.com
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
> David Cosgrove
>
> Freelance computational chemistry and chemoinformatics developer
>
> http://cozchemix.co.uk
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Draw.MolToFile catch programm termination

2022-09-03 Thread David Cosgrove

Hi Rüdiger,
That file works fine for me using 2022.03.5.  What version are you using?

Also, I note that your script reads 191131.mol, but the one you have
provided is 191133.mol.  Can you check that this is the file that is
causing the problem?

Dave


On Fri, Sep 2, 2022 at 1:40 PM Rüdiger Lang  wrote:

> Hello everybody,
>
>
>
> I am trying to convert a large amount of mol files to png. Some mol files
> cause the following error with "Draw.MolToFile":
>
>
>
> <<
>
> Could not convert to integer: 3221225477. Path 'exitCode'.
>
> Der Wert für einen Int32 war zu groß oder klein.
>
> (The value for an Int32 was too large or small.)
>
> >>
>
>
>
> I would like to catch this error to simply omit these files but despite
> "try" the program exits with the above error.
>
> I have attached a problematic mol-file as an example.
>
> The program code looks like this:
>
>
>
> from rdkit import Chem
>
> from rdkit.Chem import Draw
>
> from try_parse.utils import ParseUtils
>
>
>
> Testmol = Chem.MolFromMolFile('C:\\Struct\\191131.mol')
>
>
>
> try:
>     print('now we test: ')
>     Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png')
>     # i also tried this:
>     #test=ParseUtils.try_parse_int(Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png'))
>     #print(test)
> except:
>     print('I never reach this except but the programm stopps')
>
>
>
> Many thanks!
>
>
>
> Rüdiger Lang
>
>
>
>
>
>
> Freundliche Grüße / Kind regards
>
> Rüdiger Lang
> Data Analyst
>
> *abcr GmbH*
> Im Schlehert 10
> 76187 Karlsruhe
> Germany
>
> r.l...@abcr.com
>
>
>
>
>
>
>
>
>
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Draw.MolToFile catch programm termination

2022-09-02 Thread David Cosgrove

Hi Rüdiger,
It looks like it has exited rather than raising an exception. I would
interpret that as it being very unimpressed with the molecule. I’ll take a
look at it tomorrow morning.
Dave
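That distinction is the crux: `try`/`except` only catches Python exceptions, while a crash inside compiled code ends the interpreter before any handler can run. The sketch below reproduces the effect in a child process, using `os.abort()` as a stand-in for the crash in the drawing code (an illustrative assumption; it is not what RDKit does internally). The `except` clause never runs, and all the parent sees is a non-zero exit status.

```python
import subprocess
import sys

# Child program: the bare except looks as if it should catch everything...
CHILD = """
import os
try:
    os.abort()  # stand-in for a crash in C++ code: kills the process outright
except BaseException:
    print("caught")
"""

result = subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)
print("return code:", result.returncode)          # non-zero (negative on POSIX)
print("handler ran:", "caught" in result.stdout)  # False
```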



On Fri, 2 Sep 2022 at 13:40, Rüdiger Lang  wrote:

> Hello everybody,
>
>
>
> I am trying to convert a large amount of mol files to png. Some mol files
> cause the following error with "Draw.MolToFile":
>
>
>
> <<
>
> Could not convert to integer: 3221225477. Path 'exitCode'.
>
> Der Wert für einen Int32 war zu groß oder klein.
>
> (The value for an Int32 was too large or small.)
>
> >>
>
>
>
> I would like to catch this error to simply omit these files but despite
> "try" the program exits with the above error.
>
> I have attached a problematic mol-file as an example.
>
> The program code looks like this:
>
>
>
> from rdkit import Chem
>
> from rdkit.Chem import Draw
>
> from try_parse.utils import ParseUtils
>
>
>
> Testmol = Chem.MolFromMolFile('C:\\Struct\\191131.mol')
>
>
>
> try:
>     print('now we test: ')
>     Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png')
>     # i also tried this:
>     #test=ParseUtils.try_parse_int(Draw.MolToFile(Testmol,'C:\\Struct\\Test191131.png'))
>     #print(test)
> except:
>     print('I never reach this except but the programm stopps')
>
>
>
> Many thanks!
>
>
>
> Rüdiger Lang
>
>
>
>
>
>
> Freundliche Grüße / Kind regards
>
> Rüdiger Lang
> Data Analyst
>
> *abcr GmbH*
> Im Schlehert 10
> 76187 Karlsruhe
> Germany
>
> r.l...@abcr.com
>
>
>
>
>
>
>
>
>
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Using DrawAttachmentLine for bidentate ligands

2022-08-14 Thread David Cosgrove

Hi Geoff,
You should bear in mind that the dative bond syntax is an RDKit extension
to SMILES, so it is not guaranteed to be parsed correctly by other
cheminformatics toolkits.
Dave
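If the SMILES are going to be handed to other toolkits, it may help to flag the molecules that depend on the extension first. A minimal sketch, assuming an RDKit version that parses `->` and `<-` as dative bonds (the SMILES is the one quoted below); `has_dative_bonds` is a hypothetical helper, not RDKit API.

```python
from rdkit import Chem


def has_dative_bonds(mol):
    """True if any bond has RDKit's DATIVE bond type (written -> or <- in SMILES)."""
    return any(b.GetBondType() == Chem.BondType.DATIVE for b in mol.GetBonds())


bipy = Chem.MolFromSmiles("C12=CC=CC=[N]1->[*]<-[N]3=C2C=CC=C3")
plain = Chem.MolFromSmiles("c1ccccc1")

print(has_dative_bonds(bipy))   # True
print(has_dative_bonds(plain))  # False
print(Chem.MolToSmiles(bipy))   # RDKit writes the dative bonds back out as arrows
```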


On Sun, 14 Aug 2022 at 00:42, Geoffrey Hutchison 
wrote:

> mol = Chem.MolFromSmiles("C12=CC=CC=[N]1->[*]<-[N]3=C2C=CC=C3")
>
>
> Hmm. I forgot the SMILES syntax of dative bonds. That's a nice idea.
>
> I actually decided to use the noFreeType=True option, add a highlight as
> the "metal" and remove the * from the depiction.
>
> The initial set is at:
> https://github.com/OpenChemistry/fragments/tree/main/ligands
>
> Thanks!
> -Geoff
>
>
>
>
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Using DrawAttachmentLine for bidentate ligands

2022-08-11 Thread David Cosgrove

Hi Geoff,
The drawer has the GetDrawCoords() method.  There are 2 overloads: one
takes a Point2D, the other an atom index, and both return the coordinates
in the drawer's reference frame.  That's assuming you're working in Python;
in C++, they're getDrawCoords().
https://www.rdkit.org/docs/source/rdkit.Chem.Draw.rdMolDraw2D.html

Dave
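A minimal sketch of using the atom-index overload to put a mark across a bond. It assumes a recent RDKit in which `DrawLine` accepts a `rawCoords` flag; `perpendicular_tick` is a hypothetical helper, the straight tick is a simple stand-in for the wavy attachment line, and aniline's C-N bond is only an example. The draw coordinates are only meaningful after `DrawMolecule()` has set up the drawing scale.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Geometry import Point2D


def perpendicular_tick(p1, p2, half_len=12.0):
    """End points of a segment of length 2*half_len crossing the midpoint of p1-p2 at right angles."""
    mx, my = (p1.x + p2.x) / 2, (p1.y + p2.y) / 2
    dx, dy = p2.x - p1.x, p2.y - p1.y
    norm = (dx * dx + dy * dy) ** 0.5
    nx, ny = -dy / norm, dx / norm  # unit vector normal to the bond
    return (Point2D(mx - nx * half_len, my - ny * half_len),
            Point2D(mx + nx * half_len, my + ny * half_len))


mol = Chem.MolFromSmiles("c1ccccc1N")
bond = mol.GetBondBetweenAtoms(5, 6)  # the ring C-N bond

drawer = rdMolDraw2D.MolDraw2DSVG(300, 300)
drawer.DrawMolecule(mol)
# Atom index -> position in the drawer's (pixel) reference frame
p1 = drawer.GetDrawCoords(bond.GetBeginAtomIdx())
p2 = drawer.GetDrawCoords(bond.GetEndAtomIdx())
t1, t2 = perpendicular_tick(p1, p2)
drawer.DrawLine(t1, t2, rawCoords=True)  # points are already in drawing coordinates
drawer.FinishDrawing()
svg = drawer.GetDrawingText()
```

For the squiggle itself, the same two points could be passed to `DrawWavyLine` instead; its argument list is worth checking in the rdMolDraw2D documentation linked above, since it is not exercised here.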


On Wed, Aug 10, 2022 at 4:22 PM Geoffrey Hutchison <
geoff.hutchi...@gmail.com> wrote:

> I've been using RDKit for depicting sets of ligands from SMILES, which has
> been great.
>
> I'd like to add some bidentate and tridentate ligands. Let's stick to
> bipyridine at first (see image)
>
>
> I have the appropriate SMILES, leaving * as part of a 5 atom ring
> involving the nitrogen atoms (and sanitize = False)
> C12=CC=CC=[N]1[*][N]3=C2C=CC=C3
>
> The resulting depiction is great .. except I'd like to add the attachment
> "squiggles" across the N-* bonds.
>
> I see there are DrawAttachmentLine and DrawWavyLine methods, but I'd need
> to get the positions of the N and * atoms.
>
> What's the best way to do that and / or automate adding wavy lines across
> arbitrary bonds?
>
> Thanks,
> -Geoff
>
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Color bonds with value

2022-07-06 Thread David Cosgrove

If hacking the SVG isn't to your taste, you can pass into DrawMolecule a
list of bonds for highlighting, and the colour of highlight for each bond.
This will put a coloured band alongside each bond.  Also,
DrawMoleculeWithHighlights allows you to put multiple coloured highlights
for each bond, so you could colour code the highlights with more than 1
property, and this method also allows you to alter the width of the
highlights if you wish.  Coming in time for the next release, I am planning
to implement a highlighting method that just changes the colour of the
bonds, and with that and the first method you will be able to do what you
want directly.  And yes, Christian, at the same time I will implement your
highlight by lasso method that you've been waiting some time for!

Dave
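A minimal sketch of the first approach, mapping a numeric value per bond onto a highlight colour. The bond values and the blue-to-red ramp are made up for illustration, and `value_to_rgb` and `draw_bonds_by_value` are hypothetical helpers; `highlightBonds` and `highlightBondColors` are the DrawMolecule arguments referred to above, with colours given as RGB tuples in the range 0 to 1.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D


def value_to_rgb(value, vmin, vmax):
    """Map value linearly onto blue (low) .. red (high) as an RGB tuple in [0, 1]."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    return (t, 0.0, 1.0 - t)


def draw_bonds_by_value(mol, bond_values, width=400, height=300):
    """bond_values: dict of bond index -> number. Returns SVG text."""
    vmin, vmax = min(bond_values.values()), max(bond_values.values())
    colours = {idx: value_to_rgb(v, vmin, vmax) for idx, v in bond_values.items()}
    drawer = rdMolDraw2D.MolDraw2DSVG(width, height)
    drawer.DrawMolecule(mol, highlightAtoms=[], highlightBonds=list(colours),
                        highlightBondColors=colours)
    drawer.FinishDrawing()
    return drawer.GetDrawingText()


mol = Chem.MolFromSmiles("CCOC(=O)C")
# Made-up per-bond values: here simply the bond index
values = {b.GetIdx(): float(b.GetIdx()) for b in mol.GetBonds()}
svg = draw_bonds_by_value(mol, values)
```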


On Tue, Jul 5, 2022 at 9:21 PM Paolo Tosco 
wrote:

> Hi Joey,
>
> not sure if by "color" you mean text labelling or actually mapping a
> property to a color.
> Anyway, here's some code for either use case.
> The text labelling is easy, the individual bond coloring can be done by
> fiddling with the SVG text.
>
> import re
> import xml.etree.ElementTree as ET
> import colorsys
> from rdkit import Chem
> from rdkit.Chem.Draw import rdMolDraw2D, rdDepictor
> from IPython.display import SVG
>
> mol = Chem.MolFromSmiles('CN1C(=O)C2=C(ON=C2c2ccccc2Cl)C(Cl)=C1c1cnc(N2CCC(C)(N)CC2)nn1')
> rdDepictor.Compute2DCoords(mol)
> rdDepictor.NormalizeDepiction(mol)
> rdDepictor.StraightenDepiction(mol)
>
> # this is to assign a text label to each bond
> for b in mol.GetBonds():
>     bond_prop = int(b.GetBondType())
>     b.SetIntProp("bondNote", bond_prop)
>
> drawer = rdMolDraw2D.MolDraw2DSVG(600, 300)
> drawer.DrawMolecule(mol)
> drawer.FinishDrawing()
> svg = drawer.GetDrawingText()
>
> SVG(svg)
> [image: image.png]
> # this is to assign a color to each bond
> for b in mol.GetBonds():
>     b.ClearProp("bondNote")
>
> int_bond_types = sorted(map(int, Chem.BondType.values))
> min_bond_type = int_bond_types[0]
> max_bond_type = int_bond_types[-1]
> bond_type_range = max_bond_type - min_bond_type
>
> def bond_type_to_rgb(bt):
>     rgb = colorsys.hsv_to_rgb((int(bt) - min_bond_type) / bond_type_range, .8, .8)
>     return "#" + "".join("{0:02x}".format(round(c * 255.0)) for c in rgb)
>
> bond_colors = [bond_type_to_rgb(b.GetBondType()) for b in mol.GetBonds()]
> drawer = rdMolDraw2D.MolDraw2DSVG(600, 300)
> drawer.drawOptions().bondLineWidth = 3
> drawer.DrawMolecule(mol)
> drawer.FinishDrawing()
>
> def set_bond_colors(svg_text, bond_colors):
>     path_class_regex = re.compile(r"bond-(\d+) atom-(\d+) atom-(\d+)")
>     path_style_regex = re.compile(r"^(.*stroke:)(#[0-9A-F]{6})(;.*)$")
>     svg_tree = ET.fromstring(svg_text)
>     for path in svg_tree.findall("{http://www.w3.org/2000/svg}path"):
>         path_class = path.get("class")
>         if not path_class:
>             continue
>         m = path_class_regex.match(path_class)
>         if not m:
>             continue
>         bi = int(m.group(1))
>         if bi >= len(bond_colors):
>             continue
>         path_style = path.get("style")
>         if not path_style:
>             continue
>         path_style = path_style_regex.sub(f"\\g<1>{bond_colors[bi]}\\g<3>", path_style)
>         path.set("style", path_style)
>     return ET.tostring(svg_tree)
>
> svg_text = drawer.GetDrawingText()
> svg_text = set_bond_colors(svg_text, bond_colors)
> SVG(svg_text)
> [image: 1fa303ba-bc21-489c-af57-5324e07d7cb5.png]
>
> Hope this helps, cheers
> p.
>
> On Tue, Jul 5, 2022 at 5:12 PM Storer, Joey (J) via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
>> Hi all,
>>
>>
>>
>> I would like to color all bonds with a value.  Does anyone have a snippet
>> for this?
>>
>>
>>
>> Many thanks!
>>
>> Joey Storer
>>
>> Dow Inc.
>>
>> Core R
>>
>> General Business
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Multi-line atom label

2022-05-15 Thread David Cosgrove

Hi,
At present, the drawing code will ignore things like '\n'.  I suppose it
could be made to recognise newline and produce 2 lines of output.  I will
add it to the list of things to do on a wet weekend, though it won't appear
in a release until 2022.09.1 at the earliest.  In the meantime, you could
try adding the labels explicitly to the drawing, using ideas in this
discussion:
https://github.com/rdkit/rdkit/discussions/4832#discussioncomment-1882386
That shows how to label atoms and bonds with torsion angles and bond
lengths.

HTH,
Dave
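In the meantime, one workaround in the spirit of that discussion is to draw each line of the note yourself with DrawString after DrawMolecule, stacking the lines under the atom. This is a sketch under assumptions: `draw_multiline_note` and its offsets are hypothetical, the offsets are in molecule coordinates (roughly Ångström, y pointing up) and will probably need tuning, and it relies on the two-argument form of `DrawString` taking its position in molecule coordinates.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Geometry import Point2D


def draw_multiline_note(drawer, mol, atom_idx, lines, line_step=0.4, first_offset=0.5):
    """Hypothetical helper: draw each string in `lines` below atom `atom_idx`.

    Offsets are in molecule coordinates and are guesses. Call after
    DrawMolecule() and before FinishDrawing().
    """
    pos = mol.GetConformer().GetAtomPosition(atom_idx)
    for i, text in enumerate(lines):
        # Further down the picture means smaller y in the (y-up) molecule frame
        drawer.DrawString(text, Point2D(pos.x, pos.y - first_offset - i * line_step))


mol = Chem.MolFromSmiles("CCC")
rdDepictor.Compute2DCoords(mol)  # coordinates the note positions are taken from

drawer = rdMolDraw2D.MolDraw2DSVG(300, 200)
drawer.DrawMolecule(mol)
draw_multiline_note(drawer, mol, 0, ["line1", "line2"])
drawer.FinishDrawing()
svg = drawer.GetDrawingText()
```

Because the drawing scale is fixed by the molecule alone, extra lines near the edge may be clipped; increasing `drawOptions().padding` before drawing leaves more room.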

On Fri, May 13, 2022 at 2:46 PM Márton Vass 
wrote:

> Hi,
> Is it possible to have a multi-line atom label?
> I've tried adding ...\n... or .. to the atomNote property,
> but these didn't work (though  and  work).
>
> mol = Chem.MolFromSmiles("CCC")
> mol.GetAtomWithIdx(0).SetProp('atomNote',"line1\nline2")
> Chem.Draw.MolToFile(mol,"/Users/vass/Downloads/x.svg")
>
> Thanks,
> Marton
>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Annotations get trimmed on molecule renderings

2022-05-06 Thread David Cosgrove

Hi Giammy,
You're right, the new pictures look pretty rubbish.  I assume it's related
to https://github.com/rdkit/rdkit/discussions/5195.  I'll fix it over the
weekend, and hopefully it'll show up in the next patch release.
Dave


On Fri, May 6, 2022 at 1:07 PM Gianmarco Ghiandoni 
wrote:

> Hi Dave,
>
> Thanks for your reply. The reason why my library sticks to 2021.09 is
> because I get even more trouble with later versions of RDKit. These are two
> examples of rendering with 2021 and 2022:
>
> [image: image.png][image: image.png]
>
> The good news is that your padding suggestion works, so I set
> d2d.drawOptions().padding = 0.15 and voilà:
>
> [image: image.png]
>
> Amazing. Thanks!
> Giammy
>
> On Thu, 5 May 2022 at 10:32, David Cosgrove 
> wrote:
>
>> Hi Giammy,
>>
>> On reflection overnight, you might try d2d.drawOptions().padding = 0.2 or
>> something.  That should increase the amount of empty space around the
>> molecule (the default is 0.05, and it's the fraction of the width/height of
>> the image) such that there's enough room to show the whole annotation.
>> It's a bit of a kludge, but it might work.
>>
>> Dave
>>
>> On Wed, May 4, 2022 at 4:20 PM Gianmarco Ghiandoni 
>> wrote:
>>
>>> Hi all,
>>>
>>> I am using rdkit_pypi==2021.9.4 to generate visualisation of compounds
>>> with their atomic hydrogen bond strengths. In particular, I am using this
>>> function to produce an SVG string:
>>>
>>> d2d = rdMolDraw2D.MolDraw2DSVG(fig_size[0], fig_size[1])
>>> d2d.drawOptions().annotationFontScale = 0.7
>>> d2d.DrawMolecule(
>>> rwmol,
>>> highlightAtoms=atoms_to_highlight,
>>> highlightAtomColors=idx2rgb,
>>> highlightBonds=None,
>>> )
>>> d2d.FinishDrawing()
>>> return d2d.GetDrawingText()
>>>
>>> Note that I am increasing the font scale from 0.5 to 0.7 and for
>>> certain molecules that produces renderings where annotations are cut out:
>>>
>>> [image: image.png]
>>>
>>> Any suggestions on how to fix this?
>>>
>>> Thanks,
>>>
>>> Giammy
>>> --
>>> *Gianmarco*
>>>
>>
>>
>> --
>> David Cosgrove
>> Freelance computational chemistry and chemoinformatics developer
>> http://cozchemix.co.uk
>>
>>
>
> --
> *Gianmarco*
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Annotations get trimmed on molecule renderings

2022-05-05 Thread David Cosgrove

Hi Giammy,

On reflection overnight, you might try d2d.drawOptions().padding = 0.2 or
something.  That should increase the amount of empty space around the
molecule (the default is 0.05, and it's the fraction of the width/height of
the image) such that there's enough room to show the whole annotation.
It's a bit of a kludge, but it might work.

Dave
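Put together, the suggestion amounts to something like the following sketch. `draw_annotated` is a hypothetical helper and the annotation values are made up; `annotationFontScale` and `padding` are the two drawing options discussed in this thread, and 0.15 is the padding value later reported to work.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D


def draw_annotated(mol, notes, size=(400, 300), font_scale=0.7, padding=0.15):
    """notes: dict of atom index -> annotation text. Returns SVG text."""
    mol = Chem.Mol(mol)  # work on a copy so the caller's molecule is not modified
    for idx, text in notes.items():
        mol.GetAtomWithIdx(idx).SetProp("atomNote", text)
    d2d = rdMolDraw2D.MolDraw2DSVG(*size)
    opts = d2d.drawOptions()
    opts.annotationFontScale = font_scale  # larger notes need more room...
    opts.padding = padding  # ...so widen the margin (default 0.05, a fraction of the image size)
    d2d.DrawMolecule(mol)
    d2d.FinishDrawing()
    return d2d.GetDrawingText()


svg = draw_annotated(Chem.MolFromSmiles("OCCN"), {0: "2.1", 3: "1.4"})
```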

On Wed, May 4, 2022 at 4:20 PM Gianmarco Ghiandoni 
wrote:

> Hi all,
>
> I am using rdkit_pypi==2021.9.4 to generate visualisation of compounds
> with their atomic hydrogen bond strengths. In particular, I am using this
> function to produce an SVG string:
>
> d2d = rdMolDraw2D.MolDraw2DSVG(fig_size[0], fig_size[1])
> d2d.drawOptions().annotationFontScale = 0.7
> d2d.DrawMolecule(
> rwmol,
> highlightAtoms=atoms_to_highlight,
> highlightAtomColors=idx2rgb,
> highlightBonds=None,
> )
> d2d.FinishDrawing()
> return d2d.GetDrawingText()
>
> Note that I am increasing the font scale from 0.5 to 0.7 and for
> certain molecules that produces renderings where annotations are cut out:
>
> [image: image.png]
>
> Any suggestions on how to fix this?
>
> Thanks,
>
> Giammy
> --
> *Gianmarco*
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Annotations get trimmed on molecule renderings

2022-05-04 Thread David Cosgrove

Hi Giammy,

I’d be pretty disappointed if that happened in 2022.03.[12]. I don’t think
there’s a fix for that in 2021.09.  If it still happens in the latest
version please file it as a bug and I’ll take a look at it. The molecule
rendering was completely overhauled over the winter to try and prevent that
sort of thing.

Regards,
Dave


On Wed, 4 May 2022 at 16:20, Gianmarco Ghiandoni 
wrote:

> Hi all,
>
> I am using rdkit_pypi==2021.9.4 to generate visualisation of compounds
> with their atomic hydrogen bond strengths. In particular, I am using this
> function to produce an SVG string:
>
> d2d = rdMolDraw2D.MolDraw2DSVG(fig_size[0], fig_size[1])
> d2d.drawOptions().annotationFontScale = 0.7
> d2d.DrawMolecule(
> rwmol,
> highlightAtoms=atoms_to_highlight,
> highlightAtomColors=idx2rgb,
> highlightBonds=None,
> )
> d2d.FinishDrawing()
> return d2d.GetDrawingText()
>
> Note that I am increasing the font scale from 0.5 to 0.7 and for
> certain molecules that produces renderings where annotations are cut out:
>
> [image: image.png]
>
> Any suggestions on how to fix this?
>
> Thanks,
>
> Giammy
> --
> *Gianmarco*
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Question matching substructures from SMARTS with explicit hydrogens

2022-03-07 Thread David Cosgrove

Glad it works for you. As Greg pointed out to someone else today, it’s
marginally more efficient to do [#6] than [C,c] and likewise for nitrogen.
But it’s always a trade off between speed and legibility/maintainability.
If speed is of the essence and you’re running on millions of compounds it
might be worth trying.
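A quick way to confirm that the two spellings find the same atoms, and to time them on your own data. The molecule (4-chlorotoluene) is an arbitrary example, and the timings printed depend on your machine; no particular speed-up is claimed here.

```python
import timeit

from rdkit import Chem

mol = Chem.MolFromSmiles("Clc1ccc(C)cc1")  # 4-chlorotoluene, an arbitrary example

by_symbol = Chem.MolFromSmarts("[C,c]Cl")  # aliphatic C OR aromatic c
by_number = Chem.MolFromSmarts("[#6]Cl")   # atomic number 6, either aromaticity

# Both queries find the same atoms: ((1, 0),)
print(mol.GetSubstructMatches(by_symbol))
print(mol.GetSubstructMatches(by_number))

# [#6] is one atomic-number test per atom rather than an OR of two atom types
for label, query in (("[C,c]", by_symbol), ("[#6]", by_number)):
    seconds = timeit.timeit(lambda q=query: mol.GetSubstructMatches(q), number=10000)
    print(f"{label}: {seconds:.3f} s for 10000 searches")
```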

On Mon, 7 Mar 2022 at 20:45, Adam Moyer  wrote:

> Ahh! Thank you so much, to both of you.
>
> Yes, the different meaning of H in the various contexts was tripping me up.
>
> Also, DescribeQuery() was definitely a function that I needed for
> debugging this solo. Thank you. I will keep that in mind in the future.
>
> I found that this SMARTS (S4) is exactly what I
> needed: '[C,c]1(Cl)[C,c][C,c]([N,n,C,c,#1])[C,c]([C,c])[C,c]([#1])[C,c]1'.
>
> Cheers,
> Adam
>
> On Tue, Mar 1, 2022 at 4:32 AM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> A minor correction: [H] by itself *is* valid and means a hydrogen atom.
>> The Daylight docs say as much in section 4.1. But in other contexts it
>> means a hydrogen count, so to be safe, always using #1 to mean a hydrogen
>> atom can be a good practice.
>>
>> If you are ever in doubt about how RDKit is interpreting a SMARTS,
>> I recommend making use of the DescribeQuery function which provides a tree
>> representation of a query atom or bond. For example (comments added):
>>
>> >>> mol = Chem.MolFromSmarts('[H][N,H][N,#1]')
>>
>>
>> >>> print(mol.GetAtomWithIdx(0).DescribeQuery())  # [H]
>>
>> AtomAtomicNum 1 = val  # [H] interpreted as a hydrogen atom
>>
>> >>> print(mol.GetAtomWithIdx(1).DescribeQuery())  # [N,H]
>>
>> AtomOr
>>   AtomType 7 = val
>>   AtomHCount 1 = val  # H interpreted as a hydrogen count
>> # Overall query atom means "an aliphatic nitrogen OR (any atom with one hydrogen)"
>>
>> >>> print(mol.GetAtomWithIdx(2).DescribeQuery())   # [N,#1]
>>
>> AtomOr
>>   AtomType 7 = val
>>   AtomAtomicNum 1 = val  # "#1" is atomic number, therefore a hydrogen atom.
>> # Overall query atom means "an aliphatic nitrogen OR a hydrogen"
>>
>> One non-obvious convention in the DescribeQuery output is that AtomType
>> implies aliphatic when the value is a normal atomic number, or aromatic if
>> the atomic number is offset by 1000. For example, [n] is "AtomType 1007".
>>
>> Hope you find this approach useful in the future.
>>
>> Ivan
>>
>> On Tue, Mar 1, 2022 at 6:33 AM David Cosgrove 
>> wrote:
>>
>>> Hi Adam
>>> There are a number of issues here.  The key one, I think, is a
>>> misunderstanding about the meaning of H in SMARTS.  It means "a single
>>> attached hydrogen", and is a qualifier for another atom, it cannot be used
>>> by itself.  So [*H] is valid, [H] isn't.  See the table at
>>> https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html.  If you
>>> want to refer to an explicit hydrogen, you have to use [#1].  However, that
>>> will only match an explicit hydrogen in the molecule, not an implicit one.
>>> Thus c[#1] doesn't match anything in c1ccccc1.  If you have read in a
>>> molecule from a molfile, for example, that has explicit hydrogens then you
>>> will be ok.
>>>
>>> Further to that, your SMARTS strings, at least as they have appeared in
>>> gmail, which may have garbled them, are incorrect.  In S1, the brackets
>>> round [N,n,H] make it a substituent, so it will not match the indole
>>> nitrogen.  Also, it would probably be better as [N,n;H], which would be
>>> read as "(aliphatic nitrogen OR aromatic nitrogen) AND 1 attached
>>> hydrogen."  The [N,n,H] will match a methylated indole nitrogen which I
>>> imagine is not what you want. Similar remarks apply to S2.  A SMARTS that
>>> matches both 6CI and PCT
>>> is [C,c]1(Cl)[C,c][C,c;H][C,c]([C,c])[C,c;H][C,c]1, but that won't match
>>> the H atoms themselves if you want to use them in the overlay, and it also
>>> won't work in the aliphatic case of, for example, ClC1CCC(C)CC1 because
>>> there the carbon atoms have 2 attached hydrogens.   If you really do want
>>> it to match aliphatic cases as well, then you will need something
>>> like 
>>> [C,c]1(Cl)[$([CH2]),$([cH])][$([CH2]),$([cH])][C,c]([C,c])[$([CH2]),$([cH])][$([CH2]),$([cH])]1
>>> which is quite a mouthful.  The carbons at the 2,3,5 and 6 positions on the
>>> ring are specified as either [CH2] or [cH].
>>>
>>> Jupyter note

Re: [Rdkit-discuss] Question matching substructures from SMARTS with explicit hydrogens

2022-03-01 Thread David Cosgrove

Hi Adam
There are a number of issues here.  The key one, I think, is a
misunderstanding about the meaning of H in SMARTS.  It means "a single
attached hydrogen", and is a qualifier for another atom, it cannot be used
by itself.  So [*H] is valid, [H] isn't.  See the table at
https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html.  If you
want to refer to an explicit hydrogen, you have to use [#1].  However, that
will only match an explicit hydrogen in the molecule, not an implicit one.
Thus c[#1] doesn't match anything in c1ccccc1.  If you have read in a
molecule from a molfile, for example, that has explicit hydrogens then you
will be ok.

Further to that, your SMARTS strings, at least as they have appeared in
gmail, which may have garbled them, are incorrect.  In S1, the brackets
round [N,n,H] make it a substituent, so it will not match the indole
nitrogen.  Also, it would probably be better as [N,n;H], which would be
read as "(aliphatic nitrogen OR aromatic nitrogen) AND 1 attached
hydrogen."  The [N,n,H] will match a methylated indole nitrogen which I
imagine is not what you want. Similar remarks apply to S2.  A SMARTS that
matches both 6CI and PCT
is [C,c]1(Cl)[C,c][C,c;H][C,c]([C,c])[C,c;H][C,c]1, but that won't match
the H atoms themselves if you want to use them in the overlay, and it also
won't work in the aliphatic case of, for example, ClC1CCC(C)CC1 because
there the carbon atoms have 2 attached hydrogens.   If you really do want
it to match aliphatic cases as well, then you will need something
like 
[C,c]1(Cl)[$([CH2]),$([cH])][$([CH2]),$([cH])][C,c]([C,c])[$([CH2]),$([cH])][$([CH2]),$([cH])]1
which is quite a mouthful.  The carbons at the 2,3,5 and 6 positions on the
ring are specified as either [CH2] or [cH].

Jupyter notebook can be really useful for debugging SMARTS patterns like
this.  The one I used was variations of
```
from rdkit import Chem
# IPythonConsole gives inline drawings, with the last match highlighted
from rdkit.Chem.Draw import IPythonConsole
mol = Chem.MolFromSmiles('C1=CC(=CC2=C1C=CN2C)Cl')
qmol = Chem.MolFromSmarts('[C,c]1(Cl)[C,c][C,c][C,c]([C,c])[C,c][C,c]1')
print(mol.GetSubstructMatches(qmol))
mol
```
which prints the numbers of the matching atoms and also draws the molecule
with the match highlighted.
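If you want to check the points above outside Jupyter, here is a minimal sketch that uses only `Chem.MolFromSmiles`, `Chem.MolFromSmarts`, `HasSubstructMatch` and `Chem.AddHs`. It contrasts `[N,n,H]` with `[N,n;H1]` on the N-methyl and N-H 6-chloroindoles, shows that `c[#1]` only matches once the hydrogens are real atoms, and tries the recursive SMARTS on an aromatic and an aliphatic case. The N-H indole and the chloro-methylcyclohexane are illustrative molecules chosen for this sketch.

```python
from rdkit import Chem

# N-methyl-6-chloroindole (the SMILES used above) and its N-H parent (illustrative)
methyl_indole = Chem.MolFromSmiles('C1=CC(=CC2=C1C=CN2C)Cl')
nh_indole = Chem.MolFromSmiles('C1=CC(=CC2=C1C=CN2)Cl')

# Comma is OR: "N or n or one H", so any aromatic nitrogen matches, methylated or not
loose = Chem.MolFromSmarts('[N,n,H]')
# Semicolon is a low-precedence AND: "(N or n) and exactly one attached H"
strict = Chem.MolFromSmarts('[N,n;H1]')

print(methyl_indole.HasSubstructMatch(loose))   # True
print(methyl_indole.HasSubstructMatch(strict))  # False
print(nh_indole.HasSubstructMatch(strict))      # True

# [#1] is a hydrogen atom, so it only matches hydrogens present in the graph
benzene = Chem.MolFromSmiles('c1ccccc1')
ch = Chem.MolFromSmarts('c[#1]')
print(benzene.HasSubstructMatch(ch))              # False: hydrogens are implicit
print(Chem.AddHs(benzene).HasSubstructMatch(ch))  # True: hydrogens are now atoms

# The recursive SMARTS accepts [cH] or [CH2] at the 2,3,5 and 6 positions
ring = Chem.MolFromSmarts(
    '[C,c]1(Cl)[$([CH2]),$([cH])][$([CH2]),$([cH])][C,c]([C,c])'
    '[$([CH2]),$([cH])][$([CH2]),$([cH])]1')
pct = Chem.MolFromSmiles('Cc1ccc(Cl)cc1')        # para-chlorotoluene
aliphatic = Chem.MolFromSmiles('ClC1CCC(C)CC1')  # saturated analogue
print(pct.HasSubstructMatch(ring), aliphatic.HasSubstructMatch(ring))  # True True
```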
Regards,
Dave

On Tue, Mar 1, 2022 at 1:43 AM Adam Moyer  wrote:

> Hello,
>
> I have a baffling case where I am trying to match substructures on two
> ligands for the goal of aligning them.
>
> I have two ligands; one is a 6-chloroindole (6CI) and the other is a
> para-chloro toluene (PCT).
>
> I am attempting to use the following SMARTS (S1) to match
> them: '[C,c]1(Cl)[C,c][C,c]*([N,n,H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'.
> For some reason S1 only finds a match in 6CI.
>
> When I use the following SMARTS (S2) I only match to PCT as expected:
> '[C,c]1(Cl)[C,c][C,c]*([H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'.
>
> How can S1 not match PCT? S1 is strictly a superset of S2 because I am
> using the "or" operation. Do I have a misunderstanding of how explicit
> hydrogens work in RDKit/SMARTS?
>
> Lastly when I use the last SMARTS (S3) I am able to match to both, but I
> cannot use that smarts due to other requirements in my
> project: '[C,c]1(Cl)[C,c][C,c][C,c]([C,c,H])[C,c]([H])[C,c]1'
>
> Thanks!
> Adam
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Font size when drawing molecules

2022-02-09 Thread David Cosgrove

To help further, I'm just implementing an option
drawOptions().fixedFontSize to allow you to insist on a font size, in
pixels.  I will remember to expose it to Python!
Layout is a different department, I'm afraid I can't help there.  It would
probably be better to start a new thread with that, so as to catch the
attention of the right people.


On Wed, Feb 9, 2022 at 3:58 PM Tim Dudgeon  wrote:

> Thanks Dave. Understood.
> A related question - is it possible to make the layout aware of the amount
> of space that is available? I'm stuck with a very wide and short aspect
> ratio and it would be helpful if the layout engine could optimise the
> layout to fit in that unconventional space.
> Tim
>
> On Wed, Feb 9, 2022 at 11:40 AM David Cosgrove 
> wrote:
>
>> Hi Tim,
>> Sorry, the font size setting both within the code and in the public
>> interface has been a fraught matter since Freetype was introduced for the
>> font drawing and it isn't currently as controllable as one might wish.  The
>> font is chosen based on baseFontSize and the drawing scale.  The size of
>> the font relative to the bond lengths is therefore fixed, unless it hits
>> the minFontSize or maxFontSize.  So for a large molecule in a small canvas,
>> it is likely that the font size will be larger relative to the bonds as
>> minFontSize has an effect, and vice versa with a small molecule in a large
>> canvas.  To achieve what you want, you need to increase baseFontSize,
>> which, as Paolo mentioned, isn't currently exposed to Python.  Apologies
>> for that, which was an oversight.  It does work with the current release,
>> though, so if you don't mind rebuilding RDKit you can use it now.
>> Add
>> ```
>>   .def_readwrite(
>>       "baseFontSize", &RDKit::MolDrawOptions::baseFontSize,
>>       "relative size of font.  Defaults to 0.6.  -1 means use default.")
>> ```
>> to $RDBASE/Code/GraphMol/MolDraw2D/Wrap/rdMolDraw2D.cpp immediately after
>> the analogous minFontSize entry.
>> HTH,
>> Dave
>>
>>
>>
>> On Wed, Feb 9, 2022 at 10:31 AM Tim Dudgeon 
>> wrote:
>>
>>> OK, thanks. That's great to hear.
>>> In the meantime could someone explain how the font is currently chosen?
>>> e.g. if I specify 10 as min and 14 as max what is actually used?
>>> Tim
>>>
>>> On Wed, Feb 9, 2022 at 10:11 AM Paolo Tosco 
>>> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> Dave Cosgrove is currently working on a PR which, among other things,
>>>> addresses exactly the need that you describe through the baseFontSize
>>>> parameter, which is currently not exposed to Python. The PR is almost ready
>>>> for merging and it should become part of the March release.
>>>>
>>>> Cheers,
>>>> p.
>>>>
>>>> On Wed, Feb 9, 2022 at 10:57 AM Tim Dudgeon 
>>>> wrote:
>>>>
>>>>> I'm confused over how the font is chosen when drawing molecules.
>>>>> There are MolDrawOptions.minFontSize and MolDrawOptions.maxFontSize
>>>>> properties, and if I set them to the same value then that sized font is
>>>>> used. But if I set max to a larger size than min then it's not clear what
>>>>> font size will be used.
>>>>> I'm wanting the font size to adapt to the amount the molecule is
>>>>> scaled to fit the space (larger molecules needing a smaller font) but I
>>>>> want the font size that is used to be a bit bigger than the default
>>>>> that would be used if I don't set anything.
>>>>> How do I go about this?
>>>>> Thanks
>>>>> Tim
>>>>>
>>>
>>
>>
>>
>>


Re: [Rdkit-discuss] Font size when drawing molecules

2022-02-09 Thread David Cosgrove

Hi Tim,
Sorry, the font size setting both within the code and in the public
interface has been a fraught matter since Freetype was introduced for the
font drawing and it isn't currently as controllable as one might wish.  The
font is chosen based on baseFontSize and the drawing scale.  The size of
the font relative to the bond lengths is therefore fixed, unless it hits
the minFontSize or maxFontSize.  So for a large molecule in a small canvas,
it is likely that the font size will be larger relative to the bonds as
minFontSize has an effect, and vice versa with a small molecule in a large
canvas.  To achieve what you want, you need to increase baseFontSize,
which, as Paolo mentioned, isn't currently exposed to Python.  Apologies
for that, which was an oversight.  It does work with the current release,
though, so if you don't mind rebuilding RDKit you can use it now.
Add
```
  .def_readwrite(
      "baseFontSize", &RDKit::MolDrawOptions::baseFontSize,
      "relative size of font.  Defaults to 0.6.  -1 means use default.")
```
to $RDBASE/Code/GraphMol/MolDraw2D/Wrap/rdMolDraw2D.cpp immediately after
the analogous minFontSize entry.
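To make the interplay of the three settings concrete, here is a rough model of the behaviour described above. It is not RDKit's drawing code: it assumes the font size in pixels is `baseFontSize` multiplied by the drawing scale (taken here as pixels per unit bond length) and then clamped to `[minFontSize, maxFontSize]`. The function name and the example scales are hypothetical.

```python
def effective_font_size(base_font_size: float, scale: float,
                        min_font_size: float, max_font_size: float) -> float:
    """Hypothetical model: the font follows the drawing scale until it hits a limit."""
    size = base_font_size * scale  # font stays in proportion to the bonds...
    return max(min_font_size, min(size, max_font_size))  # ...unless clamped


# Tim's example of min 10 and max 14, with the default baseFontSize of 0.6
for scale in (10.0, 20.0, 50.0):  # assumed pixels per bond length
    print(scale, effective_font_size(0.6, scale, 10, 14))
```

In this model a large molecule in a small canvas (small scale) hits minFontSize, so labels look big relative to the bonds, while a small molecule in a large canvas hits maxFontSize; raising baseFontSize shifts the unclamped range upwards.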
HTH,
Dave

On Wed, Feb 9, 2022 at 10:31 AM Tim Dudgeon  wrote:

> OK, thanks. That's great to hear.
> In the meantime could someone explain how the font is currently chosen?
> e.g. if I specify 10 as min and 14 as max what is actually used?
> Tim
>
> On Wed, Feb 9, 2022 at 10:11 AM Paolo Tosco 
> wrote:
>
>> Hi Tim,
>>
>> Dave Cosgrove is currently working on a PR which, among other things,
>> addresses exactly the need that you describe through the baseFontSize
>> parameter, which is currently not exposed to Python. The PR is almost ready
>> for merging and it should become part of the March release.
>>
>> Cheers,
>> p.
>>
>> On Wed, Feb 9, 2022 at 10:57 AM Tim Dudgeon 
>> wrote:
>>
>>> I'm confused over how the font is chosen when drawing molecules.
>>> There are MolDrawOptions.minFontSize and MolDrawOptions.maxFontSize
>>> properties, and if I set them to the same value then that sized font is
>>> used. But if I set max to a larger size than min then it's not clear what
>>> font size will be used.
>>> I'm wanting the font size to adapt to the amount the molecule is scaled
>>> to fit the space (larger molecules needing a smaller font) but I want the
>>> font size that is used to be a bit bigger than the default that would be
>>> used if I don't set anything.
>>> How do I go about this?
>>> Thanks
>>> Tim
>>>
>


Re: [Rdkit-discuss] problem with latest builds?

2022-01-26 Thread David Cosgrove

Hi Tim,
I built from master a couple of hours ago on an Ubuntu 20 system without
problems.
Dave

On Wed, 26 Jan 2022 at 13:58, Tim Dudgeon  wrote:

> I'm building RDKit from the latest code on the master branch.
> The build is fine, but the python bindings seem broken:
>
> >>> from rdkit import Chem
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/data/github/rdkit/rdkit/rdkit/__init__.py", line 6, in <module>
>     from . import rdBase
> ImportError: cannot import name 'rdBase' from partially initialized module
> 'rdkit' (most likely due to a circular import)
> (/data/github/rdkit/rdkit/rdkit/__init__.py)
>
> Are others seeing this too?
>
> Tim
>

Re: [Rdkit-discuss] RDKit_minimal.js running out of memory

2022-01-25 Thread David Cosgrove

Hi Adam,
You need to delete the molecule once you’re done with it. It is beyond the
reach of the JS garbage collector.  Add ‘mol.delete()’ at the end of the
loop to free the memory for the next round.
HTH,
Dave


On Tue, 25 Jan 2022 at 20:46, Ádám Baróthi  wrote:

> Hello everyone,
>
> I'm currently experimenting with the Javascript wrapper for RDKit,
> thinking that I could offload image generation and descriptor calculations
> to the client.
> I have a list of 1555 SMILES right now that I'm trying to convert to mol
> objects with RDKit.get_mol(), but after successfully processing about
> 700-800 molecules I get an "abort(OOM)" exception. Any further calls to
> get_mol() throw the same exception. The code is fairly basic:
>
> const molecules = [];
> for (const [idx, smi] of smiles_list.entries()) {
>   const mol = RDKit.get_mol(smi);
>   //image = mol.get_svg();
>   //descriptors = JSON.parse(mol.get_descriptors());
>   //molecules[idx] = {'image': image, 'desc': descriptors};
> }
>
> With this, I could process about 830 molecules.
> If I uncomment the lines, I could process about 700.
>
> I'm fairly new to JS development, so I'm not really sure what I'm doing
> wrong.
>
> Best,
> Adam
>

Re: [Rdkit-discuss] Taylor-Butina clustering

2021-07-21 Thread David Cosgrove

Hi Francesca,

The Taylor-Butina clustering is not hierarchical.  It is a type of sphere
exclusion algorithm.  A useful image for the results would be the
"centroid" of each cluster, possibly followed by the other cluster
members.  You will need to generate the images from the original input
molecules, not the fingerprints.   You'll need to write some extra code to
read the clusters and do this.  The Getting Started document (
https://www.rdkit.org/docs/GettingStartedInPython.html) should help you
with the image generation.  Technically, the centroids aren't proper
centroids, they are the molecules that each cluster is based on.  The true
centroid would be some sort of average of the fingerprints of the molecules
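in the cluster, which itself would not be a molecule.  Dealing with false
singletons (molecules that do have neighbours within the threshold, but
whose neighbours were all claimed by other clusters first) is a matter of
taste, as they are an artifact of the clustering method.  One way I have
had success with in the past is to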
define a second, looser, similarity threshold and put each false singleton
into the cluster whose centroid it is most similar to, so long as it is
within this new threshold.  False singletons are certainly more common than
true ones in my experience.
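The sphere-exclusion idea, and the false-singleton reassignment described above, can be sketched in plain Python. This is an illustration under stated assumptions, not RDKit's implementation (RDKit's own is `Butina.ClusterData` in `rdkit.ML.Cluster`, which works on distances rather than similarities). Fingerprints are modelled as sets of on-bit indices, and the function names, fingerprints and thresholds are hypothetical.

```python
from typing import List, Sequence, Set

Fingerprint = Set[int]  # on-bit indices; stands in for a real RDKit fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def butina(fps: Sequence[Fingerprint], threshold: float) -> List[List[int]]:
    """Sphere-exclusion clustering; each cluster lists its 'centroid' first."""
    n = len(fps)
    nbrs = [[j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= threshold]
            for i in range(n)]
    # Molecules with the most neighbours get the first chance to be centroids.
    order = sorted(range(n), key=lambda i: len(nbrs[i]), reverse=True)
    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in nbrs[i] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


def absorb_false_singletons(fps: Sequence[Fingerprint], clusters: List[List[int]],
                            threshold: float, loose_threshold: float) -> List[List[int]]:
    """Move each false singleton into the cluster whose centroid it is most
    similar to, provided that similarity reaches the looser threshold."""
    multi = [list(c) for c in clusters if len(c) > 1]
    singles: List[List[int]] = []
    for c in clusters:
        if len(c) > 1:
            continue
        i = c[0]
        # A false singleton had neighbours, but they were all claimed elsewhere.
        is_false = any(j != i and tanimoto(fps[i], fps[j]) >= threshold
                       for j in range(len(fps)))
        if is_false and multi:
            best = max(multi, key=lambda cl: tanimoto(fps[i], fps[cl[0]]))
            if tanimoto(fps[i], fps[best[0]]) >= loose_threshold:
                best.append(i)
                continue
        singles.append(list(c))
    return multi + singles


fps = [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 6}, {1, 2, 3, 5, 7}, {1, 2, 4, 6, 8}, {20, 21}]
clusters = butina(fps, 0.6)
print(clusters)                                          # [[0, 1, 2], [3], [4]]
print(absorb_false_singletons(fps, clusters, 0.6, 0.4))  # [[0, 1, 2, 3], [4]]
```

Here molecule 3 is a false singleton: it is within 0.6 of molecule 1, but molecule 1 was already claimed by centroid 0. Molecule 4 has no neighbours at all, so it is a true singleton and stays on its own.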
The threshold you use for the clustering should be chosen with some care,
and will depend on the fingerprint type more than anything else.  Greg did
a blog post recently (
https://greglandrum.github.io/rdkit-blog/similarity/reference/2021/05/26/similarity-threshold-observations1.html)
on selecting a threshold for similarity searching, and those suggestions
are probably a good place to start with for this, too.

Best,
Dave

On Wed, Jul 21, 2021 at 8:58 AM Francesca Magarotto -
francesca.magarot...@studio.unibo.it 
wrote:

> Hi,
> I managed to perform Taylor-Butina clustering on a dataset of 193 571
> fragments retrieved from ZINC20.
> I used the indications in this link
> https://www.macinchem.org/reviews/clustering/clustering.php
> Actually, I've never used RDKit before and never did a cluster analysis,
> so I'm really new to this type of work. I've read the paper related to
> Taylor-Butina clustering (https://pubs.acs.org/doi/10.1021/ci9803381),
> but I don't understand if it can be considered a hierarchical method or not.
> Could someone help me understanding this?
> Moreover, I've got some problems generating the images after clustering.
> First, I don't know what images I need: if it's hierarchical I should do a
> dendrogram, but if it isn't hierarchical there's no need (I think).
> I only managed to obtain the image of a sparse similarity matrix, but the
> RAM is too small to obtain a dense matrix.
> I wasn't able to do the plot of the clusters or to obtain the images of
> the molecules that are centroids or false singletons (I've tried using
> RDKit to obtain images from fingerprints but the images of the molecules
> are strange). I have thousands of clusters and false singletons as results.
> Has someone done something like that in the past? Any suggestions?
> I gave me an explanation of what are false and true singletons (I obtain
> only false singletons, is that normal?), but I appreciate if someone more
> expert could explain me and confirm my guess.
> I'm sorry for all this questions, but I'm really new to this topic.
> Hope someone can help me,
> kind regards.
>


Re: [Rdkit-discuss] Javascript MinimalLib

2021-07-21 Thread David Cosgrove

Brilliant, thanks. I will take note of how to do it myself in future.

Best,
Dave


On Wed, 21 Jul 2021 at 12:32, Greg Landrum  wrote:

> Hi Dave,
>
> It's not in the JS interface yet, but I'll add it now.
>
> -greg
>
>
> On Mon, Jul 19, 2021 at 4:57 PM David Cosgrove 
> wrote:
>
>> Hi,
>>
>> In this blogpost
>> https://greglandrum.github.io/rdkit-blog/technical/2021/05/01/rdkit-cffi-part1.html,
>> Greg mentions the CFFI function get_json().  Is that exposed in the JS
>> MinimalLib, and if so, how would I use it?  I see all sorts of good stuff
>> in cffiwrapper.h, but I can't work out how to call them from JS.
>>
>> Thanks,
>> Dave
>>
>>
>>
>>

[Rdkit-discuss] Javascript MinimalLib

2021-07-19 Thread David Cosgrove

Hi,

In this blogpost
https://greglandrum.github.io/rdkit-blog/technical/2021/05/01/rdkit-cffi-part1.html,
Greg mentions the CFFI function get_json().  Is that exposed in the JS
MinimalLib, and if so, how would I use it?  I see all sorts of good stuff
in cffiwrapper.h, but I can't work out how to call them from JS.

Thanks,
Dave



Re: [Rdkit-discuss] MinimalLib and Reactjs

2021-04-19 Thread David Cosgrove

After a bit more experimentation, I have something of an ugly hack that
works for a web page but won't work for a server app.  The issue is mostly
persuading the script to load the WASM file RDKit_minimal.wasm that
RDKit_minimal.js needs.  Greg's demo HTML files show how this can be done
in a web page, so I have left the importing of RDKit_minimal.js there, and
then used it via the window variable in the ReactJS components.  I have
RDKit_minimal.js and RDKit_minimal.wasm in my public folder in the react
app tree.

So, in index.html I have the line:

in between  and .

In my React component I have:

```
constructor(props) {
  super(props);
  this.state = {
    rdkit: null,
  };
}

async componentDidMount() {
  const rdkit_mod = await window.initRDKitModule();
  this.setState({rdkit: rdkit_mod});
  // setState is asynchronous, so log via the local variable, not this.state
  console.log(rdkit_mod.version());
}
```

I can then pass the rdkit instance into other components and functions
via the props as normal:

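```
function DrawMol(props) {
  if (props.rdkit) {
    const mol = props.rdkit.get_mol(props.smiles);
    const svg = mol.get_svg();
    // Free the WASM-side molecule; the JS garbage collector can't reclaim it
    mol.delete();
    return ();
  } else {
    return ({props.smiles});
  }
}
```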

I believe that node does have a WASM loader, and if I can work out how
to use that directly, I'll post a further reply.

Hopefully that all makes sense and might be helpful to other people.

Best,

Dave

On Wed, Apr 14, 2021 at 5:10 PM David Cosgrove 
wrote:

> Hi,
>
> I have compiled the latest RDKit (2021_03_1) into .js and .wasm files
> using the Dockerfile in $RDBASE/Code/MinimalLib/docker.  I would like to
> use these in a reactjs app.  This appears to be non-trivial and after
> several days of googling and experimentation I am no further forward.  Has
> anyone had success with this that they can share?
>
> Thanks,
> Dave
>
>
>
>


[Rdkit-discuss] MinimalLib and Reactjs

2021-04-14 Thread David Cosgrove

Hi,

I have compiled the latest RDKit (2021_03_1) into .js and .wasm files using
the Dockerfile in $RDBASE/Code/MinimalLib/docker.  I would like to use
these in a reactjs app.  This appears to be non-trivial and after several
days of googling and experimentation I am no further forward.  Has anyone
had success with this that they can share?

Thanks,
Dave



Re: [Rdkit-discuss] atom index changes after embedding

2021-03-29 Thread David Cosgrove

Hi Pablo,

You could loop over the atoms and use SetProp to add a relevant property to
each one. That should survive the embedding.
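A minimal sketch of that suggestion, assuming RDKit's Python API: tag each atom with an integer property before embedding (the name `orig_idx` is arbitrary), then read the tags back from whatever molecule you end up with. `Chem.RenumberAtoms` is used only to simulate a later step that does reorder atoms, and the formaldehyde/ammonia SMILES is a simplified stand-in for the one in the question.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles('C=O.N'))
# Record each atom's starting index as a property that travels with the atom.
for atom in mol.GetAtoms():
    atom.SetIntProp('orig_idx', atom.GetIdx())

AllChem.EmbedMolecule(mol, randomSeed=42)  # returns -1 if embedding fails

# Simulate a later step that reorders atoms (here: reverse the order).
n = mol.GetNumAtoms()
shuffled = Chem.RenumberAtoms(mol, list(reversed(range(n))))

# The tags map every atom back to its original index.
print([atom.GetIntProp('orig_idx') for atom in shuffled.GetAtoms()])
```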

Cheers,
Dave


On Mon, 29 Mar 2021 at 11:18, Pablo Ramos  wrote:

> Dear community,
>
>
>
> I want to embed a molecule. For my personal application, I really need the
> atom indices to be respected, even if they are indistinguishable.
>
> The problem is that, because of symmetry, some atoms may be
> indistinguishable. This is something you can see by using
> CanonicalRankAtoms():
>
>
>
> mol = Chem.MolFromSmiles('[H]C([H])=O.[H]N([H])[H]')
>
> print("canonical rank atoms for mol:", list(Chem.CanonicalRankAtoms(mol,
> breakTies=False)))
>
> → canonical rank atoms for mol: [6, 5, 7, 0, 0, 2, 2, 2]
>
>
>
> In the above example we have one group of two identical atoms (labelled as
> 0), and another one of three identical atoms (labelled as 2).
>
>
>
> Therefore, the mol object created after embedding may have swapped
> indistinguishable atoms, losing the track of the initial indexes I had.
>
>
>
> This problem was discussed in the link below. Unfortunately, I cannot
> find any solution from it more than understanding where my problem comes
> from…
>
>
>
> Is there, for the new RDKit version, any way to prevent atom labelling
> changes when embedding / some available solution to this?
>
>
>
> https://github.com/rdkit/rdkit/issues/3219
>
>
>
> Thank you, and have a nice day :)
>
>
>
> Best regards,
>
>
>
> *Pablo Ramos*
> Ph.D. at Covestro Deutschland AG
>
>
>
>
>
> covestro.com <http://www.covestro.com/>
>
> *Telephone*
>
> +49 214 6009 7356
>
>
>
> Covestro Deutschland AG
>
> COVDEAG-Chief Commer-PUR-R
>
> B103, R164
>
> 51365 Leverkusen, Germany
>
> pablo.ra...@covestro.com
>
>
>
>
>

Re: [Rdkit-discuss] How to preserve undefined stereochemistry?

2020-10-22 Thread David Cosgrove

bond is unknown.
>>
>> The problem here is that in standard SMILES there is no way to actively
>> specify that you don't know the stereochemistry of a double bond (the same
>> thing applies to stereocenters). You can either provide information about
>> the stereochemistry by using "/" and "\" bonds, or you provide no
>> information. So the SMILES C/C=C/C produces a double bond with known
>> stereochemistry but CC=CC produces a double bond with unspecified
>> stereochemistry.
>>
>> If, based on what you know about the SMILES that you are parsing, you
>> would like to change the convention and have unspecified double bonds be
>> marked as unknown, it's straightforward to write a script that loops over
>> the molecule and makes that change (watch out for ring bonds).
>>
>> -greg
>> [1] Perhaps "mistake" isn't the right word. It's confusing
>>
>> On Tue, Oct 20, 2020 at 1:54 PM Paolo Tosco 
>> wrote:
>>
>>> Hi Adelene,
>>>
>>> this gist
>>>
>>> https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b
>>>
>>> shows how to add stereo annotations to RDKit 2D depictions, and also how
>>> to access the double bond stereochemistry programmatically.
>>>
>>> Cheers,
>>> p.
>>>
>>>
>>> On Tue, Oct 20, 2020 at 12:24 PM Adelene LAI  wrote:
>>>
>>>> Hi RDKit Community,
>>>>
>>>>
>>>> Is there a way to preserve undefined stereochemistry aka unspecified
>>>> stereochemistry when doing MolFromSmiles?
>>>>
>>>> I'm working with a bunch of molecules, some with stereochemistry
>>>> defined, some without.
>>>>
>>>>
>>>> If stereochemistry is undefined in the SMILES, I would like it to stay
>>>> that way when converted to a Mol, but this doesn't seem to be the case:
>>>>
>>>>
>>>> > mol =
>>>> Chem.MolFromSmiles('CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O')
>>>> > mol
>>>>
>>>> One would expect that C=C to either be crossed, as in PubChem's
>>>> depiction:
>>>>
>>>> https://pubchem.ncbi.nlm.nih.gov/compound/139598257#section=2D-Structure
>>>>
>>>>
>>>>
>>>>
>>>> or that single bond to be squiggly, as in CDK's depiction:
>>>>
>>>> But it's not just a matter of depiction, as it seems internally, mol is
>>>> equivalent to its stereochem-specific sibling (Entgegen form)
>>>>
>>>>
>>>> CC(C)(C1=CC(=C(C(=C1)Br)O)Br)/C(=C/C(C(=O)O)Br)/CC(=O)O
>>>>
>>>>
>>>>
>>>> I've tried sanitize=False, but it doesn't seem to have any effect. I
>>>> would prefer not having to manually SetStereo(Chem.BondStereo.STEREOANY)
>>>> for every molecule with undefined stereochem (not sure how I would even go
>>>> about that...).
>>>>
>>>>
>>>> Possibly related to:
>>>>
>>>>
>>>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/C00BE94F-6F6F-466A-83D4-3045C9006026%40gmail.com/#msg34929570
>>>>
>>>>
>>>>
>>>>
>>>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAHOi4k3revAu-9qhFt0MpUpr0aADQ9d8bV2XT6FurTEKimCQng%40mail.gmail.com/#msg36365128
>>>> o = Chem.MolFromSmiles('C/C=C/C')
>>>>
>>>>
>>>> https://www.rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html
>>>>
>>>> https://github.com/openforcefield/openforcefield/issues/146
>>>>
>>>>
>>>>
>>>>
>>>> Any help would be much appreciated.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Adelene
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Doctoral Researcher
>>>>
>>>> Environmental Cheminformatics
>>>>
>>>> UNIVERSITÉ DU LUXEMBOURG
>>>>
>>>>
>>>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>>>
>>>> 6, avenue du Swing
>>>> L-4367 Belvaux
>>>>
>>>> T +356 46 66 44 67 18
>>>>
>>>> [image: github.png] adelenelai
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
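Greg's suggestion in the quoted reply above (loop over the molecule and mark unspecified double bonds as unknown, watching out for ring bonds) can be sketched as below, assuming RDKit's Python API; the function name is hypothetical. It only touches acyclic double bonds that were parsed without stereo and whose ends each carry either one substituent or two different ones, judged with `CanonicalRankAtoms(..., breakTies=False)`. This is a simplification: it does not try to handle every stereo case (cumulenes, for example).

```python
from rdkit import Chem


def mark_unspecified_double_bonds(mol: Chem.Mol) -> int:
    """Set STEREOANY on acyclic double bonds that could be E/Z but carry no
    stereo information. Returns the number of bonds changed."""
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    changed = 0
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE or bond.IsInRing():
            continue  # ring bonds are skipped, as Greg warns
        if bond.GetStereo() != Chem.BondStereo.STEREONONE:
            continue  # stereo was given in the input (e.g. with / and \)
        ends_ok = True
        for atom, partner in ((bond.GetBeginAtom(), bond.GetEndAtomIdx()),
                              (bond.GetEndAtom(), bond.GetBeginAtomIdx())):
            others = [n.GetIdx() for n in atom.GetNeighbors() if n.GetIdx() != partner]
            # One substituent (plus an H or lone pair) is enough; two must differ.
            if not (len(others) == 1 or
                    (len(others) == 2 and ranks[others[0]] != ranks[others[1]])):
                ends_ok = False
        if ends_ok:
            bond.SetStereo(Chem.BondStereo.STEREOANY)
            changed += 1
    return changed


smi = 'CC(C)(C1=CC(=C(C(=C1)Br)O)Br)C(=CC(C(=O)O)Br)CC(=O)O'
print(mark_unspecified_double_bonds(Chem.MolFromSmiles(smi)))         # 1
print(mark_unspecified_double_bonds(Chem.MolFromSmiles('CC=C(C)C')))  # 0
```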

Re: [Rdkit-discuss] Drawing mol to a coordinate box (x, y, width, height)?

2020-09-22 Thread David Cosgrove

Hi Imran,

I don’t think that’s possible at the moment. In principle you could derive
a class from the base MolDraw2D class and add the matplotlib drawing
commands for drawing lines etc. It wouldn’t be a trivial piece of work,
however.

Regards,
Dave


On Tue, 22 Sep 2020 at 16:27, Imran Shah  wrote:

>
> Hi Folks,
>
> Does anyone know of an rdkit draw function to render a chemical on a
> matplotlib axis in a box (x,y, width, height)? I'm interested in creating a
> vector image (i.e. publication quality) that has multiple chemicals (shown
> below). I created this raster image using matplotlib and  MolToImage with
> imshow:-
>
> [image: image]
> <https://user-images.githubusercontent.com/10340891/93813456-24382100-fc21-11ea-82a9-fefd04c14078.png>
>
> I have tried out the option of writing the mol to svg, reading the svg and
> then rendering that using matplot but the result isn't ideal. Thanks in
> advance.
>
> Cheers,
>
> Imran
>
> --
> Imran Shah
> imran.a.s...@gmail.com
>
>
>
>

Re: [Rdkit-discuss] h-bond geometry

2020-09-08 Thread David Cosgrove

Hi Tim,
I don’t have any code, but if you go to
https://github.com/harryjubb/arpeggio and look in config.py there are
SMARTS definitions for various interaction types with geometric tests that
might help. If you already have a suitable complex, you could just use
arpeggio.py to pull out the interactions directly. It doesn’t use RDKit,
but an unholy alliance of OpenBabel and Biopython.
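For the geometric part of Tim's question, the basic test is a distance plus an angle. The sketch below is plain Python and does not come from RDKit or arpeggio. The cut-offs (H...A at most 2.5 Å, D-H...A angle at least 120°) are assumed ballpark values, not the ones arpeggio uses, so tune them to taste. Coordinates are x, y, z tuples, which you could build from an RDKit conformer's atom positions.

```python
import math
from typing import Sequence

Point = Sequence[float]


def angle_deg(a: Point, b: Point, c: Point) -> float:
    """Angle a-b-c in degrees, measured at point b."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(ba, bc)) / (math.dist(a, b) * math.dist(c, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise


def plausible_hbond(donor: Point, hydrogen: Point, acceptor: Point,
                    max_h_acc: float = 2.5, min_angle: float = 120.0) -> bool:
    """Hypothetical check: short H...A distance and a roughly linear D-H...A angle."""
    return (math.dist(hydrogen, acceptor) <= max_h_acc
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)


print(plausible_hbond((0, 0, 0), (1.0, 0, 0), (2.9, 0, 0)))   # linear: True
print(plausible_hbond((0, 0, 0), (1.0, 0, 0), (1.0, 1.9, 0)))  # 90 degrees: False
```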

HTH,
Dave

On Mon, 7 Sep 2020 at 14:07, Tim Dudgeon  wrote:

> Hi RDKitters,
> I was wondering whether anyone has any RDKit code that checks on the
> geometry of a H-bond.
> e.g. once a donor and acceptor are located within a reasonable distance of
> each other to check on the angles involved to establish if that is a
> reasonable H-bond.
> Tim
>
>
>
>
>

Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread David Cosgrove

Hi Jason,
The answer is that when you delete the molecule, the memory it uses is
flagged as available for re-use,  but nothing else happens to it. If you
then de-reference pointers to it, such as the atoms that are buried in the
block of memory allocated to the molecule, you may get away with it and you
may not. It will depend on whether something else has written over the
memory or not. In your example, the memory was still in its original state,
so the de-referencing of the atom pointers succeeded. This is not
guaranteed, however, and this sort of bug is generally very nasty to find:
sometimes the code will run, sometimes it will crash. Worse still is if you
accidentally write to de-allocated memory that something else is now using;
you can then get failures 5 minutes later in a completely different part of
the program.

Deleting the atoms is also an error, because they will be deleted by the
molecule’s destructor, so you’ll be de-allocating the memory twice, another
exciting source of undefined behaviour. Valgrind is excellent for tracking
down these sorts of error, and many more besides.  If you’re developing on
Linux, it’s good practice to use it on any code before you use that program
in earnest.

Cheers,
Dave

On Thu, 27 Aug 2020 at 20:17, Jason Biggs  wrote:

> Everything I know about C++ I learned just so that I can write a link
> between an interpreted language and the rdkit, so there are definitely some
> gaps in my knowledge.
>
> What I'm trying to understand right now is the expected lifetime of an
> Atom pointer returned by a molecule, for instance by the getAtomWithIdx
> method.  Based on the documentation, since this method doesn't say the user
> is responsible for deleting the returned pointer I know I'm not supposed to
> delete it. But when exactly does it get deleted?  If I dereference it after
> deleting the molecule, what is it?
>
> auto mol = RDKit::SmilesToMol("");
> auto atom = mol->getAtomWithIdx(0);
> auto m2 = atom->getOwningMol();
> std::cout << "Z=" << atom->getAtomicNum() << std::endl;  // prints Z=6
> delete mol;
> std::cout << "Z=" << atom->getIdx() << std::endl; // prints Z=0
> std::cout << "N=" << m2.getNumAtoms() << std::endl;// prints N=4
> delete atom; // seg fault
>
> I would have thought the first time dereferencing the atom pointer after
> deleting mol would have crashed, but it does not.  I would also have
> expected bad things when calling the getNumAtoms method on m2 after calling
> delete on mol, but this also works just fine.  What am I missing?
>
> Thanks
> Jason
>
>
> --
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] RDKit installation problem

2020-08-01 Thread David Cosgrove

That happened to me, too, recently. I think I solved it by specifying the
python version explicitly (adding python=3.7 to the conda create command, IIRC).

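The question is about getting a 2017 release instead of a 2020 one. To confirm which RDKit release an environment actually picked up, a small check along these lines may help. It relies only on `rdkit.__version__` and assumes release strings of the form YYYY.MM.patch, such as 2020.03.4.

```python
def rdkit_version():
    """Return the importable RDKit version string, or None if RDKit can't be imported."""
    try:
        import rdkit
    except ImportError:
        return None
    return rdkit.__version__

def release_year(version):
    """Year part of an RDKit release string such as '2020.03.4'."""
    return int(version.split(".")[0])

if __name__ == "__main__":
    v = rdkit_version()
    if v is None:
        print("RDKit is not importable from this interpreter")
    elif release_year(v) < 2020:
        print(f"Old RDKit release in this environment: {v}")
    else:
        print(f"RDKit {v}")
```

Run it with the python from the activated environment so that you check the same interpreter you will actually use.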
Hope that helps,
Dave


On Sat, 1 Aug 2020 at 19:30, Sebastián J. Castro  wrote:

> I have tried the installation suggested at
> http://www.rdkit.org/docs/Install.html:
>
> $ conda create -c rdkit -n my-rdkit-env rdkit
>
> But I get 2017 version instead of 2020 (last released).
>
> I don't know how to install it. Can you help me?
>
> I have Ubuntu 20.04 LTS
>
> Thank you
>
> Best regards!
>
>
> --
> Dr. Sebastián J. Castro
> Departamento de Ciencias Farmacéuticas
> Facultad de Ciencias Químicas
> Universidad Nacional de Córdoba
> UNITEFA-CONICET
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] draw molecule without rescaling or translating

2020-07-26 Thread David Cosgrove

Hi Jason,
The original design was set up to make it relatively straightforward to
make your own drawing class, so it's disappointing you aren't finding it
so.  I guess we've become focussed on getting Cairo and SVG pictures to
work, and have forgotten to keep this in mind.
A good place to start might be overriding the virtual functions
getDrawCoords() and getAtomCoords().  These transform from atom coordinates
to draw coordinates and vice versa.  You will need to override both, and
make sure they are consistent.  For reasons that demonstrate that the whole
drawing system needs re-factoring, the code for setting the scales switches
between the two coordinate systems in ways that are impossible to defend.
It's a bit like the human eye - it's where we've ended up, but not how
you'd design it if you were starting from scratch.  The other thing might
be to override the non-virtual calculateScale() functions.  There is a
class data member needs_scale_ which would probably be exactly what you
want except that it's private so you won't be able to see or alter it in
your derived class.  That's something we could think about making access
functions for.
I hope this helps.  If not, please do get back to me either privately or
via the list.  It would be useful for future development to find out how
you get on and what would have made it easier for you.
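The consistency requirement between getDrawCoords() and getAtomCoords() can be illustrated with a toy Python class. The class is hypothetical and not part of RDKit. Each direction is the exact inverse of the other, and scale=1 with zero offsets gives the mapping with no rescaling and no translation. A real drawer may do more than this, for example flipping the y axis for the canvas, and the sketch ignores that.

```python
from dataclasses import dataclass

@dataclass
class AffineTransform2D:
    """Hypothetical stand-in for a drawer's atom <-> draw coordinate mapping."""
    scale: float = 1.0
    x_offset: float = 0.0
    y_offset: float = 0.0

    def to_draw(self, x, y):
        # atom (molecule) coordinates -> draw (canvas) coordinates
        return (x * self.scale + self.x_offset, y * self.scale + self.y_offset)

    def to_atom(self, x, y):
        # exact inverse of to_draw; overriding one without the other breaks round trips
        return ((x - self.x_offset) / self.scale, (y - self.y_offset) / self.scale)

identity = AffineTransform2D()
print(identity.to_draw(1.5, -2.0))  # (1.5, -2.0)
```

The property to preserve in any override is the round trip: `to_atom(*to_draw(x, y))` should return `(x, y)`.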

Best,
Dave

On Sun, Jul 26, 2020 at 8:37 PM Jason Biggs  wrote:

> I'm trying to use the MolDraw2D class in C++ to generate all the graphics
> primitives for a molecule, which I then pass into my own graphics engine to
> make an image.  I'm doing this by making a new class that is a subclass of
> MolDraw2D, similar to MolDraw2DSVG, and overriding the drawXXX methods.
>
> But I find that when I call drawMolecule on a molecule that already has 2D
> coordinates, the lines that get drawn have all been rescaled and
> translated.  I need to turn off this, I need for the coordinates used to
> make the lines match the existing conformer.  I know this is somehow
> related to the width and height used in the MolDraw2D constructor, and I
> see that there are options fixedBondLength, fixedScale, as well as a
> setScale method.  But I'm not clear on what values to set in these to set
> the rescaling to 1 and translation to 0.
>
> Thanks
> Jason Biggs
>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Define identical atoms in SMARTS pattern

2020-07-24 Thread David Cosgrove

Hi Jan,
I think you will have to add some extra logic to your program to check the
hit atoms are the same. I don’t think you can do it directly in SMARTS.
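The extra logic can be quite small. The sketch below assumes the matches come from something like `mol.GetSubstructMatches(Chem.MolFromSmarts('[C](*)*'))`. Each match tuple lists atom indices in query-atom order: the central C first, then the two neighbours. It also assumes `atomic_nums` is `[a.GetAtomicNum() for a in mol.GetAtoms()]`. The code is kept free of RDKit imports so that the filtering step can be shown on its own.

```python
from typing import List, Sequence, Tuple

def same_element_neighbours(matches: Sequence[Tuple[int, int, int]],
                            atomic_nums: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Keep (centre, nbr1, nbr2) matches whose two neighbours are the same element."""
    return [m for m in matches if atomic_nums[m[1]] == atomic_nums[m[2]]]

# Hand-made example: atoms 0..4 are C, O, O, C, Cl
atomic_nums = [6, 8, 8, 6, 17]
matches = [(0, 1, 2), (3, 0, 4)]
print(same_element_neighbours(matches, atomic_nums))  # [(0, 1, 2)]
```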
Best regards,
Dave

On Fri, 24 Jul 2020 at 15:08, Jan Halborg Jensen 
wrote:

> Is there a way to find a [C]([#X])[#X] pattern, where X=X, that finds
> C(C)C, C(O)O, C(F)F, etc., but not C(C)O, etc.?
>
> Best regards, Jan
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Multiline legend in MolsToGridImage

2020-04-09 Thread David Cosgrove

Hi Gustavo,
If you do create the issue, could you also add that the legend should scale
the font sensibly and also annex its own part of the drawing area. At the
moment it just goes at the bottom and the molecule may obscure it. It would
make sense if these were all done at the same time.
Cheers,
Dave


On Thu, 9 Apr 2020 at 06:00, Greg Landrum  wrote:

> Hi Gustavo,
>
> I don't believe that this is currently possible. I'm pretty sure we would
> need to add explicit code to handle this case.
> This would be a good thing to create a github issue for, if you're
> interested.
>
> -greg
>
>
> On Wed, Apr 8, 2020 at 10:35 PM Gustavo Seabra 
> wrote:
>
>> [image: Screenshot from 2020-04-08 16-28-37.png]
>>
>> Hi,
>>
>> Does anyone know how to write multiline legends when using
>> MolsToGridImage? I've been trying the code [here](
>> https://sourceforge.net/p/rdkit/mailman/message/35561198/), but nothing
>> there seems to work for me, as I only get a blank rectangle in place of the
>> \n or \r symbols... (see picture)
>>
>> Are there any ideas?
>> Thanks,
>> --
>> Gustavo Seabra.
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] RDKit in C++

2020-03-27 Thread David Cosgrove

Hi Leon,
Sorry for the slow reply.  Greg has just merged in an updated C++ document
which will be available in the new release.  If you've used the old methods
for iterating over atoms, you might want to take a look at that bit at
least.  It is now vastly simpler.
It seems the RMS routines in C++ are still waiting for someone to be
stirred enough to do them.  I'm not sure they will make your life much
faster in any case, as it is an inherently expensive process and the basic
comparison of 2 conformers is compiled C++ code wrapped up for Python.
What the Python wrapper has that isn't in C++ is just a higher-level
function that does all conformers vs all conformers for a molecule.  The
overhead for those extra loops in Python is unlikely to be huge, I wouldn't
think.
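The higher-level Python piece described here amounts to a double loop over conformer pairs, wrapped around a compiled pairwise comparison. In the sketch below, the pairwise function is a hypothetical callable; in practice it would wrap one of RDKit's compiled RMS routines. The results fill a condensed lower-triangle list in the order (1,0), (2,0), (2,1), (3,0), and so on. That this matches the layout returned by AllChem.GetConformerRMSMatrix is an assumption worth checking against the RDKit docs.

```python
from typing import Callable, List

def all_vs_all(n_confs: int, pairwise: Callable[[int, int], float]) -> List[float]:
    """Condensed lower triangle of a symmetric n x n matrix:
    pairwise(1,0), pairwise(2,0), pairwise(2,1), pairwise(3,0), ..."""
    values = []
    for i in range(1, n_confs):
        for j in range(i):
            values.append(pairwise(i, j))  # one compiled call per pair in real use
    return values

print(all_vs_all(4, lambda i, j: 10 * i + j))  # [10, 20, 21, 30, 31, 32]
```

The Python overhead is one loop iteration per pair, n(n-1)/2 in total. This is likely to be small compared with the cost of each alignment.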
Best,
Dave


On Wed, Feb 26, 2020 at 5:41 PM topgunhaides .  wrote:

> Hey Paolo and David,
>
> Thanks a lot!
> This is probably the most helpful resource I can use. It is great that you
> are planning to add new stuff in there and update things.
>
> One reason for me to transform my python code to c++ is to improve
> efficiency.
> (need to do a series of RDKit tasks like embedding conformers, RMS between
> confs, Shape Tanimoto distances, etc., with a lot of my own programming
> logic)
> In addition, profiling my python code showed the RMS (bestrms) step is the
> bottleneck, is the C++ version of RMS code coming soon?
>
> I will keep tracking the changes you make in the near future. Really
> appreciate it!
>
> Best,
> Leon
>
>
>
>
> On Wed, Feb 26, 2020 at 11:17 AM David Cosgrove <
> davidacosgrov...@gmail.com> wrote:
>
>> Hi Leon,
>> There is indeed such a thing.  It's not as complete as the Python one, as
>> it was rather more work than I anticipated.  Also, I haven't been keeping
>> the examples up to date, especially the newer ways of iterating over atoms
>> and bonds, and the CMakeLists.txt. It should give you some useful pointers,
>> however. You can find it here:
>> https://github.com/rdkit/rdkit/blob/master/Docs/Book/GettingStartedInC%2B%2B.md,
>> which should be in $RDBASE/Docs/Book if you have cloned the repo.  The
>> examples are in C++Examples in that directory also.
>> I will try and find time over the next few weeks to make the examples
>> current.  Also, underneath $RDBASE/Code there are lots of files called
>> test*cpp which are the unit tests for the various parts, and they have
>> useful stuff in them as well.
>> Cheers,
>> Dave
>>
>>
>> On Wed, Feb 26, 2020 at 3:53 PM topgunhaides . 
>> wrote:
>>
>>> Hi guys,
>>>
>>> I noticed that someone asked such question some years ago.
>>> Since it is now 2020, do we now have anything like "Getting Started with
>>> the RDKit in C++"?
>>>
>>> I am planning to transfer my RDKit python code to C++.
>>> Can anyone give me some resources? I found some, but just in case that I
>>> missed important ones. Any suggestions are very welcome. Thanks!
>>>
>>> Best,
>>> Leon
>>>
>>
>>
>> --
>> David Cosgrove
>> Freelance computational chemistry and chemoinformatics developer
>> http://cozchemix.co.uk
>>
>>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Anaconda and RDkit

2020-03-05 Thread David Cosgrove

Of no help to Francesco whatsoever, but to save anyone else the bother...
I’ve just taken the “from __future__...” bit out of the getting started
docs in the PR I have open at the moment. It doesn’t seem worth its own PR
but it needs to be acknowledged that we are now in the future.

Dave


On Thu, 5 Mar 2020 at 16:25, Francesco Coppola <
coppolafrancesco1...@gmail.com> wrote:

> Hi,
>
> I'm Francesco Coppola, a recent graduate student in Medicinal Chemistry in
> Italy. Now I'm in Manchester and I'm doing a traineeship in computational
> chemistry. During my thesis work, I deal with Docking programs (and they
> always had an interface), now instead I'm trying to work with Python,
> Terminal and interpreter.
>
> I have searched on a thousand sites and blogs but I just can't install
> RDKit on a Windows pc.
> In particular, I have a problem when I would activate the environment.
>
> I have the latest version of RDKit (the folder is on my desktop and *just*
> copied/pasted in c: ).
>
> First I followed this command:
>
> conda create -c rdkit -n my-rdkit-env rdkit
>
> Done this installation, I type:
>
> conda activate my-rdkit-env
>
> Now in the Anaconda Prompt, I see:
>
> (my-rdkit-env) c:\Users\HP>
>
> (and in fact, I can find this file "my-rdkit-env in the folder env of
> Anaconda3)
>
> But now, I'm not in Python. So I type
> py
>
> and Python 3.8.2 starts
>
> >>>
>
> Now I follow this guide:
> https://www.rdkit.org/docs/GettingStartedInPython.html, and I Type the
> first two command line:
>
> >>> from __future__ import print_function
> >>> from rdkit import Chem
>
> But it doesn't work.
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ModuleNotFoundError: No module named 'rdkit'
> >>>
>
> I need a guide that tells me what to do step by step.
>
> Sorry if I disturbed you, I hope you can help me..I'm completely
> inexperienced but I am trying to learn.
>
> Sorry for the trouble.
>
> Best regards
> Francesco
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] RDKit in C++

2020-02-26 Thread David Cosgrove

Hi Leon,
There is indeed such a thing.  It's not as complete as the Python one, as
it was rather more work than I anticipated.  Also, I haven't been keeping
the examples up to date, especially the newer ways of iterating over atoms
and bonds, and the CMakeLists.txt. It should give you some useful pointers,
however. You can find it here:
https://github.com/rdkit/rdkit/blob/master/Docs/Book/GettingStartedInC%2B%2B.md,
which should be in $RDBASE/Docs/Book if you have cloned the repo.  The
examples are in C++Examples in that directory also.
I will try and find time over the next few weeks to make the examples
current.  Also, underneath $RDBASE/Code there are lots of files called
test*cpp which are the unit tests for the various parts, and they have
useful stuff in them as well.
Cheers,
Dave

On Wed, Feb 26, 2020 at 3:53 PM topgunhaides .  wrote:

> Hi guys,
>
> I noticed that someone asked such question some years ago.
> Since it is now 2020, do we now have anything like "Getting Started with
> the RDKit in C++"?
>
> I am planning to transfer my RDKit python code to C++.
> Can anyone give me some resources? I found some, but just in case that I
> missed important ones. Any suggestions are very welcome. Thanks!
>
> Best,
> Leon
>
>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] RDKit Cartridge mol_to_svg parameters

2020-02-13 Thread David Cosgrove

Yes, I'm working on putting that in at the same time as everything else.
Dave


On Thu, Feb 13, 2020 at 3:54 PM Greg Landrum  wrote:

>
>
> On Thu, Feb 13, 2020 at 10:57 AM David Cosgrove <
> davidacosgrov...@gmail.com> wrote:
>
>>
>> However, thanks to the generosity of MedChemica, I am currently working
>> on improvements to the drawing code, and have added some extra options to
>> drawOptions() that will need interpreting in updateDrawerParamsFromJSON.
>> Whilst doing this, I'll make sure it picks up all the other possibilities
>> in the drawOptions() function as well.  In fact, your post was very timely
>> as this function was one I hadn't seen before, so now I know I need to look
>> at it!  If all goes ideally, this may be in the 2020.03 release, if not it
>> should be in 2020.09 and of course will be available in the development
>> version as soon as my pull request is accepted, should it be so.
>>
>
> Just to confirm: I think it would be really useful to be able to control
> the line width for the bonds via drawOptions() instead of having to do this
> via the MolDraw2D object itself. It'd be great if you could get to that as
> part of the MedChemica work, but if not I will add it after your PR goes in.
>
> Best,
> -greg
>
>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] RDKit Cartridge mol_to_svg parameters

2020-02-13 Thread David Cosgrove

Hi Thomas,

I've never used the RDKit cartridge, so have not come across mol_to_svg()
before.  In the more general case, you can change the bond line width with
something like:
from rdkit.Chem.Draw import rdMolDraw2D
drawer = rdMolDraw2D.MolDraw2DSVG(200, 200)
drawer.drawOptions().bondLineWidth = 10
It may be that you don't have this level of control in the cartridge in
which case I have no workaround in the short term.

However, thanks to the generosity of MedChemica, I am currently working on
improvements to the drawing code, and have added some extra options to
drawOptions() that will need interpreting in updateDrawerParamsFromJSON.
Whilst doing this, I'll make sure it picks up all the other possibilities
in the drawOptions() function as well.  In fact, your post was very timely
as this function was one I hadn't seen before, so now I know I need to look
at it!  If all goes ideally, this may be in the 2020.03 release, if not it
should be in 2020.09 and of course will be available in the development
version as soon as my pull request is accepted, should it be so.
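updateDrawerParamsFromJSON silently ignores keys it does not read, which is why bondLineWidth had no effect. A small client-side check can catch this before the JSON reaches mol_to_svg. The helper below is hypothetical. It whitelists only the keys read in the revision of MolDraw2DUtils.cpp that Thomas quotes. Later RDKit releases may read more keys, so the set would need updating.

```python
import json

# Keys read by updateDrawerParamsFromJSON in the MolDraw2DUtils.cpp revision
# quoted in this thread; other RDKit versions may differ.
SUPPORTED_KEYS = frozenset({
    "atomLabelDeuteriumTritium", "dummiesAreAttachments", "circleAtoms",
    "continuousHighlight", "flagCloseContactsDist", "includeAtomTags",
    "clearBackground", "legendFontSize", "multipleBondOffset", "padding",
    "additionalAtomLabelPadding", "highlightColour", "backgroundColour",
    "legendColour", "atomLabels",
})

def draw_params_json(**options):
    """Serialise drawing options, refusing keys the C++ side would ignore."""
    unknown = sorted(set(options) - SUPPORTED_KEYS)
    if unknown:
        raise ValueError(f"keys that would be silently ignored: {unknown}")
    return json.dumps(options)

print(draw_params_json(legendFontSize=12))  # {"legendFontSize": 12}
```

The resulting string would go in the last (JSON) argument of mol_to_svg. The check only catches unknown key names, not values of the wrong type or units.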

Cheers,
Dave


On Thu, Feb 13, 2020 at 8:47 AM Thomas Strunz  wrote:

> OK, I dug into the source code
> <https://github.com/rdkit/rdkit/blob/d41752d558bf7200ab67b98cdd9e37f1bdd378de/Code/GraphMol/MolDraw2D/MolDraw2DUtils.cpp>
> and it seems bondLineWidth simply can't be set:
>
>
> void updateDrawerParamsFromJSON(MolDraw2D &drawer, const std::string
> &json) {
>   if (json == "") {
>     return;
>   }
>   std::istringstream ss;
>   ss.str(json);
>   MolDrawOptions &opts = drawer.drawOptions();
>   boost::property_tree::ptree pt;
>   boost::property_tree::read_json(ss, pt);
>   PT_OPT_GET(atomLabelDeuteriumTritium);
>   PT_OPT_GET(dummiesAreAttachments);
>   PT_OPT_GET(circleAtoms);
>   PT_OPT_GET(continuousHighlight);
>   PT_OPT_GET(flagCloseContactsDist);
>   PT_OPT_GET(includeAtomTags);
>   PT_OPT_GET(clearBackground);
>   PT_OPT_GET(legendFontSize);
>   PT_OPT_GET(multipleBondOffset);
>   PT_OPT_GET(padding);
>   PT_OPT_GET(additionalAtomLabelPadding);
>   get_colour_option(&pt, "highlightColour", opts.highlightColour);
>   get_colour_option(&pt, "backgroundColour", opts.backgroundColour);
>   get_colour_option(&pt, "legendColour", opts.legendColour);
>   if (pt.find("atomLabels") != pt.not_found()) {
>     BOOST_FOREACH (boost::property_tree::ptree::value_type const &item,
>                    pt.get_child("atomLabels")) {
>       opts.atomLabels[boost::lexical_cast<int>(item.first)] =
>           item.second.get_value<std::string>();
>     }
>   }
> }
>
> legendFontSize indeed works. I was using a setting I had from python (0.5)
> which got silently ignored. But with a proper value in points size I assume
> it works.
>
> So I suggest adding bondLineWidth as an option to the above method.
>
> Best Regards,
>
> Thomas
>
> --
> *Von:* Thomas Strunz 
> *Gesendet:* Donnerstag, 13. Februar 2020 09:09
> *An:* rdkit-discuss@lists.sourceforge.net <
> rdkit-discuss@lists.sourceforge.net>
> *Betreff:* [Rdkit-discuss] RDKit Cartridge mol_to_svg parameters
>
> Hi All,
>
> started to play around with the rdkit cartridge and I was wondering how to
> correctly use the mol_to_svg function.
>
> On rdkit homepage I only found this:
>
>
> *mol_to_svg(mol,string default ‘’,int default 250, int default 200, string
> default ‘’) : returns an SVG with a drawing *
>
> *of the molecule. The optional parameters are a string to use as the
> legend, the width of the image, the height of the image, *
> *and a JSON with additional rendering parameters. (available from the
> 2016_09 release)*
>
> The interesting part are the rendering parameters. Is there a list of them
> and some examples of this function?
>
> I tried stuff like below:
>
> mol_to_svg(mol, 'Test', 150, 100, '{"bondLineWidth": 1, "legendFontSize":
> 0.5}')
>
> There is no error but the "JSON" options are not applied. The image always
> looks the same with for my taste too thick bonds.
>
> Best Regards,
>
> Thomas


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

[Rdkit-discuss] Building on MacBook

2020-02-03 Thread David Cosgrove

Hi,
I have just built the current snapshot of RDKit from the git repo on a
MacBook Pro with the latest macOS (Catalina 10.15.3).  I used the conda
build recipe from
https://github.com/rdkit/rdkit/blob/master/Docs/Book/Install.md#how-to-build-from-source-with-conda
:

conda create -n rdkit-dev-build python=3.6 -c rdkit
conda activate rdkit-dev-build
conda install numpy matplotlib cmake cairo pillow eigen pkg-config boost-cpp boost py-boost
pip install yapf==0.11.1 coverage==3.7.1
export PYROOT=/Users/david/anaconda3/envs/rdkit-dev-build
cmake -DPYTHON_INCLUDE_DIR=$PYROOT/include/python3.6m  \
  -DRDK_BUILD_AVALON_SUPPORT=ON \
  -DRDK_BUILD_CAIRO_SUPPORT=ON \
  -DRDK_BUILD_INCHI_SUPPORT=ON \
  ..

make
make install

If I run python in the build directory, I can import the python modules, but
not from another directory.  How do I add the rdkit libraries to my conda
environment so that they can be seen from anywhere, just like the
production env I installed directly from conda?

I have tried pointing PYTHONPATH and sys.path (separately) at the build
directory, and build/lib, but to no avail.  In both cases I get the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/david/Projects/RDKit/rdkit/rdkit/__init__.py", line 2, in <module>
    from .rdBase import rdkitVersion as __version__
ImportError: dlopen(/Users/david/Projects/RDKit/rdkit/rdkit/rdBase.so, 2):
Library not loaded: @rpath/libRDKitRDBoost.1.dylib
  Referenced from: /Users/david/Projects/RDKit/rdkit/rdkit/rdBase.so
  Reason: image not found

which looks like it is almost working.
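A quick way to see which rdkit package a given interpreter resolves, without triggering the failing import, is `importlib.util.find_spec`. For a top-level name it locates the module without executing it. The traceback suggests Python is already finding the source tree's rdkit/__init__.py, and that the failure is the dynamic loader not finding libRDKitRDBoost.1.dylib via @rpath. That is a library-path problem, which this check only helps to confirm rather than fix.

```python
import importlib.util
import sys
from typing import Optional

def where_is(module_name: str) -> Optional[str]:
    """File a top-level module would be loaded from, or None if it is not found.

    For a top-level name, find_spec locates the module without executing it,
    so this works even when actually importing it would fail.
    """
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("rdkit resolves to:", where_is("rdkit"))
```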

Thanks,
Dave


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] MolToSmiles preserve atom order

2019-11-18 Thread David Cosgrove

Hi Rocco,
Point taken. I don’t think you’d be able to get RDKit to spit such SMILES
strings out unless you tortured it pretty hard, however.
Dave


On Mon, 18 Nov 2019 at 16:36, Rocco Moretti  wrote:

> Actually, it is possible to get arbitrary orders, if you (ab)use the '.'
> component ("zero order bond") directive and the numeric bonding ("ring
> closure") directives:
>
> >>> Chem.MolToSmiles( Chem.MolFromSmiles("O1.Cl2.C12" ) )
> 'OCCl'
>
> Whether you want to do things that way is another question.
>
> On Mon, Nov 18, 2019 at 10:24 AM David Cosgrove <
> davidacosgrov...@gmail.com> wrote:
>
>> Hi Rafal,
>> It is not always possible to preserve the atom ordering in the SMILES
>> string because there is an implied bond between contiguous symbols in the
>> SMILES. I think, for example, that the molecule with the SMILES OCCl
>> couldn’t have the order in the molecule object O first, Cl second, C third,
>> with bonds between 1 and 3 and 2 and 3 and get the SMILES in that order.
>>
>> I hope that made sense. Please ask again if not.
>>
>> Best regards,
>> Dave
>>
>>
>> On Mon, 18 Nov 2019 at 12:33, Rafal Roszak  wrote:
>>
>>> Hi all,
>>>
>>> Is there any way to preserve atom order from Mol object during
>>> exporting to smiles? I tried MolToSmiles with rootedAtAtom=0 and
>>> canonical=False options but it does not always preserve the original order.
>>> I know I can use _smilesAtomOutputOrder to map old indices to new one
>>> in canonical smiles but maybe we have something more handy?
>>>
>>> Best,
>>>
>>> Rafał
>>>
>>>
>> --
>> David Cosgrove
>> Freelance computational chemistry and chemoinformatics developer
>> http://cozchemix.co.uk
>>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] MolToSmiles preserve atom order

2019-11-18 Thread David Cosgrove

Hi Rafal,
It is not always possible to preserve the atom ordering in the SMILES
string because there is an implied bond between contiguous symbols in the
SMILES. I think, for example, that the molecule with the SMILES OCCl
couldn’t have the order in the molecule object O first, Cl second, C third,
with bonds between 1 and 3 and 2 and 3 and get the SMILES in that order.
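The argument can be made concrete with a small check, which is illustrative only and not an RDKit function. In a SMILES string with no '.' separators, every atom after the first is attached by a chain or branch bond to an atom written earlier. An atom order that violates this cannot be written that way. With 0-based indices O=0, Cl=1, C=2 and bonds 0-2 and 1-2, the order 0, 1, 2 fails, while 0, 2, 1 (OCCl) passes. This is a necessary condition only: passing it does not mean RDKit will emit that order. Rocco's '.' plus ring-closure trick is what gets around it.

```python
from typing import Iterable, Sequence, Tuple

def attaches_to_earlier(order: Sequence[int], bonds: Iterable[Tuple[int, int]]) -> bool:
    """True if every atom after the first is bonded to some atom earlier in `order`."""
    bonded = {frozenset(b) for b in bonds}
    written = []
    for atom in order:
        if written and not any(frozenset((atom, prev)) in bonded for prev in written):
            return False
        written.append(atom)
    return True

bonds = [(0, 2), (1, 2)]  # O=0, Cl=1, C=2
print(attaches_to_earlier([0, 1, 2], bonds))  # False
print(attaches_to_earlier([0, 2, 1], bonds))  # True
```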

I hope that made sense. Please ask again if not.

Best regards,
Dave

On Mon, 18 Nov 2019 at 12:33, Rafal Roszak  wrote:

> Hi all,
>
> Is there any way to preserve atom order from Mol object during
> exporting to smiles? I tried MolToSmiles with rootedAtAtom=0 and
> canonical=False options but it does not always preserve the original order.
> I know I can use _smilesAtomOutputOrder to map old indices to new one
> in canonical smiles but maybe we have something more handy?
>
> Best,
>
> Rafał
>
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] distinguishing macrocyclic molecules

2019-10-09 Thread David Cosgrove

Hi Ivan,
There is an RDKit extension to SMARTS that allows something like [r12-20].
I can’t check the exact syntax at the moment. You might want to check that
atoms are not in smaller rings as well, so as not to pull up things like
anthracene which might not be something you’d want to class as a macrocycle.
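Until the exact range syntax is confirmed, the explicit list form from Ivan's reply can be generated rather than typed by hand. This is a plain string-building sketch. The default upper bound of 20 follows the RDKit limit on ring-size queries that Ivan mentions.

```python
def ring_size_smarts(min_size: int = 12, max_size: int = 20) -> str:
    """Atom SMARTS OR-ing r<n> ring-size primitives, e.g. [r12,r13,...,r20]."""
    if not 3 <= min_size <= max_size:
        raise ValueError("need 3 <= min_size <= max_size")
    return "[" + ",".join(f"r{n}" for n in range(min_size, max_size + 1)) + "]"

print(ring_size_smarts())  # [r12,r13,r14,r15,r16,r17,r18,r19,r20]
```

The string could then be compiled with Chem.MolFromSmarts and tested with mol.HasSubstructMatch. The smaller-ring caveat above would still need handling separately.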
Cheers,
Dave

On Wed, 9 Oct 2019 at 14:39, Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> Hi Thomas,
>
> I don't know of an RDKit function that directly recognizes macrocycles,
> but you could find the size of the largest ring this way:
>
> ri = mol.GetRingInfo()
> largest_ring_size = max((len(r) for r in ri.AtomRings()), default=0)
> if largest_ring_size > 12:
> ...
>
> You can also find if a molecule has a ring of a certain size using SMARTS,
> but only for rings up to size 20 at the moment (this is an RDKit-specific
> limit). For example, if you are happy with finding rings of size 12-20, you
> could use SMARTS [r12,r13,r14,r15,r16,r17,r18,r19,r20]. It's ugly but can
> be handy if you already have SMARTS-based tools to reuse.
>
> Ivan
>
> On Wed, Oct 9, 2019 at 7:25 AM Thomas Evangelidis 
> wrote:
>
>> Greetings,
>>
>> Is there an automatic way to distinguish the macrocyclic molecules within
>> a large chemical library using RDKit? For example, according to this
>> definition: Macrocycles are ring structures composed of at least twelve
>> atoms in the central cyclic framework [1,2,3]. Maybe someone here has a
>> better definition. Could anyone give me some hints on how to program this?
>>
>> I thank you in advance.
>> Thomas
>>
>> 1. Yudin AK (2015) Macrocycles: lessons from the distant past, recent
>> developments, and future directions. Chem Sci 6:30–49.
>> 2. Marsault E, Peterson ML (2011) Macrocycles are great cycles:
>> applications, opportunities, and challenges of synthetic macrocycles in
>> drug discovery. J Med Chem 54:1961–2004.
>> 3. Heinis C (2014) Drug discovery: tools and rules for macrocycles. Nat
>> Chem Biol 10:696–698.
>>
>>
>> --
>>
>> ==
>>
>> Dr. Thomas Evangelidis
>>
>> Research Scientist
>>
>> IOCB - Institute of Organic Chemistry and Biochemistry of the Czech
>> Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>
>> , Prague, Czech Republic
>>   &
>> CEITEC - Central European Institute of Technology
>> <https://www.ceitec.eu/>, Brno, Czech Republic
>>
>> email: teva...@gmail.com, Twitter: tevangelidis
>> <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis
>> <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
>>
>> website: https://sites.google.com/site/thomasevangelidishomepage/
>>
>>
>>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Catching errors in SMILES files

2019-06-04 Thread David Cosgrove

Hi Paolo,
Many thanks for the speedy reply.  I'll do as you suggest for now.  Do you
want me to file an issue on github, or even, maybe, see if I can fix it
myself?
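One alternative that does not depend on the supplier's end-of-file behaviour is to parse each line independently and record the line numbers of failures. The sketch assumes a whitespace-separated file with the SMILES in the first column and no title line, as in the titleLine=False setup. It takes the parser as a parameter: with RDKit that would be Chem.MolFromSmiles, which returns None for SMILES it cannot parse. The trade-off is that you lose the supplier's conveniences, such as handling name columns.

```python
from typing import Callable, Iterable, List, Optional

def bad_smiles_lines(lines: Iterable[str],
                     parse: Callable[[str], Optional[object]]) -> List[int]:
    """1-based numbers of non-blank lines whose first field `parse` rejects (returns None)."""
    bad = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank line: nothing to parse
        if parse(fields[0]) is None:
            bad.append(line_no)
    return bad

# Stand-in parser for illustration; with RDKit, pass Chem.MolFromSmiles instead.
fake_parse = lambda s: None if s == "BAD" else s
sample = ["CCO ethanol\n", "BAD one\n", "\n", "c1ccccc1 benzene\n", "BAD last\n"]
print(bad_smiles_lines(sample, fake_parse))  # [2, 5]
```

With a real file this would be `bad_smiles_lines(open('test2.smi'), Chem.MolFromSmiles)`, and a bad last line is reported like any other.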
Cheers,
Dave


On Mon, Jun 3, 2019 at 5:32 PM Paolo Tosco 
wrote:

> Hi David,
>
> a workaround could be adding a final check after the for loop:
>
> #!/usr/bin/env python
>
> from rdkit import Chem
>
> suppl1 = Chem.SmilesMolSupplier('test1.smi', titleLine=False, nameColumn=1)
> rec_num = 0
> print("len(suppl1) = {0:d}".format(len(suppl1)))
> for mol in suppl1:
>     rec_num += 1
>     if not mol:
>         print('Record {} not read.'.format(rec_num))
>     else:
>         print('Record {} read ok.'.format(rec_num))
> if (rec_num == len(suppl1) - 1):
>     rec_num += 1
>     print('Record {} not read.'.format(rec_num))
>
>
> suppl2 = Chem.SmilesMolSupplier('test2.smi', titleLine=False, nameColumn=1)
> rec_num = 0
> print("len(suppl2) = {0:d}".format(len(suppl2)))
> for mol in suppl2:
> rec_num += 1
> if not mol:
> print('Record {} not read.'.format(rec_num))
> else:
> print('Record {} read ok.'.format(rec_num))
> if (rec_num == len(suppl2) - 1):
> rec_num += 1
> print('Record {} not read.'.format(rec_num))
>
> This should work until what seems to be an issue in the SmilesSupplier is
> fixed.
>
> Cheers,
> p.
>
> On 06/03/19 16:49, David Cosgrove wrote:
>
> Hi,
>
> I'm trying to catch the line numbers of lines in a SMILES file that aren't
> parsed by the SmilesMolSupplier.  Example code is attached, along with 2
> SMILES files.  When there is a bad SMILES string on the last line, the
> error is not reported, as in test2.smi.  I've tried iterating through the
> file in a loop using next(suppl1) and catching the StopIteration exception,
> but I have the same issue.  Is there a way to spot a last bad record in a
> file?
>
> Thanks,
> Dave
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>
>
>
>
>
>
>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

[Rdkit-discuss] Catching errors in SMILES files

2019-06-03 Thread David Cosgrove

Hi,

I'm trying to catch the line numbers of lines in a SMILES file that aren't
parsed by the SmilesMolSupplier.  Example code is attached, along with 2
SMILES files.  When there is a bad SMILES string on the last line, the
error is not reported, as in test2.smi.  I've tried iterating through the
file in a loop using next(suppl1) and catching the StopIteration exception,
but I have the same issue.  Is there a way to spot a last bad record in a
file?

Thanks,
Dave

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk


test1.smi
Description: application/diskcopy


test2.smi
Description: application/diskcopy
#!/usr/bin/env python

from rdkit import Chem

suppl1 = Chem.SmilesMolSupplier('test1.smi', titleLine=False, nameColumn=1)
rec_num = 0
for mol in suppl1:
    rec_num += 1
    if not mol:
        print('Record {} not read.'.format(rec_num))
    else:
        print('Record {} read ok.'.format(rec_num))


suppl2 = Chem.SmilesMolSupplier('test2.smi', titleLine=False, nameColumn=1)
rec_num = 0
for mol in suppl2:
    rec_num += 1
    if not mol:
        print('Record {} not read.'.format(rec_num))
    else:
        print('Record {} read ok.'.format(rec_num))
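One way to sidestep the last-line problem is to read the file yourself and call Chem.MolFromSmiles on every line, so each line number is checked explicitly. This is an alternative to SmilesMolSupplier, not a fix for it. It is a minimal sketch that assumes whitespace-separated columns with the SMILES in the first column, as in test1.smi/test2.smi with titleLine=False. The function name find_unparsed_lines is hypothetical.

```python
from rdkit import Chem, rdBase

def find_unparsed_lines(lines, smiles_col=0, delimiter=None):
    """Return the 1-based line numbers whose SMILES field fails to parse.

    Blank lines are skipped but still counted, so the numbers match the file.
    """
    rdBase.DisableLog('rdApp.error')      # silence the parser's error messages
    try:
        bad = []
        for line_num, line in enumerate(lines, start=1):
            line = line.strip()
            if not line:
                continue                   # blank line: nothing to parse
            smiles = line.split(delimiter)[smiles_col]
            if Chem.MolFromSmiles(smiles) is None:
                bad.append(line_num)
        return bad
    finally:
        rdBase.EnableLog('rdApp.error')    # restore error logging
```

Usage would be something like `with open('test2.smi') as fh: print(find_unparsed_lines(fh))`. A bad SMILES on the last line is then reported like any other.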



Re: [Rdkit-discuss] Read only first model of a pdb-file

2019-05-29 Thread David Cosgrove

Biopython is excellent for extracting particular models from a PDB file. As
Dimitri suggests, you can then pass the result into your processing script.
It is quite straightforward to write the relevant PDB model to a string in
PDB format and parse with RDKit’s PDB reader, for example.

Dave
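Here is a minimal sketch of that pre-processing step. It swaps Biopython for a plain text filter so the example is self-contained, and it assumes standard MODEL/ENDMDL records. Everything after the first ENDMDL is dropped, including any CONECT records. Both helper names are hypothetical.

```python
from rdkit import Chem

def first_model_block(pdb_text):
    """Return the PDB text up to the end of the first MODEL.

    Header lines before the first MODEL are kept; MODEL markers are dropped.
    Everything after the first ENDMDL (or END) is discarded.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == 'MODEL':
            continue                  # drop the MODEL marker itself
        if record in ('ENDMDL', 'END'):
            break                     # first model finished
        kept.append(line)
    kept.append('END')
    return '\n'.join(kept) + '\n'

def first_model_mol(pdb_text):
    """Parse only the first model with RDKit's PDB reader (None on failure)."""
    return Chem.MolFromPDBBlock(first_model_block(pdb_text))
```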


On Wed, 29 May 2019 at 21:13, Dimitri Maziuk via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> On 5/29/19 8:19 AM, Illimar Hugo Rekand wrote:
> > Hey, RDKitters!
> >
> >
> > I am currently trying to figure out how to only read in the first model
> of a pdb-file. I've designed a script that performs calculations on a
> per-atom basis, and this is very slow when it tries to account for multiple
> models, for example with a NMR-structure.
>
> Pre-process the PDB file to cut out the model you want. In the files
> annotated by PDB it should be the first model and I believe there is a
> REMARK something-or-other "best model in this ensemble".
>
> However, this fails for multiple conformers in one file; there is at
> least one such entry in PDB.
>
> (It's been a while since I did this so I don't remember the remark
> number, nor the multi-conformer entry id off the top of my head.)
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Bug with Calculation of aromatic rings?

2019-03-06 Thread David Cosgrove

ss
>
> --
> *Bournez Colin * <https://www.linkedin.com/in/colin-bournez-b9a1b2b7/> 
> <http://www.icoa.fr/> *Chemoinformatics PhD Student *
> * Institute of Organic and Analytical Chemistry (ICOA UMR7311)*
>  Université d'Orléans - Pôle de Chimie  Rue de Chartres - BP 6759  45067
> Orléans Cedex 2 - France  +33 (0)2 38 49 45 77
> <+33%202%2038%2049%2045%2077>  SBC Tool Platform <http://sbc.icoa.fr/> - SBC
> Team <http://www.icoa.fr/bonnet>  <http://www.icoa.fr/fr/rss.xml>
>
> <https://www.facebook.com/pages/Institut-de-Chimie-Organique-et-Analytique-ICOA-umr7311/222060911297163>
>  <https://twitter.com/ICOA_UMR7311>
>
> <https://www.linkedin.com/company/institut-de-chimie-organique-et-analytique---icoa-umr7311/>
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-26 Thread David Cosgrove

Slightly off topic, but a minor issue with the Taylor-Butina algorithm is
that it generates “false singletons”. These are molecules just outside the
clustering cutoff that are stranded when their neighbours are put in a
different, larger cluster. We used to find it convenient to have a sweep of
these, at a slightly looser cutoff, and drop them into the cluster whose
centroid/seed they were nearest to. This could be added to Andrew’s code
quite easily. At the very least, it’s worth keeping track of the initial
number of neighbours within the cluster cutoff that each fingerprint had so
as to distinguish real singletons from these artefactual ones.
Dave
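Here is a rough sketch of the sweep described above. It is built on RDKit's Butina.ClusterData with a precomputed distance matrix. The function names and the example cutoffs are illustrative assumptions. So is the choice to sweep only singletons that had at least one neighbour within the cutoff before clustering: those are the "false" ones, while genuine singletons are left alone.

```python
from rdkit import DataStructs
from rdkit.ML.Cluster import Butina

def tanimoto_distance_matrix(fps):
    """Full square matrix of Tanimoto distances (1 - similarity)."""
    return [[1.0 - s for s in DataStructs.BulkTanimotoSimilarity(fp, fps)]
            for fp in fps]

def butina_with_sweep(dmat, cutoff, sweep_cutoff):
    """Butina clustering, then rescue "false singletons".

    A false singleton had at least one neighbour within `cutoff` before
    clustering but was left on its own. Each one is moved into the cluster
    whose seed is nearest, provided that seed is within `sweep_cutoff`.
    Returns (clusters, remaining_singletons, initial_neighbour_counts).
    """
    n = len(dmat)
    # ClusterData wants the lower triangle, row by row: d(1,0), d(2,0), d(2,1), ...
    lower = [dmat[i][j] for i in range(1, n) for j in range(i)]
    clusters = [list(c) for c in
                Butina.ClusterData(lower, n, cutoff, isDistData=True)]
    # neighbours each point had within the cutoff before clustering
    n_nbrs = [sum(1 for j in range(n) if j != i and dmat[i][j] <= cutoff)
              for i in range(n)]
    multi = [c for c in clusters if len(c) > 1]
    singletons = []
    for c in clusters:
        if len(c) > 1:
            continue
        s = c[0]
        # seeds are the first members of each cluster; find the nearest one
        nearest = min(multi, key=lambda m: dmat[s][m[0]], default=None)
        if (n_nbrs[s] > 0 and nearest is not None
                and dmat[s][nearest[0]] <= sweep_cutoff):
            nearest.append(s)
        else:
            singletons.append(s)
    return multi, singletons, n_nbrs
```

Returning the initial neighbour counts also covers the simpler suggestion: a singleton with a non-zero count is an artefact of the algorithm rather than a genuinely isolated molecule.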


On Tue, 25 Sep 2018 at 19:56, Peter S. Shenkin  wrote:

> Well, I'm not really familiar with the Taylor-Butina clustering method, so
> I'm proposing a methodology based on generalizing something that I found to
> be useful in a somewhat different clustering context.
>
> Presuming that what you are clustering is the fingerprints of structures,
> and that you know which structures are in each cluster, you'd compute the
> average of all the fingerprints. That is, each bit position would be given
> a floating point number that is the average of the 0s and 1s at that
> position computed over the structures in the cluster.  Then you'd compute
> the distance (say, Manhattan or Euclidean) between the fingerprint of each
> structure in the cluster and the average so computed. The "most
> representative structure" would be the cluster member whose fingerprint is
> closest to the cluster's average fingerprint. (Some additional mileage
> could be gained by seeing just how far away from the average the "most
> representative structures" are. It might be more representative (i.e.,
> closer) for some clusters than for others.)
>
> It would make sense to try this (since it's easy enough) and see whether
> the resulting "most representative structures" from the clusters really are
> at least roughly representative, by comparing them with viewable random
> subsets of structures from the clusters.
>
> -P.
>
> On Tue, Sep 25, 2018 at 2:36 PM, Andrew Dalke 
> wrote:
>
>> On Sep 25, 2018, at 17:13, Peter S. Shenkin  wrote:
>> > FWIW, in work on conformational clustering, I used the “most
>> representative” molecule; that is, the real molecule closest to the
>> mathematical centroid. This would probably be the best way of displaying a
>> single molecule that typifies what is in the cluster.
>>
>> In some sense I'm rephrasing Chris Earnshaw's earlier question - how does
>> one do that with Taylor-Butina clustering? And does it make sense?
>>
>> The algorithm starts by picking a centroid based on the fingerprints with
>> the highest number of neighbors, so none of the other cluster members
>> should have more neighbors within that cutoff.
>>
>> I am far from an expert on this topic, but any alternative I can
>> think of makes me think I should have started with something other than
>> Taylor-Butina.
>>
>>
>>
>> Andrew
>> da...@dalkescientific.com
>>
>>
>>
>>
>>
>
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
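Peter's suggestion in the quoted message can be sketched as follows: average the fingerprints bit by bit and pick the member closest to that mean. The sketch works on plain 0/1 lists so the arithmetic is explicit. The function names are hypothetical. to_bits assumes an RDKit ExplicitBitVect (GetNumBits/GetOnBits).

```python
import math

def to_bits(fp):
    """0/1 list from an RDKit ExplicitBitVect."""
    bits = [0] * fp.GetNumBits()
    for b in fp.GetOnBits():
        bits[b] = 1
    return bits

def most_representative(members, metric='euclidean'):
    """Return (index, distance) of the member closest to the mean fingerprint."""
    n = len(members)
    # each bit position becomes the fraction of members with that bit set
    mean = [sum(col) / n for col in zip(*members)]

    def dist(bits):
        diffs = [abs(b - m) for b, m in zip(bits, mean)]
        if metric == 'manhattan':
            return sum(diffs)
        return math.sqrt(sum(d * d for d in diffs))

    dists = [dist(b) for b in members]
    best = min(range(n), key=dists.__getitem__)
    return best, dists[best]
```

The returned distance is the "how far away from the average" figure Peter mentions. Comparing it across clusters indicates which representatives are more typical of their cluster.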

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-21 Thread David Cosgrove

I used to have a paper that demonstrated that the Tanimoto distance
(1 - Tanimoto coefficient) does, in fact, obey the triangle inequality. I
fear I lost access to it when I retired, but maybe a determined Google
expert could rediscover it.
I expect James means what we used to call the cluster seed, i.e. the
molecule the cluster was based on, rather than the mathematical centroid.
Calculating distances from each cluster member to that would be quite
straightforward as a post-processing step although that would roughly
double the time taken.
Regards,
Dave
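Here is a sketch of that post-processing step. It assumes Morgan fingerprints (radius 2, 2048 bits) and a Tanimoto distance cutoff of 0.35, both illustrative choices. It relies on Butina.ClusterData listing each cluster's seed first. The function names are hypothetical.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from rdkit.ML.Cluster import Butina

def morgan_fps(smiles_list, radius=2, n_bits=2048):
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    return [gen.GetFingerprint(Chem.MolFromSmiles(s)) for s in smiles_list]

def cluster_with_seed_distances(fps, cutoff=0.35):
    """Return (index, cluster_number, distance_to_seed) for every fingerprint."""
    n = len(fps)
    lower = []                       # lower-triangle Tanimoto distances
    for i in range(1, n):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        lower.extend(1.0 - s for s in sims)
    clusters = Butina.ClusterData(lower, n, cutoff, isDistData=True)
    rows = []
    for cluster_num, members in enumerate(clusters):
        seed = members[0]            # Butina lists the seed first
        for m in members:
            dist = 1.0 - DataStructs.TanimotoSimilarity(fps[m], fps[seed])
            rows.append((m, cluster_num, dist))
    return sorted(rows)              # back in input order
```

The seed of each cluster reports a distance of 0.0, which matches what Jim asked for. Note that this is the distance to the seed molecule, not to a mathematical centroid.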


On Fri, 21 Sep 2018 at 09:55, Chris Earnshaw  wrote:

> Hi
>
> I'm afraid I can't help with an RDkit solution to your question, but there
> are a couple of issues which should be borne in mind:
> 1) The centroid of a cluster is a vector mean of the fingerprints of all
> the members of the cluster and probably will not be represented *exactly*
> by any member of the cluster; in this case no structures will have a
> distance of 0.0 from the centroid. Do you want to calculate the distances
> from the true centroid or from the structure(s) closest to the centroid?
> 2) The Tanimoto metric doesn't obey the triangle inequality and is
> therefore sub-optimal for this kind of analysis. It's better to use an
> alternative which does obey the triangle inequality - e.g. the Cosine
> metric.
>
> Regards,
> Chris Earnshaw
>
>
> On Thu, 20 Sep 2018 at 21:55, James T. Metz via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
>> RDkit Discussion Group,
>>
>> I note that RDkit can perform Butina clustering.  Given an SDF of
>> small molecules, I would like to cluster the ligands but also obtain
>> additional information from the clustering algorithm.  In particular, I
>> would like to obtain the cluster number and Tanimoto distance from the
>> centroid for every ligand in the SDF.  The centroid would obviously have a
>> distance of 0.00.
>>
>> Has anyone written additional RDkit code to extract this additional
>> information?
>> Thank you.
>>
>> Regards,
>> Jim Metz
>>
>>
>>
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] another request for feedback on a new python API documentation format

2018-05-02 Thread David Cosgrove

Hi Greg,
After a quick poke about, I think the new documentation looks great in
general.  If a change is forced on you, then I suggest you just do it in a
way that makes your life as easy as possible.  If people don't like it,
they can always put the effort in to do something different and then I
expect they'll quickly come round to realising that your way is perfectly
fine.  One way of fixing the docstring formatting would be to put
instructions and a couple of examples somewhere handy and ask people to fix
problems when they encounter them as they read the docs.  That should be a
small effort from each person that would hopefully fix the important ones
quickly in a self-prioritising manner.
Thanks for putting the time into this,
Dave


On Wed, May 2, 2018 at 8:40 AM, Greg Landrum <greg.land...@gmail.com> wrote:

> Dear all,
>
> Just over a year ago I asked for feedback on a new documentation format
> for the RDKit python API:
> https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg06688.html
> Some useful feedback came in on that thread (thanks to those who replied
> there and in private email), but I ran out of time/motivation to spend time
> on this.
>
> With my motivation recharged thanks to the "fun" of using epydoc to
> document the last release, I revisited the topic this weekend and actually
> made some progress.[1] I'd like to gather a second round of feedback on
> that.
>
> The documentation is here:
> http://rdkit.org/docs_temp/index.html
> The API docs (which are where the biggest changes are) are here:
> http://rdkit.org/docs_temp/api-docs.html
>
> To address some of the things raised last time:
> - This really isn't optional. It's been more than a decade since epydoc
> was updated and it requires python 2.7.
> - My previous attempt to auto-generate docs used pdoc
> (https://github.com/BurntSushi/pdoc). That project also seems to have died, so it's not
> really an option.
> - Based upon the two factors above I decided to use the autodoc
> functionality that's part of Sphinx. It's not perfect, but it's supported
> (and seems likely to continue to be so since it's part of Sphinx)
>
> - The docs now have a search box
>
> - We've lost the overview (list of classes/functions/etc) that epydoc
> provides. There likely is a way to do this with sphinx, but I haven't
> managed to get it to work yet
>
> - Formatting: Some of the docstrings end up looking pretty good, others
> are awful. Here's a module that demonstrates both sides of the coin:
> http://rdkit.org/docs_temp/source/rdkit.Chem.AtomPairs.Pairs.html#module-rdkit.Chem.AtomPairs.Pairs
> Fixing this is "just" a matter of editing the doc strings. This is
> reasonably mechanical, but unfortunately not automatable, work. It should
> be done, but in the meantime the broken docstrings aren't completely
> useless.
>
> There's also a github issue for this:
> https://github.com/rdkit/rdkit/issues/1656
> I'm doing the work on this branch:
> https://github.com/greglandrum/rdkit/tree/dev/usinx_sphinx_autodoc
>
> -greg
> [1] Remember how I said I was going to take a short break and do something
> fun? This isn't that.
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-23 Thread David Cosgrove

rmation leakage.
>>
>> The code is at the bottom of this email. It depends on the commercial
>> version of chemfp.
>>
>>
>> > It seems the evolutionary/genetic algorithm approach is the current
>> state-of-the-art for decoding circular/ECFP-like fingerprints.
>>
>> Dave Cosgrove mentioned Dave Weininger's GA work, which means it was with
>> Daylight hash fingerprints. I don't think we know that GAs have ever been
>> used to reverse engineer circular fingerprints.
>>
>>
>> > Historical question for you since you're the closest we have to a
>> chem-informatician historian. :-) Why did these circular/ECFP fingerprints
>> come into existence?
>>
>> I believe you are asking for
>> https://pubs.acs.org/doi/abs/10.1021/ci100050t .
>>
>>   Extended-connectivity fingerprints (ECFPs) are a novel class of
>> topological
>>   fingerprints for molecular characterization. Historically, topological
>>   fingerprints were developed for substructure and similarity searching.
>>   ECFPs were developed specifically for structure−activity modeling.
>>
>> > my reading of the current literature is that tree/dendritic are
>> statistically just as good at virtual screening as circular/ECFP:
>>
>> Yeah, I don't go there. I leave concepts like "just as good" or "better"
>> to people who have experimental data they can use for the comparison.
>>
>>
>> Andrew
>> da...@dalkescientific.com
>>
>> == Code to find which Morgan fingerprints contain a phenol substructure ==
>>
>> import chemfp
>> from chemfp import bitops, search
>>
>> arena = chemfp.load_fingerprints("chembl_23_morgan.fps", reorder=False)
>> print("Fingerprint type:", arena.metadata.type)
>>
>> # Want to find structures containing phenol
>>
>> # Adjust the fingerprint type to limit it to the given atoms
>> fptype = chemfp.get_fingerprint_type(arena.metadata.type + " fromAtoms=3,4,5,6")
>> query_fp = fptype.parse_molecule_fingerprint("*c1ccc(O)cc1", "smi")
>>
>> print("Query fingerprint:")
>> print(bitops.hex_encode(query_fp))
>> print()
>>
>> # Find the matching fingerprints
>> result = search.contains_fp(query_fp, arena)
>>
>> circular_ids = set(result.get_ids())
>>
>> # Search the first 100,000 structures
>> from rdkit import Chem
>> from chemfp import rdkit_toolkit as T
>>
>> pat = Chem.MolFromSmarts("*c1ccc(O)cc1")
>> all_ids = set()
>> exact_ids = set()
>> with T.read_molecules("/Users/dalke/databases/chembl_23.sdf.gz") as reader:
>>     for mol in reader:
>>         id = mol.GetProp("_Name")
>>         all_ids.add(id)
>>         if mol.HasSubstructMatch(pat):
>>             exact_ids.add(id)
>>         if len(all_ids) == 100000:
>>             break
>>
>> # limit the circular ids to only those checked
>> print("Full screen:", len(circular_ids))
>> circular_ids = circular_ids & all_ids
>> print("Relevant screen:", len(circular_ids))
>>
>> print("#correct:", len(exact_ids & circular_ids))
>> print("#false positives:", len(circular_ids - exact_ids))
>>
>> ## I get the following:
>> # Fingerprint type: RDKit-Morgan/1 radius=2 fpSize=2048 useFeatures=0 useChirality=0 useBondTypes=1
>> # Query fingerprint:
>> # (2048-bit hex-encoded query fingerprint; garbled in the archived message)
>> # Full screen: 31134
>> # Relevant screen: 2216
>> # #correct: 2216
>> # #false positives: 0
>>
>>
>>
>>
>>
>>
>
>
>
> --
> Brice HOFFMANN
> Senior Scientist,
> Molecular Modeling & Computational Chemistry
> iktos.ai
> 24 rue chaptal 75009 Paris
>
>
>
>
>
>
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-20 Thread David Cosgrove

Hi Jeff,
What you say is theoretically correct, in that it is probably not possible
to go from the fingerprint directly to a structure. However, it is possible
to generate structures and rapidly compare them to the target fingerprint.
The fingerprints are of course able to tell you how close your structure is
to the target fingerprint in a way that can drive an optimisation
algorithm. Chemistry adds strong constraints to what structures are
possible, which reduces the search space dramatically and if you know it’s
a “drug-like” molecule you’re looking for, even more so.
People forget that Daylight originally developed fingerprints to speed up
substructural searching of databases. A structure can only be a
substructure of another molecule if all the bits it sets are also in the
other molecule. They are specifically designed to encode the molecular
structure, and that’s why a GA can be successful. As Peter says, the same
fingerprint can be generated for different molecules, but this will be rare
if the fingerprint is well designed. Try it on ChEMBL with an RDKit
fingerprint and I’ll be surprised if you get more than 10 pairs that aren’t
isomers of each other or something trivial like that.
Regards,
Dave
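The subset property can be checked directly. This sketch uses RDKit's PatternFingerprint, which is intended for substructure screening, as a stand-in for the Daylight fingerprints discussed here. The function names are hypothetical. A missing bit rules a match out, while a passed screen still needs a real atom-by-atom substructure search.

```python
from rdkit import Chem

def passes_screen(query_smiles, target_smiles):
    """True if every bit set by the query is also set by the target."""
    q = Chem.PatternFingerprint(Chem.MolFromSmiles(query_smiles))
    t = Chem.PatternFingerprint(Chem.MolFromSmiles(target_smiles))
    return set(q.GetOnBits()) <= set(t.GetOnBits())

def is_substructure(query_smiles, target_smiles):
    """Cheap fingerprint screen first, exact match only if it passes."""
    if not passes_screen(query_smiles, target_smiles):
        return False          # a query bit absent from the target rules it out
    query = Chem.MolFromSmiles(query_smiles)
    return Chem.MolFromSmiles(target_smiles).HasSubstructMatch(query)
```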

On Fri, 20 Apr 2018 at 18:49, Peter S. Shenkin <shen...@gmail.com> wrote:

> Well, @jeff, there's no law saying that hashes must collide, and in fact
> some are designed to make collision extremely unlikely (can you say
> "SHA-2"?). But the ones in question here do collide relatively frequently,
> for at least some molecular fingerprint types.
>
> An interesting question (maybe only to me :-) ) would be how similar, in
> general, the structures are that exhibit identical fingerprints, for the
> well-known fingerprint types, for various fingerprint lengths. A
> sufficiently complicated molecule will give lots of on bits, and for (say)
> a 64-bit fingerprint, there can only be 64 possible fingerprints with all
> but one bit turned on.
>
> I realize that most fingerprints in common use today are longer than this,
> but still, looking back at 64- and 32-bit fingerprints with all but one
> bit on might give some insight. How short does a fingerprint of some
> particular type have to be for, say, 10% of CHEMBL molecules to exhibit an
> all-on pattern? How short does it have to be for, say, 10% of CHEMBL
> molecules to have an exact fingerprint match with some other molecule?
>
> -P
>
> On Fri, Apr 20, 2018 at 1:03 PM, jeff godden <jgod...@gmail.com> wrote:
>
>> Long ago molecular fingerprints were referred to in the literature as
>> molecular hash functions (y'know, those crazy mathematical algorithms
>> which permitted rapid lookup of some string in a lookup table). As such, we
>> expected there to be the associated hash collisions  (
>> https://en.wikipedia.org/wiki/Hash_table#Collision_resolution ).  All
>> this by way of saying that to go from fingerprint to the molecular
>> structure which produced it is traditionally impossible unless the
>> fingerprint no longer amounts to a hash(ing) function.
>> --
>> j
>>
>>
>> On Fri, Apr 20, 2018 at 9:56 AM, Peter S. Shenkin <shen...@gmail.com>
>> wrote:
>>
>>> Isn't it the case that more than one molecule can share an identical
>>> fingerprint? (Depending on the specific fingerprint.) Think p-biphenyl,
>>> extended to triphenyl, tetraphenyl, etc. Still, a GA or SA method could
>>> keep going and come up with multiple matches, plus multiple near-misses.
>>>
>>> -P.
>>>
>>> On Fri, Apr 20, 2018 at 10:58 AM, David Cosgrove <
>>> davidacosgrov...@gmail.com> wrote:
>>>
>>>> Hi Brian,
>>>> Dave Weininger once showed a fairly simple GA that could generally
>>>> deduce a structure from a daylight fingerprint by using SMILES strings as
>>>> the chromosomes and tanimoto distance to the target fingerprint as the
>>>> fitness function.  He may have done a talk about it for MUG or conceivably
>>>> written it up. It’d be in JCICS if so, I expect.
>>>>
>>>> You could probably knock up a script to do that in a couple of hours I
>>>> would think using a GA library to do the mechanics. If you’re not worried
>>>> about high efficiency, you don’t need to do anything fancy with mutation
>>>> and crossover of the SMILES strings to ensure you always get a valid
>>>> molecule, you can just give a fitness of 0 if the SMILES parser doesn’t
>>>> like what you give it.
>>>> HTH,
>>>> Dave
>>>>
>>>>
>>>> On Fri, 20 Apr 2018 at 14:45, Nils Weskamp <nils.wesk...@gmail.com>
>>>> wrot

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-20 Thread David Cosgrove

Hi Brian,
Dave Weininger once showed a fairly simple GA that could generally deduce a
structure from a Daylight fingerprint by using SMILES strings as the
chromosomes and Tanimoto distance to the target fingerprint as the fitness
function.  He may have done a talk about it for MUG or conceivably written
it up. It’d be in JCICS if so, I expect.

You could probably knock up a script to do that in a couple of hours I
would think using a GA library to do the mechanics. If you’re not worried
about high efficiency, you don’t need to do anything fancy with mutation
and crossover of the SMILES strings to ensure you always get a valid
molecule, you can just give a fitness of 0 if the SMILES parser doesn’t
like what you give it.
HTH,
Dave
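Here is a toy version of the GA described above, built under loud assumptions. RDKit's path-based RDKFingerprint stands in for the Daylight fingerprint. The token alphabet and the mutation/crossover operators are deliberately crude and hypothetical. Unparseable SMILES simply score 0, as suggested. It sketches the idea and is not a reconstruction of Weininger's method; expect close analogues more often than the exact target.

```python
import random
from rdkit import Chem, DataStructs, rdBase

rdBase.DisableLog('rdApp.*')   # invalid children are expected; hide parser errors

ALPHABET = list('CNOcn()=1')   # hypothetical, deliberately tiny token set

def fingerprint(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else Chem.RDKFingerprint(mol)

def fitness(smiles, target_fp):
    """Tanimoto similarity to the target; 0 if the SMILES does not parse."""
    fp = fingerprint(smiles)
    return 0.0 if fp is None else DataStructs.TanimotoSimilarity(fp, target_fp)

def mutate(smiles, rng):
    """Insert, delete or replace one character at random."""
    op = rng.choice(('insert', 'delete', 'replace'))
    if op == 'insert' or not smiles:
        i = rng.randrange(len(smiles) + 1)
        return smiles[:i] + rng.choice(ALPHABET) + smiles[i:]
    i = rng.randrange(len(smiles))
    if op == 'delete':
        return smiles[:i] + smiles[i + 1:]
    return smiles[:i] + rng.choice(ALPHABET) + smiles[i + 1:]

def crossover(a, b, rng):
    """One-point crossover on the raw SMILES strings."""
    return a[:rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):]

def evolve(target_fp, seeds, generations=100, pop_size=50, seed=0):
    rng = random.Random(seed)
    pop = list(seeds)
    for _ in range(generations):
        children = [mutate(crossover(rng.choice(pop), rng.choice(pop), rng), rng)
                    for _ in range(pop_size)]
        # keep the fittest unique strings; ties broken alphabetically for determinism
        pop = sorted(set(pop + children),
                     key=lambda s: (-fitness(s, target_fp), s))[:pop_size]
        if fitness(pop[0], target_fp) == 1.0:
            break
    return pop[0], fitness(pop[0], target_fp)
```

Because parents survive into the next generation, the best score never drops. A smarter version would mutate at the graph level so that every child is a valid molecule.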


On Fri, 20 Apr 2018 at 14:45, Nils Weskamp <nils.wesk...@gmail.com> wrote:

> Hi Brian,
>
> in general, it might be difficult to come up with a deterministic
> algorithm that generates exactly one structure for a given fingerprint due
> to many ambiguities in the process. If you are happy with a more "fuzzy"
> (approximate / probabilistic) approach, you might want to take a look at
>
> https://pubs.acs.org/doi/abs/10.1021/ci600383v
> https://link.springer.com/article/10.1007/s10822-005-9020-4
>
> Given this task, I would probably start with a large database of known
> compounds (PubChem, UniChem, GDB17), calculate fingerprints and then do a
> similarity search with my query fingerprint.
>
> Hope this helps,
> Nils
>
>
> On Fri, Apr 20, 2018 at 3:13 PM, Brian Cole <col...@gmail.com> wrote:
>
>> Hi Chem-informaticians:
>>
>> I know it has been talked about in the community that fingerprints are
>> not a way to obfuscate molecules for security, but I don't recall a paper
>> actually demonstrating actual reverse engineering a fingerprint into a
>> chemical structure. Does anyone know if such a paper exists?
>>
>> Code using RDKit to demonstrate the functionality would be an obvious
>> bonus as well. :-)
>>
>> Thanks,
>> Brian
>>
>>
>>
>>
>
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] GetBestRMS and the carboxylic acid

2018-04-18 Thread David Cosgrove

Hi Martin,
Sorry, I forgot to 'Reply All' last night.  I think you need to do a bit
more work than just map the carboxylate groups.  Phenol maps onto itself
twice, going either way round the phenyl ring.  GetBestRMS does this for
you, and reports the lower RMS.  For benzoic acid, adding the symmetry for
the carboxylate groups, you will thus need 4 mapping lists.  So first you
need to do a substructure query for the probe against the reference, to
produce 1 or more mapping lists.  You then need to do a substructure match
for carboxylate against, say, the probe and for each of the original match
lists, produce an extra copy and change the mapping of the carboxylate
atoms.  If there is more than one carboxylate, you'll need to do it for
each one, so for benzene-1,4-dicarboxylic acid (terephthalic acid), which
matches onto itself 4 ways, you should end up with 16 mapping lists.
Hope that helps
Dave
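Here is a sketch of the mapping-list construction described above. The carboxyl SMARTS and the function name are assumptions. The output is a list of lists of (probe atom, reference atom) index pairs, which is the form GetBestRMS accepts through its map argument, e.g. `rdMolAlign.GetBestRMS(probe, ref, map=maps)`.

```python
from itertools import product
from rdkit import Chem

# Carboxylic acid or carboxylate: carbonyl C, the =O, then the OH / O- oxygen
CARBOXYL = Chem.MolFromSmarts('[CX3](=O)[OX1H0-,OX2H1]')

def carboxyl_aware_maps(probe, ref):
    """Atom maps for GetBestRMS that also swap each carboxyl's two oxygens."""
    # 1. every symmetry-equivalent mapping of probe atoms onto ref atoms
    matches = ref.GetSubstructMatches(probe, uniquify=False)
    # 2. oxygen pairs (=O, -O) for each carboxyl group in the probe
    swaps = [(m[1], m[2]) for m in probe.GetSubstructMatches(CARBOXYL)]
    maps = []
    for match in matches:
        base = dict(enumerate(match))          # probe index -> ref index
        # 3. every combination of swapped / unswapped carboxyls
        for flips in product((False, True), repeat=len(swaps)):
            amap = dict(base)
            for flip, (o1, o2) in zip(flips, swaps):
                if flip:
                    amap[o1], amap[o2] = base[o2], base[o1]
            maps.append(list(amap.items()))
    return maps
```

For acetic acid this gives the two maps from the earlier reply. For benzoic acid it gives 4: 2 ring orientations times 2 oxygen assignments.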


On Tue, Apr 17, 2018 at 8:09 PM, Martin Watson <
martin.wat...@c4xdiscovery.com> wrote:

> Thanks very much Dave,  I think that makes sense.
>
> I need to formulate a substructure match for carboxyl and use that to
> generate the atom lists.  I'll have a crack at that.
>
> *Kind Regards*
>
> Dr. Martin Watson
> VP Structural Analysis
> C4X Discovery Ltd
>
> martin.wat...@c4xdiscovery.com
> +44 (0)7753 434535
> +44 (0)161 235 5085
>
> *www.c4xdiscovery.com <http://www.c4xdiscovery.com>*
>
> On 17 April 2018 at 19:40, David Cosgrove <davidacosgrov...@gmail.com>
> wrote:
>
>> Hi Martin,
>> The documentation for the function,
>> http://www.rdkit.org/Python_Docs/rdkit.Chem.rdMolAlign-module.html#GetBestRMS,
>> says that you can pass
>> an optional list of lists of atom maps, which map the atoms in one molecule
>> to their corresponding ones in the other.  You could use that to give the
>> two ways of mapping the carboxylate oxygens of the probe onto the
>> reference.  It doesn't say so, but I expect you'll need the full list of
>> atom correspondences. For for 2 conformers of acetic acid, CC(=O)O,
>> numbered so the O atoms are 2 and 3, you'd want something like [[(0,0),
>> (1,1), (2,2), (3,3)], [(0,0), (1,1), (2,3), (3,2)]].  It will take a few
>> lines of code, that I don't have time to bash together for you this
>> evening, that will match one molecule/conformer onto the other, then
>> identify the carboxylate groups and generate the requisite combinations.
>> It doesn't look as though you'll get the combination that produced the best
>> RMS out, though if you passed the maps in one at a time you could discover
>> that explicitly.
>> Hope that helps,
>> Dave
>>
>> On Tue, Apr 17, 2018 at 4:46 PM, Martin Watson <
>> martin.wat...@c4xdiscovery.com> wrote:
>>
>>> Update: whether or not the charge is explicit is not relevant; that just
>>> determines the behaviour of AssignBondOrdersFromTemplate.  The behaviour is
>>> that GetBestRMS does not consider a carboxylate as symmetric.  Can anyone
>>> suggest a workaround to treat it as symmetric?
>>>
>>> *Kind Regards*
>>>
>>> Dr. Martin Watson
>>> VP Structural Analysis
>>> C4X Discovery Ltd
>>>
>>> martin.wat...@c4xdiscovery.com
>>> +44 (0)7753 434535
>>> +44 (0)161 235 5085
>>>
>>> *www.c4xdiscovery.com <http://www.c4xdiscovery.com>*
>>>
>>> On 17 April 2018 at 14:21, Martin Watson <martin.wat...@c4xdiscovery.com
>>> > wrote:
>>>
>>>> Hi
>>>>
>>>> I'm using GetBestRMS to score conformers in an sdf relative to a pdb
>>>> extracted sdf ligand using the snippet below. I get odd behaviour when the
>>>> molecule includes a carboxylic acid.  Depending on whether the charge is
>>>> explicitly defined in the sdf or not, I get a different RMS, which seems to
>>>> be because the symmetry of the carboxyl is not considered by GetBestRMS. Is
>>>> that so?  And if it isn't, can anyone suggest a workaround?  (FWIW I have
>>>> examples which "work" when the charge is supplied and "fail" if not, as well
>>>> as vice versa.)
>>>>
>>>> import sys
>>>> from rdkit import Chem
>>>> from rdkit.Chem import AllChem, rdMolAlign
>>>>
>>>> # The ensemble sdf conformations with hydrogens
>>>> suppl = Chem.SDMolSupplier(sys.argv[2],sanitize=False)
>>>> count=0
>>>>
>>>> #compare them to ref pdb extract no hydrogens
>>>> for molh in suppl:
>>>>     count=count+1
>>>>     mol1 = Chem.MolFromMolFile(sys.argv[1])
>>>>     mol = Chem.RemoveHs(molh)
>>>>     mol = AllChem.AssignBond

Re: [Rdkit-discuss] Diversity picker

2018-01-04 Thread David Cosgrove

Hi Jubilant,
If you cluster your compounds using the Butina-Taylor method at a
threshold of 0.7 then the cluster seeds or centroids will be at least 0.7
apart which is the effect you are looking for. There is no need to specify
the number of clusters in advance.
Regards,
Dave
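A pure-Python sketch of the Taylor-Butina idea, assuming each fingerprint is a set of on-bit indices (with RDKit you might build these from `fp.GetOnBits()`) and that the threshold is a Tanimoto similarity. Because a seed's neighbours are removed as soon as it is chosen, no later seed can be a neighbour of an earlier one, so the seeds end up pairwise less similar than the threshold. The function names are hypothetical; RDKit ships its own version, `rdkit.ML.Cluster.Butina.ClusterData`, which takes a distance threshold (1 - similarity) instead.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def butina_centroids(fps, threshold):
    """Taylor-Butina clustering; returns a list of (seed, members) tuples.

    Two compounds are neighbours if their similarity is >= threshold.
    """
    n = len(fps)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Seed: the unassigned compound with the most unassigned neighbours
        # (max keeps the first maximum, so ties go to the lowest index).
        seed = max(sorted(unassigned), key=lambda k: len(neighbours[k] & unassigned))
        members = (neighbours[seed] & unassigned) | {seed}
        unassigned -= members
        clusters.append((seed, sorted(members)))
    return clusters


fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
for seed, members in butina_centroids(fps, 0.7):
    print(seed, members)
```

The seeds (here 2, 3, 4 and 5) are the diverse subset; no count needs to be given in advance.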


On Wed, 3 Jan 2018 at 19:54, Sundar <jubilantsun...@gmail.com> wrote:

> Hi RDKit users,
>
> Is it possible to pick a subset of (diverse) compounds that have less than
> a particular Tanimoto coefficient (for eg. 0.7) from a larger set using
> RDKit.
> The current version of the Diverse Picker picks a diverse set based on a
> "number of compounds" instead of Tanimoto score.
>
> Thanks
> Jubilant
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread David Cosgrove

Hi Jim,
Would it not be easier to use a recursive SMARTS, so that you only count
the carbon atoms? Something like [$([C,c]Cl)]-,=,:[$([C,c]Cl)], or, more
compactly [$([#6]Cl)]~[$([#6]Cl)].  I haven't tested these, as I'm not
close to a suitably equipped computer, but you should be able to get the
gist at least.  The Cl is only defining the sort of C you're after so you
won't have to deal with multiple Cl matches on the same atom.
Dave
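The same idea, that only the two carbons define a vicinal pair, can also be applied after matching if you keep Jim's four-atom SMARTS: reduce each (Cl, C, C, Cl) tuple to its unordered carbon pair and count distinct pairs. A minimal pure-Python sketch, where `count_vicinal_pairs` is a hypothetical helper and the tuples are assumed to come from `mol.GetSubstructMatches(Chem.MolFromSmarts('[Cl]-[C,c]-,=,:[C,c]-[Cl]'))`:

```python
def count_vicinal_pairs(matches):
    """Collapse 4-atom (Cl, C, C, Cl) matches to unique unordered carbon pairs.

    Matches that differ only in which Cl was used share the same two carbons
    (positions 1 and 2 of each tuple), so they collapse to a single pair.
    """
    pairs = {frozenset(match[1:3]) for match in matches}
    return len(pairs), sorted(tuple(sorted(p)) for p in pairs)


# Jim's two matches for ClC(Cl)CCl (MOL-file atom numbering).
print(count_vicinal_pairs(((1, 2, 4, 3), (0, 2, 4, 3))))
# → (1, [(2, 4)])
```

With the recursive SMARTS the matches are already carbon pairs, and RDKit's default uniquifying of matches by atom set should report each pair once, so this post-processing step would not be needed.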


On Wed, Nov 8, 2017 at 7:08 AM, Greg Landrum <greg.land...@gmail.com> wrote:

> Jim,
>
> I'm a bit confused by what you're trying to do.
>
> Maybe we can try simplifying. What would you like to have returned for
> each of these SMILES:
> 1) ClC=CCl
> 2) ClC(Cl)=CCl
> 3) ClC(Cl)=C(Cl)Cl
>
> If the answer is the same between 1) and 2), but different for 3), then
> the next question will be: "why?"
>
> -greg
>
>
> On Wed, Nov 8, 2017 at 12:38 AM, James T. Metz via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
>> RDkit Discussion Group,
>>
>> I have written a SMARTS to detect vicinal chlorine groups
>> using RDkit.  There are 4 atoms involved in a vicinal chlorine group.
>>
>> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>>
>> I am trying to count the number of ("unique") occurrences of this
>> pattern.
>>
>> For some molecules with symmetry, this results in
>> over-counting.
>>
>> For the molecule, smiles1 below, I want to obtain
>> a count of 1 i.e., 1 tuple of 4 atoms.
>>
>> smiles1 = 'ClC(Cl)CCl'
>>
>> However, using the SMARTS above, I obtain 2 tuples of 4 atoms.
>> Beginning with a MOL file representation of smiles1, I get
>>
>> ((1,2,4,3), (0,2,4,3))
>>
>> One possible solution is to somehow merge the two tuples according
>> to a "rule."  One rule that works is "if 3 of the atom indices are the
>> same, then combine into one tuple."
>>
>> However, the rule needs a bit of modification for more complicated
>> cases (higher symmetry).
>>
>> Consider
>>
>> smiles2 = 'ClC(Cl)CCl(Cl)(Cl)'
>>
>> My goal is to get 2 tuples of 4 atoms for smiles2
>>
>> smiles2 is somewhat tricky because there are either
>> 2 groups of 3 (4 atom) tuples, or 3 groups of 2 (4 atom)
>> tuples depending on how you choose your 3 atom indices.
>>
>> Again, if my goal is to get 2 tuples, then I need to somehow
>> pick the largest group, i.e., 2 groups of 3 tuples to do the merge
>> operation which will give me 2 remaining groups (desired).
>>
>> I have already checked stackoverflow and a few other places
>> for PYTHON code to do the necessary merging, but I could not
>> find anything specific and appropriate.
>>
>> I would be most grateful if anyone has ideas how to do this.  I
>> suspect the answer is a few lines of well-written PYTHON code,
>> and not modifying the SMARTS (I could be mistaken!).
>>
>> Thank you.
>>
>> Regards,
>> Jim Metz
>>
>>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread David Cosgrove

Hi Chris,
Sure they're equivalent, but with my suggestion you don't have to create
all 6 different SMARTS patterns, which whilst not difficult is likely to be
prone to silly errors.  You can stick a long list of OR'd vector bindings
together to put in all the exclusions you want on each atom as you think of
them.
Dave
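The single-test-per-atom approach is easy to assemble programmatically, which also makes it simple to keep appending exclusions as they come up. This sketch only builds the SMARTS string; `excluding_atom` and `six_ring_with_exclusions` are hypothetical helpers, and since the pattern itself is untested, the result should be checked with `Chem.MolFromSmarts` against molecules you expect to hit and to miss. Each exclusion is written from one root atom only: because every ring atom carries the same test, one failing atom (the ring nitrogen in these examples) is enough to reject the ring.

```python
def excluding_atom(exclusions, base="a"):
    """SMARTS atom that matches `base` unless it is the root atom of any exclusion."""
    terms = [f"$({base})"] + [f"!$({x})" for x in exclusions]
    return "[" + ";".join(terms) + "]"


def six_ring_with_exclusions(exclusions):
    """Aromatic six-ring SMARTS in which every ring atom carries the same test."""
    atom = excluding_atom(exclusions)
    return atom + "1:" + ":".join([atom] * 5) + "1"


# The uracil-like ring from Jim's example plus plain pyridine, each written
# once from its ring nitrogen.
print(excluding_atom(["n1(C)ccc(=O)nc1=O"]))
# → [$(a);!$(n1(C)ccc(=O)nc1=O)]
print(six_ring_with_exclusions(["n1(C)ccc(=O)nc1=O", "n1ccccc1"]))
```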


On Sun, Sep 24, 2017 at 5:15 PM, Chris Earnshaw <cgearns...@gmail.com>
wrote:

> Hi
>
> It amounts to the same thing - either do all tests on one atom, or one
> test on all atoms.
>
> The syntax is shorter for the latter if you can use the vector bindings
> but may not be otherwise, especially if multiple exclusions are needed.
>
> Regards,
> Chris Earnshaw
>
>
>
> On 24 Sep 2017 16:54, "David Cosgrove" <davidacosgrov...@gmail.com> wrote:
>
> Hi,
> I think Chris' solution is a bit overly complicated, though I haven't
> tested my alternative.  If each atom in the ring is tested for
> '[$(a);!$(n1(C)ccc(=O)nc1=O)]', as you'd get if you expanded out the
> vector bindings I provided previously, then I don't think you need to
> provide the SMARTS for the excluded ring starting from each atom.  So long
> as 1 of the atoms in the ring fails the test, the whole ring fails, so you
> just need the same test on each atom.
> Dave
>
>
> On Sun, Sep 24, 2017 at 4:45 PM, Chris Earnshaw <cgearns...@gmail.com>
> wrote:
>
>> Hi Jim
>>
>> The key thing to remember about the recursive SMARTS clauses is that
>> they only match one atom (the first), and the rest of the string
>> describes the environment in which that atom is located. So the clause
>> $(n1(C)ccc(=O)nc1=O) matches just the nitrogen atom, which is
>> embedded in the rest of the ring system. We then negate that with the
>> ! symbol.
>>
>> If we use just the recursive SMARTS expression '[$(a)]' (or the simple
>> SMARTS 'a'), it can match any of the six aromatic atoms in the
>> heterocycle. Adding the first exclusion '[$(a);!$(n1(C)ccc(=O)nc1=O)]'
>> means this atom can't match the nitrogen substituted by aliphatic
>> C, but it can still match any of the other five aromatic atoms.
>> Consequently there are five more exclusion clauses to add, each of
>> which starts with a different one of the aromatic atoms in your
>> undesired structure. As long as one of the atoms in the full SMARTS is
>> prevented from matching any of the atoms in the undesired structure in
>> this way, then the overall match is prevented.
>>
>> Adding an exclusion for pyridine is then easy. We're already excluding
>> six patterns, and (considering symmetry) we only need to add four more
>> to exclude all pyridines. Appending
>> ';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)' inside the
>> square brackets should do the trick.
>>
>> You're quite right though, this gets pretty cumbersome very quickly
>> and it may well be best to handle it in code with simple include /
>> exclude SMARTS patterns. You'll have to think about checking which
>> atoms have been matched - for example, do you want to match quinoline
>> because it contains a benzene ring, or exclude it because it contains
>> a pyridine? If the former you'll have to check that the atoms matched
>> by your two patterns are different.
>>
>> Hope this helps!
>>
>> Chris Earnshaw
>>
>> On 24 September 2017 at 15:01, James T. Metz <jamestm...@aol.com> wrote:
>> > Chris,
>> >
>> > Wow! Your recursive SMARTS expression works as needed!
>> >
>> > Hmmm... Help me understand this better ... it looks like you "walk around"
>> > the ring of the substructure we want to exclude and employ a slightly
>> > different recursive SMARTS beginning at that atom.  Is that correct?
>> >
>> > Also, since my situation is likely to get more complicated with respect
>> > to exclusions, suppose I still wanted to utilize the general aromatic
>> > expression for a 6-membered ring i.e. [a]1:[a]:[a]:[a][a]:[a]1, and I
>> > wanted to exclude the structures we have been discussing, and I also
>> > wanted to exclude pyridine i.e., [n]1:[c]:[c]:[c]:[c]:[c]1.
>> >
>> > Is there a SMARTS expression that would capture 2 exclusions?
>> >
>> > Perhaps this is getting too clumsy!  It might be better to have one or
>> > more inclusion SMARTS and one or more exclusion SMARTS, and write code
>> > to remove those groups of atoms that are coming from the exclusion
>> > SMARTS.
>> >
>> > Any ideas for PYTHON/RDkit code?  Something like
>>
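For the include/exclude-in-code route, Chris's quinoline point suggests comparing the atom sets of the matches rather than rejecting any overlap: a fused benzo ring shares two atoms with the excluded pyridine ring but is not the same ring. A pure-Python sketch, where `filter_matches` is a hypothetical helper and the index tuples are assumed to come from `mol.GetSubstructMatches(...)` for the inclusion and exclusion patterns:

```python
def filter_matches(include_matches, exclude_matches):
    """Drop inclusion matches that cover exactly the same atoms as an exclusion match.

    Matches are tuples of atom indices; comparing them as sets ignores the
    order in which each pattern happened to visit the atoms.
    """
    excluded = {frozenset(m) for m in exclude_matches}
    return [m for m in include_matches if frozenset(m) not in excluded]


# Hypothetical indices for quinoline: a generic aromatic six-ring pattern hits
# the benzo ring and the pyridine ring; a pyridine pattern hits only the latter.
include = [(0, 1, 2, 3, 8, 9), (3, 4, 5, 6, 7, 8)]
exclude = [(4, 5, 6, 7, 8, 3)]
print(filter_matches(include, exclude))
# → [(0, 1, 2, 3, 8, 9)]
```

If you would rather reject quinoline outright, the test becomes "any shared atom" (`not frozenset(m) & excluded_atoms`) instead of set equality.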

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread David Cosgrove

:1:a:a:a:a:a:1,
> > with recursive SMARTS applied to the first atom to ensure that this
> > can't match any of the 6 ring atoms in your undesired system.
> >
> > Regards,
> > Chris Earnshaw
> >
> > On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss
> > <rdkit-discuss@lists.sourceforge.net> wrote:
> >> Hello,
> >>
> >> Suppose I have the following molecule
> >>
> >> m = 'CN1C=CC(=O)NC1=O'
> >>
> >> I would like to be able to use a SMARTS pattern
> >>
> >> pattern = '[a]1:[a][a]:[a]:[a]:[a]1'
> >>
> >> to recognize the 6 atoms in a typical aromatic ring, but
> >> I do not want to recognize the 6 atoms in the molecule,
> >> m, as aromatic. In other words, I am trying to write
> >> a specific exclusion.
> >>
> >> Is it possible to modify the SMARTS pattern to
> >> exclude the above molecule? I have tried using
> >> recursive SMARTS, but I can't get the syntax to
> >> work.
> >>
> >> Any ideas? Thank you.
> >>
> >> Regards,
> >> Jim Metz
> >>
> >>
> >>
> >>



-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread David Cosgrove



-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] Clustering

2017-06-05 Thread David Cosgrove

Hi,
I have used this algorithm for many years clustering sets of several
millions of compounds.  Indeed, I am old enough to know it as the Taylor
algorithm.  It is slow but reliable.  A crucial setting is the similarity
threshold for the clusters, which dictates the size of the neighbour lists
and hence the amount of RAM required.  It also, of course, determines the
quality of the clusters.  My implementation is at
https://github.com/OpenEye-Contrib/Flush.git.  This repo has a number of
programs of relevance; the one you want is called cluster.  I have just
confirmed that it compiles on Ubuntu 16.  It needs the fingerprints as
ASCII bitstrings.  I don't have code for turning RDKit fingerprints into
this format, but I would imagine it's quite straightforward.  The program
runs in parallel using OpenMPI.  That's valuable for two reasons.  One is
speed, but the more important one is memory use.  If you can spread the
slave processes over several machines you can cluster much larger sets of
molecules, as you are effectively expanding the RAM of the machine.  When I
wrote the original, 64MB was a lot of RAM; it is less of an issue these
days but still matters if clustering millions of fingerprints.  Note that
the program cluster doesn't ever store the distance matrix, just the lists
of neighbours for each molecule within the threshold.  This reduces the
memory footprint substantially if you have a tight-enough cluster threshold.
HTH,
Dave
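A pure-Python sketch of the storage idea: read fingerprints as ASCII bitstrings and keep, for each molecule, only the neighbours within the similarity threshold rather than a full distance matrix. The plain '0'/'1' string format is an assumption (the exact input format of the `cluster` program isn't given here); for RDKit `ExplicitBitVect` fingerprints, `fp.ToBitString()` produces such a string. The function names are hypothetical, and the double loop is O(n²) in time, so this only illustrates the memory layout, not something that scales to millions of compounds.

```python
def bits_from_ascii(bitstring):
    """Turn an ASCII bitstring such as '0110' into a Python int.

    Bit order doesn't matter for Tanimoto as long as every string uses the same one.
    """
    return int(bitstring, 2)


def popcount(x):
    return bin(x).count("1")


def neighbour_lists(bitstrings, threshold):
    """For each fingerprint, list the indices of others with Tanimoto >= threshold.

    Only pairs above the threshold are stored, so memory grows with the number
    of close pairs rather than with n**2 as a full distance matrix would.
    """
    fps = [bits_from_ascii(s) for s in bitstrings]
    counts = [popcount(f) for f in fps]
    neighbours = [[] for _ in fps]
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            common = popcount(fps[i] & fps[j])
            union = counts[i] + counts[j] - common
            if union and common / union >= threshold:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours


print(neighbour_lists(["1111000", "1110100", "0000111", "0000110"], 0.5))
# → [[1], [0], [3], [2]]
```

Tightening the threshold shrinks these lists, which is why the cluster threshold governs the RAM needed.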

On Mon, Jun 5, 2017 at 11:22 AM, Nils Weskamp <nils.wesk...@gmail.com>
wrote:

> Hi Michal,
>
> I have done this a couple of times for compound sets up to 10M+ using a
> simplified variant of the Taylor-Butina algorithm. The overall run time
> was in the range of hours to a few days (which could probably be
> optimized, but was fast enough for me).
>
> As you correctly mentioned, getting the (sparse) similarity matrix is
> fairly simple (and can be done in parallel on a cluster). Unfortunately,
> this matrix gets very large (even the sparse version). Most clustering
> algorithms require random access to the matrix, so you have to keep it
> in main memory (which then has to be huge) or calculate it on-the-fly
> (takes forever).
>
> My implementation (in C++, not sure if I can share it) assumes that the
> similarity matrix has been pre-calculated and is stored in one (or
> multiple) files. It reads these files sequentially and whenever a
> compound pair with a similarity beyond the threshold is found, it checks
> whether one of the cpds. is already a centroid (in which case the other
> is assigned to it). Otherwise, one of the compounds is randomly chosen
> as centroid and the other is assigned to it.
>
> This procedure is highly order-dependent and thus not optimal, but has
> to read the whole similarity matrix only once and has limited memory
> consumption (you only need to keep a list of centroids). If you still
> run into memory issues, you can start by clustering with a high
> similarity threshold and then re-cluster centroids and singletons on a
> lower threshold level.
>
> I also played around with DBSCAN for large compound databases, but (as
> previously mentioned by Samo) found it difficult to find the right
> parameters and ended up with a single huge cluster covering 90 percent
> of the database in many cases.
>
> Hope this helps,
> Nils
>
> Am 05.06.2017 um 11:02 schrieb Michał Nowotka:
> > Is there anyone who actually done this: clustered >2M compounds using
> > any well-known clustering algorithm and is willing to share a code and
> > some performance statistics?
>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
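Nils' single-pass procedure might be sketched as follows, assuming the above-threshold pairs are streamed in as (i, j) index tuples. Two details are assumptions, not from his description: the first compound of a new pair becomes the centroid instead of a randomly chosen one, and a pair is simply skipped when neither compound is a centroid that can take an unassigned partner and they are not both unassigned. `single_pass_cluster` is a hypothetical name.

```python
def single_pass_cluster(similar_pairs):
    """Order-dependent, single-pass clustering over (i, j) pairs above a threshold.

    Returns a dict mapping each clustered compound to its centroid; centroids
    map to themselves, and compounds that never get assigned are singletons.
    """
    assignment = {}
    for i, j in similar_pairs:
        i_is_centroid = assignment.get(i) == i
        j_is_centroid = assignment.get(j) == j
        if i_is_centroid and j not in assignment:
            assignment[j] = i
        elif j_is_centroid and i not in assignment:
            assignment[i] = j
        elif i not in assignment and j not in assignment:
            # Nils picks the centroid at random; this sketch just takes i.
            assignment[i] = i
            assignment[j] = i
        # Any other case (both already placed, no free centroid) is skipped here.
    return assignment


pairs = [(0, 1), (1, 2), (0, 3), (4, 5), (2, 5)]
print(single_pass_cluster(pairs))
# → {0: 0, 1: 0, 3: 0, 4: 4, 5: 4}
print(single_pass_cluster([(1, 2), (0, 1)]))
# → {1: 1, 2: 1, 0: 1}
```

The second call shows the order dependence Nils mentions: starting from (1, 2) makes compound 1 the centroid instead of 0. Compound 2 is left unclustered in the first call, and compounds like it are the singletons he suggests re-clustering at a lower threshold.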

Re: [Rdkit-discuss] Depicting reactions to the same quality as molecules

2017-05-24 Thread David Cosgrove

Hi Greg,
I'll see about implementing this. I'll liaise with Ed.
Cheers,
Dave


On Sat, May 20, 2017 at 7:00 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi Ed,
>
> This is a weak spot; we haven't yet added decent reaction depiction.
>
> The best that's currently available is to use the old drawing code
> (Draw.ReactionToImage()) and to make sure that you have the cairo libraries
> installed so that you at least have decent drawings. Which operating
> system/version of Python are you using?
>
> -greg
>
>
> On Fri, May 19, 2017 at 5:47 PM, Ed Griffen <ed.grif...@medchemica.com>
> wrote:
>
>> Is there a reaction depiction option similar to the MolDraw2DCairo  which
>> produces much better depictions than the simple Chem.Draw PIL images?
>>
>> Or am I just doing this wrong?
>>
>>
>> Attempting to push a reaction through MolDraw2DCairo fails with:
>>
>> Traceback (most recent call last):
>>   File "drawing_test.py", line 31, in <module>
>> rc = rdMolDraw2D.PrepareMolForDrawing(rxn)
>> Boost.Python.ArgumentError: Python argument types in
>> rdkit.Chem.Draw.rdMolDraw2D.PrepareMolForDrawing(ChemicalReaction)
>> did not match C++ signature:
>> PrepareMolForDrawing(RDKit::ROMol const* mol, bool kekulize=True,
>> bool addChiralHs=True, bool wedgeBonds=True, bool forceCoords=False)
>>
>> Cheers,
>>
>> Ed
>>
>>
>> sample code below:
>>
>>
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>> from rdkit.Chem import Draw
>> from rdkit.Chem.Draw import rdMolDraw2D
>> from rdkit.Chem import rdDepictor
>> from rdkit.Chem.Draw import DrawingOptions
>>
>> m1 = AllChem.MolFromSmiles('c1ccccc1N(C)C')
>> tmp = AllChem.Compute2DCoords(m1)
>> Draw.MolToFile(m1,'test_mol_image.png')
>> rdDepictor.Compute2DCoords(m1)
>>
>> rxn = AllChem.ReactionFromSmarts('[C:1](=[O:2])[N:3]>>[N:1][C:3]=[O:2]')
>> rimage = Draw.ReactionToImage(rxn)
>> rimage.save('test_reaction_image.png')
>>
>> mc = rdMolDraw2D.PrepareMolForDrawing(m1)
>> drawer = Draw.MolDraw2DCairo(300, 300)
>> drawer.DrawMolecule(mc)
>> drawer.FinishDrawing()
>> output = drawer.GetDrawingText()
>> with open('test_mol_image_2.png', 'wb') as pngf:
>>     pngf.write(output)
>>
>>
>> drawer2 = Draw.MolDraw2DCairo(600, 300)
>> rc = rdMolDraw2D.PrepareMolForDrawing(rxn)
>> drawer2.DrawMolecule(rc)
>>
>>
>>
>>
>>
>>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] tautomers in rdkit

2017-04-18 Thread David Cosgrove

>> --
>>
>> Message: 3
>> Date: Tue, 11 Apr 2017 08:35:53 -0500
>> From: Francois BERENGER <francois.c.beren...@vanderbilt.edu>
>> Subject: [Rdkit-discuss] official Tripos MOL2 file format PDF document
>> To: "rdkit-discuss@lists.sourceforge.net"
>> <rdkit-discuss@lists.sourceforge.net>
>> Message-ID: <1f673e0d-0c10-a325-dde7-c28e76e06...@vanderbilt.edu>
>> Content-Type: text/plain; charset="utf-8"; format=flowed
>>
>> Hello,
>>
>> Not directly related to rdkit, but if someone who has
>> the original PDF of this file format could place it
>> online permanently, that would be wonderful.
>>
>> The official URL at tripos.com has apparently been dead for quite
>> some time.
>> And that's bad because it's quite a popular file format
>> and its specification should be permanently archived.
>>
>> Thanks a lot,
>> Francois.
>>
>>
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] delete a substructure

2017-03-11 Thread David Cosgrove

There's a bit more to it than that. If you're developing a SMARTS for a
particular type of group, you can run it against a large file and see the
false hits and false non-hits quickly and revise your SMARTS accordingly.
And as you say, it is also available.
Dave


On Fri, 10 Mar 2017 at 20:53, Peter S. Shenkin <shen...@gmail.com> wrote:

> Sounds like Daylight's "depictmatch", unfortunately no longer available
> online
>
> -P.
>
> On Fri, Mar 10, 2017 at 1:28 PM, David Cosgrove <
> davidacosgrov...@gmail.com> wrote:
>
> Hi,
> In the RDKit source, under the 2d drawing code in the c++ part there's the
> full source code for a QT program that will run one or more SMARTS patterns
> against a set of molecules, split any matches and non-matches into 2
> displays side by side and colour the atoms that the SMARTS match. It needs
> a bit of persistence to compile and has only been tried on Linux but is
> very helpful for writing new SMARTS. If there's interest, when I have a bit
> of spare time over the next few weeks I can make sure it's easier to
> compile. If you poke about in my website (cozchemix.co.uk) you'll find a
> link to my GitHub repo with an earlier version which has been compiled
> under Linux recently and has instructions. Sorry not to put links in; I
> don't have access to a computer at the moment, just a phone.
>
> Cheers,
> Dave
>
> On Thu, 9 Mar 2017 at 18:41, Chenyang Shi <cs3...@columbia.edu> wrote:
>
> Thank you Chris. I found that one too; it is quite convenient to visualize
> both SMARTS and SMILES strings.
>
> On Thu, Mar 9, 2017 at 11:28 AM, Chris Swain <sw...@mac.com> wrote:
>
> I use SMARTSviewer at Univ of Hamburg
>
> http://www.zbh.uni-hamburg.de/en/bioinformatics-server.html
>
> Chris
>
> On 9 Mar 2017, at 17:21, rdkit-discuss-requ...@lists.sourceforge.net
> wrote:
>
> One last question I have is do you guys have convenient online or local
> documents to look up desired SMARTS.
> Greg mentioned $RDBASE/Data/Functional_Group_Hierarchy.txt, which comes
> with the installation of RDKIT.
> Brian suggested daylight website,
> http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html,
> which is a good place as well.
>
> Best,
> Chenyang
>
>
>
>
> --
> Announcing the Oxford Dictionaries API! The API offers world-renowned
> dictionary content that is easy and intuitive to access. Sign up for an
> account today to start using our lexical data to power your apps and
> projects. Get started today and enter our developer competition.
> http://sdm.link/oxford
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
> --
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

[Rdkit-discuss] C++ MolPickler

2017-02-01 Thread David Cosgrove

Hi All,

I've got as far as 'Preserving Molecules' in the 'Getting Started with C++'
document I'm writing, and it appears that the MolPickler doesn't write
properties into the pickle.  Is that right?  If so, it means the molecule
name goes missing, which is an issue when putting multiple molecules in the
same file.

Cheers,
Dave


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk

Re: [Rdkit-discuss] PMI API

2017-01-08 Thread David Cosgrove

Hi Chris,
I can help a bit with the first point - I am currently 'porting' the
getting started in Python bit of the documentation to c++. There's a long
way to go, but if you go to my fork of RDKit at
https://github.com/DavidACosgrove and check out the GetStartedC++ branch,
you can at least use what I've managed so far (
https://github.com/DavidACosgrove/rdkit/blob/GetStartedC%2B%2B/Docs/Book/GettingStartedInC%2B%2B.md).
It's pretty basic stuff that you may already be beyond, but there are some
examples and a CMakeLists.txt file that builds them which might be helpful.


It's probably time I tidied it up (having just looked at it to get the link
above, I see there's a typo in the first sentence, for example!) and sent
in an interim Pull Request as for people starting out it might already be
of value.

Cheers,
Dave

On Sun, 8 Jan 2017 at 10:19, Chris Earnshaw  wrote:

> Hi
>
> A while ago I had a project which needed PMI descriptors (specifically
> NPR1 and NPR2) which were not available in the main branch of RDKit at
> the time. At the time I used the fork by 'hahnda6' which provided the
> calcPMIDescriptors() function, and this worked well. Now that PMI
> descriptors are available in the main RDKit distribution I thought I'd
> rewrite my code to use the official version.
>
> Building the new RDKit was no problem, but things went downhill shortly
> after that. There's every chance that I've missed the relevant
> documentation (I hope someone can point me in the right direction if so)
> and done something stupid!
>
> The issues are -
> 1) I can't find any documentation of the C++ API - the only reference to
> PMI in the online RDKit documentation appears to be to the PMI.h file
> 2) Having written a program using the PMI[123] and/or NPR[12] functions,
> I couldn't get it to compile until I added the -DRDK_BUILD_DESCRIPTORS3D
> directive -
> g++ -o sdf_pmi_blob sdf_pmi.cpp -I/packages/rdkit/include/rdkit
> -L/packages/rdkit/lib -lDescriptors -lGraphMol -lFileParsers
> -Wno-deprecated -O2 -DRDK_BUILD_DESCRIPTORS3D
> This seems a bit odd...
> 3) Is it necessary to make separate calls to the individual PMI() and/or
> NPR() functions? Surely this results in duplication of some of the
> heavier calculations? I can't find any equivalent of calcPMIDescriptors()
> which returned a 'Moments' struct containing all the PMI and NPR values
> in one go.
> 4) The big one! The returned results look very odd. They appear to
> relate more to the dimensions of the molecule than the moments of
> inertia. For a rod-like molecule (dimethylacetylene) I'd expect two large
> and one small PMI (e.g. PMI1: 6.61651   PMI2: 150.434   PMI3: 150.434
> NPR1: 0.0439828  NPR2: 0.98) but actually get PMI1: 0.061647  PMI2:
> 0.061652  PMI3: 25.3699  NPR1: 0.002430  NPR2: 0.002430.
> For disk-like (benzene) the result should be one large and two medium
> (e.g. PMI1: 89.1448  PMI2: 89.1495  PMI3: 178.294  NPR1: 0.499987
> NPR2: 0.500013) but get PMI1: 2.37457e-10  PMI2: 11.0844  PMI3: 11.0851
> NPR1: 2.14213e-11  NPR2: 0.33.
> Finally for a roughly spherical molecule (neopentane) the NPR values look
> reasonable (no great surprise) but the absolute PMI values may be too
> small: old program - PMI1: 114.795  PMI2: 114.797  PMI3: 114.799
> NPR1: 0.66  NPR2: 0.88, new program - PMI1: 6.59466  PMI2:
> 6.59488  PMI3: 6.59531  NPR1: 0.02  NPR2: 0.35
>
> As I say, it's entirely likely that I'm doing something stupid here so
> any pointers will be gratefully received. FWIW, the core of my program is -
> mol = MolBlockToMol(ctab, true, false);
> double pmi1 = RDKit::Descriptors::PMI1(*mol);
> double pmi2 = RDKit::Descriptors::PMI2(*mol);
> double pmi3 = RDKit::Descriptors::PMI3(*mol);
> double npr1 = RDKit::Descriptors::NPR1(*mol);
> double npr2 = RDKit::Descriptors::NPR2(*mol);
>
> Thanks for any help!
> Chris
>
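To sanity-check numbers like these independently of the RDKit code, the principal moments can be computed from first principles: build the mass-weighted inertia tensor about the centre of mass and take its eigenvalues PMI1 <= PMI2 <= PMI3, with NPR1 = PMI1/PMI3 and NPR2 = PMI2/PMI3. The NumPy sketch below is that textbook definition, not RDKit's implementation, and the function names are hypothetical. Ideal rods should give NPRs near (0, 1), discs (0.5, 0.5) and spheres (1, 1), the corners Chris expects.

```python
import numpy as np


def principal_moments(coords, masses):
    """Return the principal moments of inertia, ascending, about the centre of mass."""
    xyz = np.asarray(coords, dtype=float)
    m = np.asarray(masses, dtype=float)
    # Shift to the centre of mass.
    r = xyz - (m[:, None] * xyz).sum(axis=0) / m.sum()
    # Inertia tensor: sum_i m_i * (|r_i|^2 * I - outer(r_i, r_i)).
    tensor = np.eye(3) * np.sum(m * np.sum(r ** 2, axis=1)) - np.einsum("i,ij,ik->jk", m, r, r)
    # eigvalsh is for symmetric matrices and returns eigenvalues in ascending order.
    return np.linalg.eigvalsh(tensor)


def npr(pmi):
    """Normalised PMI ratios (NPR1, NPR2) = (PMI1/PMI3, PMI2/PMI3)."""
    return pmi[0] / pmi[2], pmi[1] / pmi[2]


# Idealised shapes with unit masses: a rod, a square "disc" and an octahedral "sphere".
rod = [(-1, 0, 0), (1, 0, 0)]
disc = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
sphere = disc + [(0, 0, 1), (0, 0, -1)]
for name, pts in (("rod", rod), ("disc", disc), ("sphere", sphere)):
    # Expect roughly (0, 1), (0.5, 0.5) and (1, 1) respectively.
    print(name, npr(principal_moments(pts, [1.0] * len(pts))))
```

With RDKit, the coordinates could come from `mol.GetConformer().GetPositions()` and the masses from `atom.GetMass()` for each atom.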

Re: [Rdkit-discuss] MolToSmiles

2016-12-18 Thread David Cosgrove

Hi Jean-Marc,

There is a property of the molecule, set when the SMILES is written, that
contains this information.  I forget what it is called, but if you call the
molecule's GetPropNames function you should see something obvious in the
values returned.  You can then call GetProp with that property name to get
a string containing the canonical atom order.  Note that the string is a
representation of a Python list, with '[' at the start, ']' at the end,
and commas in between. You'll need to manipulate it a bit to extract the
array of integers you need.

Cheers,
Dave
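Turning the property string into a usable list takes only a couple of lines. A minimal sketch, assuming a value shaped like '[2,0,1,]' (as far as I know RDKit writes it with a trailing comma) in which the k-th entry is the index of the molecule atom written k-th in the SMILES; both function names are hypothetical. As far as I know the property is `_smilesAtomOutputOrder`, set by `Chem.MolToSmiles()`; because its name starts with an underscore it may only be listed by `GetPropNames(includePrivate=True, includeComputed=True)`, so treat the name as an assumption to check.

```python
def parse_atom_order(prop_value):
    """Turn a string such as '[2,0,1,]' into the list [2, 0, 1].

    Tolerates a trailing comma and spaces after commas.
    """
    inner = prop_value.strip().strip("[]")
    return [int(tok) for tok in inner.split(",") if tok.strip()]


def smiles_positions(order):
    """Invert the list: molecule atom index -> position of that atom in the SMILES."""
    return {mol_idx: pos for pos, mol_idx in enumerate(order)}


order = parse_atom_order("[2,0,1,]")
print(order)                    # [2, 0, 1]
print(smiles_positions(order))  # {2: 0, 0: 1, 1: 2}

# Assumed RDKit usage (property name as I understand it; check with GetPropNames):
# smi = Chem.MolToSmiles(mol)
# order = parse_atom_order(mol.GetProp("_smilesAtomOutputOrder"))
```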

On Sun, Dec 18, 2016 at 5:19 PM, Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

> Hi all,
>
> maybe my question has already been answered:
> when converting from Mol to a canonical SMILES string,
> is there a way to obtain the mapping between the atom indexes in the
> Mol object and the atom indexes in the SMILES chain?
>
> All the best,
>
> Jean-Marc
>
> --
>
> Dr. Jean-Marc Nuzillard
> Institute of Molecular Chemistry
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 33 3 26 91 82 10
> Fax :33 3 26 91 31 66
> http://www.univ-reims.fr/ICMR
>
> http://eos.univ-reims.fr/LSD/
> http://eos.univ-reims.fr/LSD/JmnSoft/
>
>

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread David Cosgrove

Hi Alexis,

While you're wrestling with the difference between () and CC(C)C you
could also consider that '.' is valid in a SMILES and denotes a mixture, for
example CCO.O.O (for vodka, maybe).  You might get those in FDA documents
that discuss formulations, for example.  In a well scanned and punctuated
document, you should be able to distinguish '. ' for the end of a sentence
from '.' for a mixture, but I don't think you'd have to be too unlucky for
some to creep through.

Regards,
Dave
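The cheap text filters Alexis describes in the quoted message (unique words only, a minimum length, a character whitelist), plus stripping of sentence punctuation so that a final '.' is not mistaken for part of a mixture, can be sketched in plain Python before anything is handed to RDKit. This is a minimal sketch under stated assumptions: the character set is deliberately loose, only J and Q are rejected as letters that occur in no element symbol, trailing punctuation is assumed never to belong to a real SMILES, and `candidate_smiles` is a hypothetical helper. Each survivor would still need to go through `Chem.MolFromSmiles()` (possibly with `sanitize=False`, as Brian suggests).

```python
import re
from collections import Counter

# Loose character filter: anything outside this set cannot occur in a SMILES.
SMILES_CHARS = re.compile(r"^[A-Za-z0-9@+\-\[\]()=#$%/\\.:*]+$")
# J and Q appear in no element symbol, so words containing them are prose.
NON_ELEMENT_LETTERS = re.compile(r"[JjQq]")


def _strip_wrapping(word):
    """Remove quotes, trailing punctuation and a wrapping pair of parentheses."""
    word = word.strip("\"'\u201c\u201d\u2018\u2019")
    # Assumes no real SMILES ends in these characters, so they are punctuation;
    # internal dots (mixtures such as CCO.O.O) are left alone.
    word = word.rstrip(",;:!?.")
    if len(word) > 2 and word.startswith("(") and word.endswith(")"):
        word = word[1:-1]
    return word


def candidate_smiles(text, min_len=5):
    """Return words that survive the cheap filters, in order of appearance."""
    words = [_strip_wrapping(w) for w in text.split()]
    counts = Counter(words)
    return [
        w for w in words
        if counts[w] == 1                  # SMILES assumed to appear once
        and len(w) >= min_len              # drop short words such as "CC"
        and SMILES_CHARS.match(w)
        and not NON_ELEMENT_LETTERS.search(w)
    ]


text = ("Use (CC(=O)Oc1ccccc1C(=O)O) with [Na+].[Cl-] in CCO.O.O. "
        "Quietly quit.")
print(candidate_smiles(text))
# ['CC(=O)Oc1ccccc1C(=O)O', '[Na+].[Cl-]', 'CCO.O.O']
```

Internal dots survive, so CCO.O.O is kept as one candidate while the sentence-ending dot is stripped. Parentheses are only removed when they wrap the whole word, so a name glued to a SMILES, as in Brian's Benzene(...) example, is not handled.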


On Mon, Dec 5, 2016 at 2:28 PM, Alexis Parenty <
alexis.parenty.h...@gmail.com> wrote:

> Oops! Thanks Brian and Igor! I did not understand at first the punctuation
> issues Andrew referred to yesterday, with SMILES that could be quoted inside
> parentheses or at the end of a sentence next to a full stop or a semicolon.
> I see it now. I should remove the punctuation filter.
>
>
> For the parenthesis issue, the difficulty is to differentiate the SMILES
> formats (xxx)(xxx) from this one (xxx)… I will try and address
> that using something like:
>
>
> mol = Chem.MolFromSmiles(smiles)
> if mol is None and len(smiles) > 2 and smiles[0] in '({[\'"' and smiles[-1] in ')}]\'"':
>     mol = Chem.MolFromSmiles(smiles[1:-1])
>
>
> Anything better?
>
>
>
> Andrew, no, Alice’s adventure in wonderland is not really representative
> of the text I need to extract my SMILES from (FDA Regulatory documents!)
> I’ll see how it performs on the real stuff and might adjust the script
> further if needed.
>
> Thanks Andrew for the generator comprehension example (I know they exist
> and are faster than typical loops, but I can never figure out how they
> work…) I am still on the learning curve… I’ll add it to the final version.
>
>
> Markus, the valid SMILES found in Alice’s wonderland is the following
>  “*” which is the linear structure:
> "Any-Any-Any-Any-Any-Any-Any..." !!! Not a company secret I’m afraid!
>
>
> Thanks again
>
>
> On 5 December 2016 at 14:23, Brian Kelley  wrote:
>
>> Cool!  Btw-  try sanitize=False
>>
>> Also, Andrew is right that you will miss parenthetical phrases.  I.e.
>> Benzene(c1c1) and the like, just reasserting that this is a hard
>> problem!
>>
>> 
>> Brian Kelley
>>
>> On Dec 5, 2016, at 5:35 AM, Alexis Parenty 
>> wrote:
>>
>> Dear All,
>>
>> Many thanks to everyone for your participation in that discussion. It was 
>> very interesting and useful. I have written a small script that took on 
>> board everyone’s input:
>>
>> This incorporates a few "text filters" before the RDKit function. First of
>> all, I made a dictionary with every word present in the text as a key and
>> the number of times it appears in the text as its value. Then I removed
>> from the list of unique keys (words) all the ones that were repeated more
>> than once (because I know that my SMILES appear only once in each
>> document). Then I removed all the words that are shorter than 5 letters,
>> because I know that all my structures contain more than 5 atoms and I want
>> to remove possible FPs coming from “I” or “CC”, for example. Then, with
>> regex, I removed all unique words that contain letters that are not in the
>> main periodic table of elements, and removed the words that contain the
>> main English punctuation signs that never occur in SMILES.
>>
>> Placed one after the other, those filters take the 26,836 words of
>> "Alice's Adventures in Wonderland" down to 780 words (97% of words
>> filtered out).
>>
>>
>> TEST RESULTS
>>
>> I have tested my script on:
>> •7900 unique SMILES for “drug-like molecules”
>> •Alice’s adventure in wonderland (I never read the book but I assumed 
>> there is no SMILES!)
>> •A shuffled mixture of Alice’s in wonderland and 7900 unique SMILES
>>
>> The performance is as follows:
>>
>>
>> For Alice’s adventure in wonderland:
>> 26836 words
>> 26835 TN
>> 0 TP
>> 1 FP: “*” 
>> (actually a valid SMILES…)
>> 0 FN
>>
>> ==> Accuracy of 0.6, in 0:00:00.112000
>>
>>
>>
>> For 7900 unique SMILES from unique drug like molecules
>> 7900 TP
>> 0 TN
>> 0 FP
>> 0 FN
>> ==> Accuracy of 0.6, in 0:00:04.20
>>
>>
>>
>>
>> 7900 unique SMILES from unique drug like molecule shuffled within ALICE'S 
>> ADVENTURES IN WONDERLAND 26836 words (34736 word in totals)
>>
>> 7900 TP
>> 26835 TN
>> 1 FP: “*”
>> 0 FN
>>
>> ==> Accuracy of 0.7 in 0:00:04.949000
>>
>>
>> Then I reprocessed the text mixture above without the text filters
>> (directly feeding every word from the text into the RDKit function) and got
>> the following result:
>>
>> 7900 TP
>> 26835 TN
>> 339 FP
>> 0 FN
>> ==> Accuracy of 0.97 in 0:00:07.893
>>
>>
>> Therefore, as Brian pointed out, the function Chem.MolFromSmiles(SMILES) is
>> crazy fast at detecting invalid SMILES, i.e. at returning a “None

Re: [Rdkit-discuss] GenerateDepictionMatching[23]DStructure (a bit off-topic)

2016-11-18 Thread David Cosgrove

As Greg says, this is a large area and somewhat of a diversion from my
original intention. All I was asking for was a set of test cases so I can
ensure that my port of the original Python code in AllChem.py to C++
behaves correctly. That seems like a sensible first step before embarking
on something more ambitious.

Dave

On Fri, 18 Nov 2016 at 08:17, Greg Landrum  wrote:

> This is a very big topic, and one where I would very much like to improve
> the RDKit. John Mayfield gave a great talk on the issues (and some ideas
> about fixing them based on his work with the CDK) at the UGM that some of
> you may find interesting :
>
> https://github.com/rdkit/UGM_2016/blob/master/Presentations/JohnMayfield_Depiction.pdf
>
> Fixing the larger problems is a *lot* of work and not something that is
> likely to happen quickly, but there is some low-hanging fruit (like cutting
> crossed bonds) that I ought to be able to do something about.[1]
>
> -greg
> [1] the trick is to avoid, as much as possible, creating drawings that
> look like Möbius strips.
> _
> From: Peter S. Shenkin 
> Sent: Thursday, November 17, 2016 11:23 PM
> Subject: Re: [Rdkit-discuss] GenerateDepictionMatching[23]DStructure (a
> bit off-topic)
> To: 
>
>
>
>
> On 17 Nov 2016, at 4:12 PM, Dimitri Maziuk  wrote:
>
> Philosophically speaking, there must exist molecules for which a legible
> 2D projection is simply not possible.
>
>
> Hi,
>
> I don't think that 2D projection of a 3D structure is an appropriate
> paradigm for 2D depiction, in general. I think of it as being more about 2D
> construction. I don't think camphor is a particularly difficult example,
> though, and I think that the hidden-line elimination (for lack of a better
> term) that Marvin does gives it a leg up on RDKit's representation.
>
> By the way, I do not think that Marvin is the best there is out there;
> it's just what I happen to have available for comparison.
>
> Stereochemistry adds complications, because 3D information has to be
> encoded in some way. Camphor (your suggestion) has a little of this. I gave
> Marvin a non-stereo SMILES and it picked an enantiomer. I drew the same
> enantiomer. I did not specify stereochemistry to RDKit, so, despite the
> visual confusion of the bond crossings, I suppose it's good that it didn't
> depict an explicit enantiomer.
>
> And labels add further complications. The two approaches I've seen for
> labels are using them as the atomic vertices, as RDKit does, and adding
> them adjacent to the vertices. I personally prefer the latter, because to
> my eye, it's easier to see the connectivity without being distracted by the
> labels.
>
> But my philosophical point was that different forms of 2D depiction work
> better for different purposes. Stéphane wants to see sugars drawn as
> carbohydrate chemists are used to seeing them. I would like to see the 2D
> connectivity as clearly as possible and would sacrifice some conventions
> for that purpose. And so on.
>
> -P.
>
>
>
>

[Rdkit-discuss] Different default behaviour for Kekulize in Python and C++

2016-11-02 Thread David Cosgrove

Hi All,

As I've been transliterating the GettingStartedInPython to
GettingStartedInC++, I've noticed that you get different default behaviour
from Kekulize in the two languages:

m = Chem.MolFromSmiles('c1ccccc1')
print( 'Order : {}'.format( m.GetBondWithIdx(0).GetBondType() ) )
print( 'Aromatic : {}'.format( m.GetBondWithIdx(0).GetIsAromatic() ) )
Chem.Kekulize(m)
print( 'After default Kekulize : Aromatic : {}'.format( m.GetBondWithIdx(0).GetIsAromatic() ) )

m1 = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(m1 , clearAromaticFlags=False )
print( 'After Kekulize, clearAromaticFlags False : Aromatic : {}'.format( m1.GetBondWithIdx(0).GetIsAromatic() ) )

m2 = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(m2 , clearAromaticFlags=True )
print( 'After Kekulize, clearAromaticFlags True : Aromatic : {}'.format( m2.GetBondWithIdx(0).GetIsAromatic() ) )

gives

Order : AROMATIC
Aromatic : True
After default Kekulize : Aromatic : True
After Kekulize, clearAromaticFlags False : Aromatic : True
After Kekulize, clearAromaticFlags True : Aromatic : False

Whereas the corresponding C++

  RDKit::RWMOL_SPTR mol( new RDKit::RWMol( *RDKit::SmilesToMol( "c1ccccc1" ) ) );
  std::cout << "Order : " << mol->getBondWithIdx( 0 )->getBondType() << std::endl;
  std::cout << "Aromatic : " << mol->getBondWithIdx( 0 )->getIsAromatic() << std::endl;

  RDKit::MolOps::Kekulize( *mol );
  std::cout << "After default Kekulize : Aromatic : " << mol->getBondWithIdx( 0 )->getIsAromatic() << std::endl;

  RDKit::RWMOL_SPTR mol1( new RDKit::RWMol( *RDKit::SmilesToMol( "c1ccccc1" ) ) );
  RDKit::MolOps::Kekulize( *mol1 , false );
  std::cout << "After Kekulize, markAtomsBonds false : Aromatic : " << mol1->getBondWithIdx( 0 )->getIsAromatic() << std::endl;

  RDKit::RWMOL_SPTR mol2( new RDKit::RWMol( *RDKit::SmilesToMol( "c1ccccc1" ) ) );
  RDKit::MolOps::Kekulize( *mol2 , true );
  std::cout << "After Kekulize, markAtomsBonds true : Aromatic : " << mol2->getBondWithIdx( 0 )->getIsAromatic() << std::endl;

gives

Order : 12
Aromatic : 1
After default Kekulize : Aromatic : 0
After Kekulize, markAtomsBonds false : Aromatic : 1
After Kekulize, markAtomsBonds true : Aromatic : 0


I.e. by default the C++ version clears the aromatic flags on the bonds and
the Python version doesn't.  That seemed sufficiently anomalous to point out
and to consider whether they should be unified, although there's a strong
possibility that doing so would be a breaking change for people's code.

I attach the full program files for the two versions if you want to
reproduce it.  This is on a recent pull of the github code
(e9af48ffd77c5a219a1671a63704aa815c08b348)
//
// Modifying molecules example9.cpp

#include 

#include 
#include 
#include 

int main( int argc , char **argv ) {

  RDKit::RWMOL_SPTR mol( new RDKit::RWMol( *RDKit::SmilesToMol( "c1ccccc1" ) ) );
  std::cout << "Order : " << mol->getBondWithIdx( 0 )->getBondType() << std::endl;
  std::cout << "Aromatic : " << mol->getBondWithIdx( 0 )->getIsAromatic() << std::endl;

  RDKit::MolOps::Kekulize( *mol );
  std::cout << "After default Kekulize : Aromatic : " << mol->getBondWithIdx( 0 )->getIsAromatic() << std::endl;

  RDKit::RWMOL_SPTR mol1( new RDKit::RWMol( *RDKit::SmilesToMol( "c1ccccc1" ) ) );
  RDKit::MolOps::Kekulize( *mol1 , false );
  std::cout << "After Kekulize, markAtomsBonds false : Aromatic : " << mol1->getBondWithIdx( 0 )->getIsAromatic() << std::endl;

  RDKit::RWMOL_SPTR mol2( new RDKit::RWMol( *RDKit::SmilesToMol( "c1ccccc1" ) ) );
  RDKit::MolOps::Kekulize( *mol2 , true );
  std::cout << "After Kekulize, markAtomsBonds true : Aromatic : " << mol2->getBondWithIdx( 0 )->getIsAromatic() << std::endl;

}
#!/usr/bin/env python

from rdkit import Chem

m = Chem.MolFromSmiles('c1ccccc1')
print( 'Order : {}'.format( m.GetBondWithIdx(0).GetBondType() ) )
print( 'Aromatic : {}'.format( m.GetBondWithIdx(0).GetIsAromatic() ) )
Chem.Kekulize(m)
print( 'After default Kekulize : Aromatic : {}'.format( m.GetBondWithIdx(0).GetIsAromatic() ) )

m1 = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(m1 , clearAromaticFlags=False )
print( 'After Kekulize, clearAromaticFlags False : Aromatic : {}'.format( m1.GetBondWithIdx(0).GetIsAromatic() ) )

m2 = Chem.MolFromSmiles('c1ccccc1')
Chem.Kekulize(m2 , clearAromaticFlags=True )
print( 'After Kekulize, clearAromaticFlags True : Aromatic : {}'.format( m2.GetBondWithIdx(0).GetIsAromatic() ) )

[Rdkit-discuss] Getting started with C++

2016-09-24 Thread David Cosgrove

Hi All,

I'm contemplating starting a chapter in the documentation called 'Getting
Started with the RDKit in C++' which would mirror the information given in
the Python chapter but with examples in C++ for those of us diehards who
like to program in a compiled language.  As I recall, the learning curve
was quite steep at the beginning, so I thought it would be helpful to ease
others into the real world.

The purpose of this email was just to check that no one else is working on
this already - it would be a shame to duplicate effort.  If so, I will
happily pitch in, if not then I'll crack on and if anyone else wants to
help, please get in touch.

Cheers,
Dave

Re: [Rdkit-discuss] AddHs()

2016-09-10 Thread David Cosgrove

Hi Rocco et al.,
I too found this a very clear explanation of the different classes of
hydrogen, so many thanks for taking the time. Where would a chiral H fit in?
The sort of H from Cl[C@H](F)Br?  That one needs to stay even if you
collapse all explicit H atoms to implicit.

On the subject of the documentation, I would encourage you to find the
GettingStartedWithRDKit.rst in the Docs directory, find somewhere where
this discussion fits, add it, and send the new version to Greg. If everyone
did this every time they spent time working out how to do something, the
documentation would grow very rapidly and by definition grow fastest in
areas that people are actively using. We don't need to wait for Greg to do
it all!  He's busy enough as it is, and let's face it, writing docs is dull
and I'm sure he would appreciate the help.

Cheers,
Dave


On Friday, 9 September 2016, Greg Landrum  wrote:

> Thanks for this writeup Rocco. You're right that there's not an easy to
> find and understand collection of this information. That's one of those
> gaps in the documentation that I should eventually address. This is already
> a pretty good start though.
>
> -greg
>
>
> On Thu, Sep 8, 2016 at 9:37 PM, Rocco Moretti  > wrote:
>
>> Greg can correct me if I'm wrong(1), but in RDKit there are actually three
>> "levels" of hydrogens:
>>
>> * "Physical" hydrogens, which are represented as actual, independent
>> atoms in the atom graph. ("Physical hydrogens" is what I'm calling them - I
>> don't know if RDKit has an official term for them.)
>>
>> * "Explicit" hydrogens, which are represented as a numeric annotation on
>> their attached heavy atom. (And *not* as a separate atom object.)
>>
>> * "Implicit" hydrogens, which aren't actually represented anywhere, but
>> are calculated from the standard valence of the heavy atom, and how many
>> are occupied by actual atoms and explicit hydrogens.
>>
>> Generally, except for some coordinate calculations, RDKit seems to be
>> built around working with molecules with explicit or implicit hydrogens.
>> This is why when you read in a molecule, RDKit normally removes any
>> physical hydrogens. (Note that for most file reading code there's a
>> removeHs parameter you can set to False to change this behavior, and read
>> explicitly listed hydrogens as physical hydrogens.)
>>
>> By default "removing hydrogens" means turning them into implicit
>> hydrogens(2), but the RemoveHs() function has an "updateExplicitCount"
>> parameter which will cause the removed hydrogens to be turned into explicit
>> hydrogens instead. The standard MOL file loading code doesn't use this
>> option, though, so the hydrogens in the molecule are usually converted into
>> implicit when you read things in.
>>
>> AddHs(), of course, turns explicit and implicit hydrogens into physical
>> hydrogens. (Though the "explicitOnly" parameter can be used to control
>> this.) It does annotate whether these physical hydrogens came from either
>> the implicit or explicit pool, so you can round trip things through AddHs()
>> and RemoveHs() appropriately. (There's also an "implicitOnly" parameter on
>> RemoveHs() which will only remove those hydrogens.)
>>
>> Regards,
>> -Rocco
>>
>> (1) I don't think the RDKit hydrogen model has ever been formalized in
>> one place for user-facing documentation, so this is the understanding I've
>> gotten from banging my head against various hydrogen-related issues.
>>
>> (2) There are special complications here: certain structures, such as
>> imidazole, need physical or explicit hydrogens on one of the nitrogens in
>> order to Kekulize properly. If you're implicit only, the RDKit sanitizer
>> will choke. Thus, there's special casing in the various Add/RemoveHs
>> functions to avoid implicit-izing these critical hydrogens.
>>
>> On Thu, Sep 8, 2016 at 1:46 PM, Dimitri Maziuk > > wrote:
>>
>>> On 09/08/2016 10:25 AM, Greg Landrum wrote:
>>> ...
>>> > Why do you want 2D drawings that include H atoms?
>>>
>>> On the subject of H atoms: when I read in the MOL file that has them, I
>>> need to explicitly call AddHs() in order to have them drawn.
>>>
>>> Question: do they actually get stripped off by the reader and re-added
>>> by AddHs()? Or are they there "hidden" somehow and AddHs() just
>>> "unhides" them?
>>>
>>> TIA
>>> --
>>> Dimitri Maziuk
>>> Programmer/sysadmin
>>> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>>>
>>>
>>>
>>>
>>
>>
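Rocco's three levels can be made concrete with a toy data structure. The sketch below is a hypothetical model, not RDKit's implementation: `ToyAtom`, `add_hs`, `remove_hs` and the valence table are invented for illustration, only single bonds and neutral C/N/O are handled, and the special cases mentioned above (imidazole-type N-H, the chiral H Dave asks about) are not modelled. Physical hydrogens are graph nodes, explicit hydrogens are a count stored on the heavy atom, and implicit hydrogens are never stored but derived from the default valence.

```python
from dataclasses import dataclass, field

# Toy default valences (no charges, radicals or multiple bonds in this sketch).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2}


@dataclass
class ToyAtom:
    symbol: str
    explicit_hs: int = 0                            # "explicit": a count on the atom
    neighbours: list = field(default_factory=list)  # indices of bonded atoms


def physical_hs(atoms, idx):
    """'Physical' Hs: real H atoms bonded to atom idx in the graph."""
    return sum(1 for n in atoms[idx].neighbours if atoms[n].symbol == "H")


def implicit_hs(atoms, idx):
    """'Implicit' Hs: stored nowhere, derived from the default valence."""
    atom = atoms[idx]
    if atom.symbol == "H":
        return 0
    used = len(atom.neighbours) + atom.explicit_hs  # single bonds only
    return max(0, DEFAULT_VALENCE[atom.symbol] - used)


def total_hs(atoms, idx):
    """All three levels added together."""
    return atoms[idx].explicit_hs + implicit_hs(atoms, idx) + physical_hs(atoms, idx)


def add_hs(atoms):
    """Return a copy in which explicit and implicit Hs become physical H atoms."""
    out = [ToyAtom(a.symbol, a.explicit_hs, list(a.neighbours)) for a in atoms]
    for idx in range(len(atoms)):
        if out[idx].symbol == "H":
            continue
        n_new = out[idx].explicit_hs + implicit_hs(out, idx)
        out[idx].explicit_hs = 0
        for _ in range(n_new):
            out.append(ToyAtom("H", neighbours=[idx]))
            out[idx].neighbours.append(len(out) - 1)
    return out


def remove_hs(atoms, update_explicit_count=False):
    """Drop physical Hs; they become implicit, or explicit if requested."""
    keep = [i for i, a in enumerate(atoms) if a.symbol != "H"]
    new_idx = {old: new for new, old in enumerate(keep)}
    out = []
    for old in keep:
        atom = atoms[old]
        extra = physical_hs(atoms, old) if update_explicit_count else 0
        out.append(ToyAtom(atom.symbol, atom.explicit_hs + extra,
                           [new_idx[n] for n in atom.neighbours if n in new_idx]))
    return out


ethanol = [ToyAtom("C", neighbours=[1]), ToyAtom("C", neighbours=[0, 2]),
           ToyAtom("O", neighbours=[1])]
with_hs = add_hs(ethanol)
print(len(with_hs), [total_hs(with_hs, i) for i in range(3)])  # 9 [3, 2, 1]
```

Passing `update_explicit_count=True` to the toy `remove_hs` mirrors, in spirit, the `updateExplicitCount` option of RemoveHs() as Rocco describes it: the removed hydrogens land in the explicit count instead of becoming implicit, and the total per heavy atom is unchanged either way.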

Re: [Rdkit-discuss] rdMolDraw2D drawing code

2016-09-05 Thread David Cosgrove

Hi Dimitri,

Sorry for the delay replying. I assume that by 'add padding' you mean that
the code that generates the 2D drawing coordinates should take account of
the size of the labels and would, from your example, maybe open out the
C9-O23-C14 angle a bit so that O21 and O24 are further apart?  If so, then I
can duck that one cheerfully as it's not part of the MolDraw2D code. The 2D
coordinates are generated in RDDepictor - MolDraw2D just uses the results
from that.  Having just had a peek at $RDBASE/GraphMol/Depictor/Depictor.h,
it would seem as though adding a padding as you suggest might not be
straightforward, but I guess it might be possible to alter the distance
matrix that is being embedded in 2D to take account of atom label sizes.
The joy of open source projects is of course that you have the opportunity
to change things if you don't like how they're done at present.  Maybe
something to think about at the UGM hackathon day?

As you say, your best bet in the short term is probably to adjust the font
size in the drawing.  MolDraw2D.h says that the font sizes are given "in
molecule coordinate units. That's probably Angstrom for RDKit" which in
reality means it's relative to a C-C bond in benzene of 1.5 units. This is
then changed internally to a value appropriate for the drawing engine that
is being used in a particular instance.  If you can see a sensible place to
put this information in the documentation, feel free to send a changed
version to Greg for inclusion in the next release.  I am struggling to find
any such documentation myself, and maybe that was your point ;-).

Cheers,
Dave

On Fri, Sep 2, 2016 at 9:07 PM, Dimitri Maziuk 
wrote:

> Hi all,
>
> I finally got a round tuit for playing with the drawing code and I like
> it -- great job, thank you Greg and Dave and everyone who contributed.
>
> One question though: is it possible to add padding around atom labels?
> Or use some other trick to make the attached look less crowded? (Yes, I
> do want all Hs and all atom labels with numbers.)
>
> The best I can come up with is reduce the font size a little, that works
> fine. I think it'd be nice if the fine manual for MolDraw2D said what
> the units used by FontSize()/SetFontSize() are.
>
> So, any better ideas than just slightly smaller labels?
>
> TIA
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>

Re: [Rdkit-discuss] Some feedback from the Sheffield Cheminformatics Conference

2016-07-07 Thread David Cosgrove

I think I can beat that. When I was working on the 2D drawing code with
Greg a couple of years ago, he sent me an email at about 6 on Christmas
morning!  Needless to say, he didn't get a reply until a few days  later as
I fear I am not so dedicated.

Cheers,
Dave


On Thursday, 7 July 2016, Markus Sitzmann  wrote:

> Well, first thing I saw on the lock screen of my alarm clock-ringing iPad
> the morning after a long night at the Sheffield conference dinner was a
> reply by Greg on this list sent at 6:48am (it even contained some code).
>
> Thanks a lot for your dedication and for building RDKit and its community,
> Greg.
>
> Cheers,
> Markus
>
> -
> |  Markus Sitzmann
> |  markus.sitzm...@gmail.com
> 
>
> On 07.07.2016, at 08:20, Greg Landrum  > wrote:
>
> Dear all,
>
> I was at the Sheffield Cheminformatics conference earlier this week (along
> with several people from this list) and I was really struck by the number
> of talks and posters that are using the RDKit. By my rough count the RDKit
> was used for about 1/3 of the talks and a similar fraction of the posters.
>
> This of course, makes me smile rather broadly (Christian, Nadine, and
> Sereina had to suffer through this while we were waiting at the airport ;-)
> ) but a big part of the reason for this success is the engagement and
> activity of the RDKit community. So I figured I'd share so that those of
> you who weren't in Sheffield also get the chance to grin about it.
>
> We're having an impact... that's really cool. Thanks! and congrats! :-)
>
> -greg
>
>

Re: [Rdkit-discuss] Getting to grips with Open3DAlign

2016-06-22 Thread David Cosgrove

Hi Sereina,

I beg to differ on whether minimisation can be skipped, even after using the
parameters you suggest to generate the conformations. I've recently been
using the CCDC's excellent Python API to analyse the results of the
generated conformations. This lets you very quickly assess whether any of
the bond distances or angles in a molecule is "unusual" with reference to
the data in the CSD. I've not done a systematic examination but my
anecdotal result is that you get far fewer such unusual geometries if you
give it a quick minimisation afterwards. Using the default optimiser
settings is quick, painless and, in my limited number of tests, worth the
effort.

I'd be interested to know if anyone else has different experience.

Cheers,
Dave


On Wednesday, 22 June 2016, Sereina  wrote:

> Based on the code snippets, Paolo has not used the basic-knowledge terms
> whereas Tim did.
>
> When setting useExpTorsionAnglePrefs=True and useBasicKnowledge=True, a
> minimization is in principle not necessary anymore (unless there are
> aliphatic rings, because we currently don’t have torsion rules for them).
>
> Best,
> Sereina
>
>
> On 22 Jun 2016, at 05:02, Greg Landrum  > wrote:
>
> I don't have anything to add to the pieces about the alignment (Paolo is
> the expert there!), but one comment on the conformation generation: If you
> used the background knowledge terms in the embedding, I don't think you
> should be getting the really distorted aromatic rings. Since in this case
> that does happen, for at least some conformations, I suspect there may be
> something wrong in the code.
>
> I'll take a look at that (and ask Sereina too).
>
> Best,
> -greg
>
>
> On Tue, Jun 21, 2016 at 10:30 PM, Paolo Tosco  > wrote:
>
>> Dear Tim,
>>
>> the Align() method returns an RMSD value, which however is computed only
>> on a limited number of atom pairs, namely those that the algorithm was able
>> to match between the two molecules, so a low value is not particularly
>> informative of the overall goodness of the alignment, as it only indicates
>> that the matched atoms were matched nicely, but there might only be a few
>> of those in the core, while side chains are scattered all over.
>> The Score() method instead returns the O3AScore for the alignment, which
>> is a better way to assess the quality of the superimposition.
>>
>> The other problem in your script is that the i index is incremented
>> before recording it in the lowest/highest variables, so the confIds are
>> shifted by 1, as the conformation index in the RDKit is 0-based.
>>
>> I also noticed that without minimizing the conformations the aromatic
>> rings look quite distorted, so I added an MMFF minimization, and I increased
>> the number of generated conformations and the pruneRmsThresh. Setting the
>> experimental torsion angle preferences and basic knowledge about rings to
>> False seems to yield a larger variety of geometries, which helps to reproduce
>> this rather peculiar X-ray geometry, which is probably not so commonly found.
>> Please find the modified script below.
>>
>> Hope this helps, kind regards
>> Paolo
>>
>>
>> #!/usr/bin/env python
>>
>>
>> from rdkit import Chem, RDConfig
>> from rdkit.Chem import AllChem, rdMolAlign
>>
>> ref = Chem.MolFromSmiles('NC(=[NH2+])c1ccc(C[C@@H](NC(=O)CNS(=O)(=O)'
>>                          'c2ccc3ccccc3c2)C(=O)N2CCCCC2)cc1')
>> mol1 = Chem.MolFromPDBFile(RDConfig.RDBaseDir + '/rdkit/Chem/test_data/1DWD_ligand.pdb')
>> mol1 = AllChem.AssignBondOrdersFromTemplate(ref, mol1)
>> mol2 = Chem.MolFromPDBFile(RDConfig.RDBaseDir + '/rdkit/Chem/test_data/1PPC_ligand.pdb')
>> mol2 = AllChem.AssignBondOrdersFromTemplate(ref, mol2)
>>
>> # Score the crystal pose first, for reference.
>> pyO3A = rdMolAlign.GetO3A(mol1, mol2)
>> rmsd = pyO3A.Align()
>> score = pyO3A.Score()
>> print("Orig", score)
>> Chem.MolToMolFile(mol1, "orig.mol")
>>
>> cids = AllChem.EmbedMultipleConfs(mol1, numConfs=250, maxAttempts=100,
>>                                   pruneRmsThresh=0.5,
>>                                   useExpTorsionAnglePrefs=False,
>>                                   useBasicKnowledge=False)
>> AllChem.MMFFOptimizeMoleculeConfs(mol1, mmffVariant='MMFF94s')
>> pyO3As = rdMolAlign.GetO3AForProbeConfs(mol1, mol2, numThreads=0)
>>
>> # Start from +/- infinity so the first conformer always sets both extremes.
>> lowest, highest = float('inf'), float('-inf')
>> lowestConfId = highestConfId = None
>> for confId, pyO3A in zip(cids, pyO3As):
>>     pyO3A.Align()  # moves this probe conformer onto mol2
>>     score = pyO3A.Score()
>>     if score < lowest:
>>         lowest, lowestConfId = score, confId
>>     if score > highest:
>>         highest, highestConfId = score, confId
>>
>> print("Lowest:", lowest, lowestConfId)
>> print("Highest:", highest, highestConfId)
>>
>> Chem.MolToMolFile(mol1, "lowest.mol", confId=lowestConfId)
>> Chem.MolToMolFile(mol1, "highest.mol", confId=highestConfId)
>>
>>
>> On 06/21/16 15:41, Tim Dudgeon wrote:
>>
>> Hi All,
>>
>> I'm trying to get to grips with using Open3D Align in RDKit, but hitting
>> problems.
>>
>> My approach is to

[Rdkit-discuss] Counting H Atoms

2016-06-21 Thread David Cosgrove

Hi All,

I'm a bit confused about counting hydrogen atoms.  It's a perennial problem
with cheminformatics toolkits in my experience, but this seems particularly
perverse.  If I run the code:

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles( 'CCO' )
mol = Chem.AddHs( mol )
cids = AllChem.EmbedMultipleConfs( mol , useExpTorsionAnglePrefs=True ,
                                   useBasicKnowledge=True , numConfs=1 )
AllChem.MMFFOptimizeMoleculeConfs( mol , mmffVariant='MMFF94s' )

for a in mol.GetAtoms() :
    num_heavy = a.GetTotalDegree() - a.GetTotalNumHs()
    print( '%d : %d num_heavy = %d num_H = %d %d %d' % ( a.GetIdx() ,
           a.GetAtomicNum() , num_heavy , a.GetTotalNumHs() ,
           a.GetNumImplicitHs() , a.GetNumExplicitHs() ) )
    for nb in a.GetNeighbors() :
        print( '  %d : %d' % ( nb.GetIdx() , nb.GetAtomicNum() ) )

I get the output

0 : 6 num_heavy = 4 num_H = 0 0 0
  1 : 6
  3 : 1
  4 : 1
  5 : 1
1 : 6 num_heavy = 4 num_H = 0 0 0
  0 : 6
  2 : 8
  6 : 1
  7 : 1
2 : 8 num_heavy = 2 num_H = 0 0 0
  1 : 6
  8 : 1
3 : 1 num_heavy = 1 num_H = 0 0 0
  0 : 6
4 : 1 num_heavy = 1 num_H = 0 0 0
  0 : 6
5 : 1 num_heavy = 1 num_H = 0 0 0
  0 : 6
6 : 1 num_heavy = 1 num_H = 0 0 0
  1 : 6
7 : 1 num_heavy = 1 num_H = 0 0 0
  1 : 6
8 : 1 num_heavy = 1 num_H = 0 0 0
  2 : 8


It seems that in a 3D mol, after embedding and minimisation, an H atom is
just like any other atom, and is ignored in the various H atom counting
functions.  Is that expected behaviour?  Depending on the answer, either
the documentation or the behaviour is incorrect.

Cheers,
Dave
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
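For readers landing here with the same question: after Chem.AddHs() the hydrogens are real atoms in the graph, so the per-atom H-count properties (GetNumImplicitHs(), GetNumExplicitHs()) are zero, and GetTotalNumHs() only counts bonded H atoms when it is called with includeNeighbors=True. Embedding and minimisation are not what changes the numbers; AddHs() is. The sketch below uses a hypothetical helper, heavy_and_h_counts, to recompute the two counts the script above was after; it skips the conformer step because that does not change the molecular graph.

```python
from rdkit import Chem


def heavy_and_h_counts(mol):
    """Hypothetical helper: {atom index: (heavy-atom neighbours, attached Hs)}
    for every non-hydrogen atom, whether the Hs are implicit or explicit atoms."""
    counts = {}
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() == 1:
            continue  # skip the explicit H atoms themselves
        # includeNeighbors=True also counts H atoms that are bonded neighbours
        n_h = atom.GetTotalNumHs(includeNeighbors=True)
        # GetTotalDegree() = explicit neighbours + implicit/explicit-count Hs
        n_heavy = atom.GetTotalDegree() - n_h
        counts[atom.GetIdx()] = (n_heavy, n_h)
    return counts


mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
print(heavy_and_h_counts(mol))  # {0: (1, 3), 1: (2, 2), 2: (1, 1)}
```

The same helper gives identical numbers on the molecule before AddHs(), which is a quick way to confirm that only the H representation changed.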

Re: [Rdkit-discuss] getting substructure for Morgan fingerprint bit

2016-03-05 Thread David Cosgrove

He might want the hydrogen counts specified to block unwanted substitutions?

Dave
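One way to act on that suggestion, sketched under assumptions: rebuild a bit's atom environment as a SMARTS in which every atom carries its element, aromaticity and total H count, so the pattern stops matching positions that carry extra substituents. Chem.FindAtomEnvironmentOfRadiusN and Chem.MolFragmentToSmiles (with its atomSymbols/bondSymbols arguments) are RDKit functions; the helper environment_to_smarts and the choice of atom primitives are illustrative only, and charges, isotopes and ring membership are ignored. Finding which (atom, radius) pairs set a given Morgan bit (e.g. via the bitInfo argument of the Morgan fingerprint functions) is left out.

```python
from rdkit import Chem

# Bond type -> SMARTS bond primitive; anything unexpected becomes "any bond".
_BOND_SMARTS = {
    Chem.BondType.SINGLE: "-",
    Chem.BondType.DOUBLE: "=",
    Chem.BondType.TRIPLE: "#",
    Chem.BondType.AROMATIC: ":",
}


def environment_to_smarts(mol, atom_idx, radius):
    """Hypothetical helper: SMARTS for the radius-`radius` environment of atom
    `atom_idx`, with element, aromaticity and total H count on every atom."""
    atom_symbols = [
        "[#{}&{}&H{}]".format(a.GetAtomicNum(),
                              "a" if a.GetIsAromatic() else "A",
                              a.GetTotalNumHs())
        for a in mol.GetAtoms()
    ]
    if radius == 0:
        return atom_symbols[atom_idx]
    bond_symbols = [_BOND_SMARTS.get(b.GetBondType(), "~") for b in mol.GetBonds()]
    env = list(Chem.FindAtomEnvironmentOfRadiusN(mol, radius, atom_idx))
    if not env:
        raise ValueError("no environment of that radius around this atom")
    atoms = {atom_idx}
    for bidx in env:
        bond = mol.GetBondWithIdx(bidx)
        atoms.update((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()))
    # Write only the environment, using our SMARTS strings for atoms and bonds.
    return Chem.MolFragmentToSmiles(mol, atomsToUse=sorted(atoms), bondsToUse=env,
                                    atomSymbols=atom_symbols,
                                    bondSymbols=bond_symbols,
                                    allBondsExplicit=True)


mol = Chem.MolFromSmiles("CCO")
print(environment_to_smarts(mol, 1, 1))
```

For the central carbon of ethanol at radius 1, the resulting pattern still matches ethanol but, unlike the plain SMILES-as-SMARTS "CCO", no longer matches isopropanol.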


> On 6 Mar 2016, at 03:38, Peter S. Shenkin  wrote:
> 
> Just curious here
> 
> Since every SMILES is a valid SMARTS, how do you want the SMARTS to differ
> from the SMARTS the SMILES already is?
> What would be the advantage if you could do so?
> Thanks,
> -P.
> 
>> On Sat, Mar 5, 2016 at 8:40 PM, Naeem Attari  wrote:
>> Hi,
>> 
>> I was wondering if there is any way to get the substructure for the bits of 
>> Morgan fingerprint as follows
>> 
>> (J. Chem. Inf. Model. 2011, 51, 1447–1456; dx.doi.org/10.1021/ci2001583)
>> 
>> Though I am able to get the SMILES for the bits with Chem.MolFragmentToSmiles,
>> I think it would be more informative/specific to have SMARTS for the bits.
>> 
>> 
>> Kind Regards
>> Shaikh Naeem Attari
>> Ph.D. Candidate, Department of Pharmacoinformatics
>> National Institute of Pharmaceutical Education and Research (NIPER)
>> S.A.S. Nagar, India. +91 7814727792
>> in.linkedin.com/in/naeemraza25/
>> 
>> 

Re: [Rdkit-discuss] stereochemistry of S with degree 3

2016-02-10 Thread David Cosgrove

Hi Andrew,

As chiralities go, this one has turned out to be quite valuable!

https://en.wikipedia.org/wiki/Esomeprazole

Dave


On Mon, Feb 8, 2016 at 3:05 PM, Andrew Dalke 
wrote:

> Hi!
>
>   Could someone explain to this non-chemist what the chirality means in
> the following?
>
>   CN[S@@](=O)C1=CC=CC=C1
>
> It comes from PubChem id 12194260 at
> https://pubchem.ncbi.nlm.nih.gov/compound/12194260 .
>
> Isn't this a symmetric structure, which can't have an orientation at that
> point? Even if it can have a chirality, which sort of chirality is it? The
> list at http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
> says:
>
>   Tetrahedral is the default class for degree four
>   Allene-like is the default class for degree 2
>   Square-planar is another class for degree four
>   Trigonal-bipyramidal is the default class for degree five
>   Octahedral is the default class for degree six
>
> but says nothing about degree three. And RDKit agrees that this is degree
> 3:
>
>   >>> mol = Chem.MolFromSmiles("CN[S@@](=O)C1=CC=CC=C1")
>   >>> mol.GetAtomWithIdx(2).GetDegree()
>   3
>
>
> This came up while testing my algorithm to fragment a structure. I expect
> to get the same structure back.
>
> I started with:
>
>   >>> Chem.CanonSmiles("CN[S@@](=O)C1=CC=CC=C1")
>   'CN[S@@](=O)c1ccccc1'
>
> The fragmentation produces:
>
>   >>> Chem.CanonSmiles("C*.N*.*[S@@](=O)C1=CC=CC=C1")
>   '[*]C.[*]N.[*][S@@](=O)c1ccccc1'
>
> I can manipulate this at the SMILES level to insert two closures and
> produce the reconnect-able SMILES
>
> C2.N23.[S@@]3(=O)C1=CC=CC=C1
>  ^--^
>  ^--^
>
> When I process that, I get a flipped chirality:
>
>   >>> Chem.CanonSmiles("C2.N23.[S@@]3(=O)C1=CC=CC=C1")
>   'CN[S@](=O)c1ccccc1'
>
> I did not expect this. The Daylight SMILES spec says:
>
>   The chiral order of the ring closure bond is implied by the
>   lexical order that the ring closure digit appears on the
>   chiral atom (not in the lexical order of the "substituent" atom).
>
> I expected the '*' of '*[S**]' and the '3' of '[S**]3' to have the same
> bond position so give the same chirality.
>
> Finally, if I replace the '[S@@]' with a '[C@@]' or '[P@@]' I lose the
> chirality:
>
>   >>> Chem.CanonSmiles("CN[C@@](=O)C1=CC=CC=C1")
>   'CNC(=O)c1ccccc1'
>   >>> Chem.CanonSmiles("CN[P@@](=O)C1=CC=CC=C1")
>   'CN[P](=O)c1ccccc1'
>
>
> At this point I can't tell if there's a problem with how I understand
> stereochemistry, how I understand SMILES, or how I understand RDKit.
>
>
>
> Andrew
> da...@dalkescientific.com
>
>
>
>

Re: [Rdkit-discuss] how to replace a bond and preserve chirality

2016-02-04 Thread David Cosgrove

Hi Andrew,

I don't have a solution for RDKit, because I don't know if you can do this
sort of thing there.  But when I've tackled this in OEChem, I've changed the
atomic number of the substituent atom to something else (I always use Xe,
because I know that will never be anything other than a marker atom, but
you might change it to 0 for *) and then trimmed off the atoms attached to it
(other than the parent chiral atom).  That way, you never break the bond
between the core and the substituent, and the chirality is preserved.  The
label might change from R to S, because that depends on the atomic numbers of
the atoms on the chiral atom (CIP rules), but the relative order of the
neighbours should remain the same.

Hope that helps,

Dave
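A minimal RDKit sketch of the same idea, assuming the bond being cut is acyclic: the substituent atom is converted in place to a dummy atom (atomic number 0) and everything beyond it is deleted, so the core atom keeps its original bond and, with it, its chiral tag. replace_substituent_with_dummy is a hypothetical helper, not an RDKit API, and it assumes that Chem.RWMol.RemoveAtom leaves the surviving bonds of the core atom in their original order.

```python
from collections import deque

from rdkit import Chem


def replace_substituent_with_dummy(mol, core_atom, rgroup_atom):
    """Hypothetical helper: turn `rgroup_atom` into a dummy atom (*) and delete
    the rest of the R-group, never breaking the core_atom-rgroup_atom bond."""
    if mol.GetBondBetweenAtoms(core_atom, rgroup_atom) is None:
        raise ValueError("core_atom and rgroup_atom are not bonded")
    # Breadth-first search over the R-group side, not allowed to pass core_atom.
    rgroup = {rgroup_atom}
    queue = deque([rgroup_atom])
    while queue:
        idx = queue.popleft()
        for nbr in mol.GetAtomWithIdx(idx).GetNeighbors():
            nbr_idx = nbr.GetIdx()
            if nbr_idx == core_atom:
                if idx != rgroup_atom:  # reached the core by another route
                    raise ValueError("bond is in a ring; cannot cut it here")
                continue
            if nbr_idx not in rgroup:
                rgroup.add(nbr_idx)
                queue.append(nbr_idx)

    rw = Chem.RWMol(mol)
    dummy = rw.GetAtomWithIdx(rgroup_atom)
    dummy.SetAtomicNum(0)
    dummy.SetFormalCharge(0)
    dummy.SetIsotope(0)
    dummy.SetIsAromatic(False)
    dummy.SetNumExplicitHs(0)
    dummy.SetNoImplicit(True)
    dummy.SetChiralTag(Chem.ChiralType.CHI_UNSPECIFIED)
    # Remove trimmed atoms from the highest index down, so the indices
    # still waiting to be removed stay valid.
    for idx in sorted(rgroup - {rgroup_atom}, reverse=True):
        rw.RemoveAtom(idx)
    Chem.SanitizeMol(rw)
    return rw.GetMol()


core = replace_substituent_with_dummy(Chem.MolFromSmiles("F[C@H](Cl)CCO"), 1, 3)
print(Chem.MolToSmiles(core))
```

Applied to Andrew's example below (core atom 12, R-group atom 24), this should give the same canonical SMILES as his hand-edited smiles[:-1] + '[*]' string, because the core atom's bond list is never reordered.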


On Wed, Feb 3, 2016 at 4:48 AM, Andrew Dalke 
wrote:

> I'm working on a project where I cut a molecule along certain single
> bonds, to find a core structure and one or more R-groups.
>
> In yesterday's email, I mentioned a problem I have in creating a canonical
> SMILES for the core when the R-groups are replaced by a hydrogen.
>
> I also want to create a SMILES for the core where the R-groups are
> replaced with a "-[*]". For example, given "c1ccccc1CCO" where the R-group
> is the "CCO", I want to create the core "c1ccccc1[*]".
>
> My code works except when the attachment point of the cut contains chiral
> information. About 50% of the time, the chirality is inverted. The problem
> is that chirality is a very fragile property which depends (I think) on the
> index permutation order of the neighboring atoms to a given atom. It's
> tricky to maintain that information during structure editing.
>
> For the rest of this email I will present a reproducible example, explain my
> understanding, and sketch what seems a reasonable workaround. Let me know
> if I am (in)correct in that. I'll then propose what I think may be a useful
> editing function: ReplaceBond()
>
> The following code snippet works like this. It starts with the input
> structure:
>
>C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@]1(CO)O)C)O)F)C
>
> I cut the bond from the "[C@@]4" to the final "C", then add a "[*] to
> each side of the cut. This produces the canonical core:
>
>[*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
>
> To verify if this is correct, I'll make some syntax changes directly on
> the original SMILES string. I replace the final "C" character with "[*]".
> This should be isomorphic to the expected core. I parse and canonicalize it
> to get the expected core SMILES, which is:
>
>[*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
>
> Note that the first has a '@@' in the second atom while the second has a
> '@'.
>
> == Reproducible which shows that the chirality changes 
>
> from __future__ import print_function
> from rdkit import Chem
>
> # I want to cut the SMILES at the given atoms, to make a core and an R-group.
> # I want the location of the attachment points to be marked [*] atoms.
>
> smiles = "C[C@@H]1C[C@H]2[C@@H]3CCC4=CCC=C[C@@]4([C@]3([C@H](C[C@@]2([C@]1(CO)O)C)O)F)C"
> # Cut the bond between atom 12 (the "[C@@]4") and atom 24 (the final "C").
> core_atom, rgroup_atom = 12, 24
>
> input_mol = Chem.MolFromSmiles(smiles)
> assert rgroup_atom == input_mol.GetNumAtoms()-1, "this needs to be the terminal 'C'"
>
> # Cut the bond and add two "[*]" atoms
> emol = Chem.EditableMol(input_mol)
> emol.RemoveBond(core_atom, rgroup_atom)
>
> for atom in (core_atom, rgroup_atom):
>     new_atom = emol.AddAtom(Chem.Atom(0))
>     new_bond = emol.AddBond(atom, new_atom, Chem.BondType.SINGLE)
>     print("added bond", new_bond, "from", atom, "to", new_atom)
>
> cut_mol = emol.GetMol()
> cut_smiles = Chem.MolToSmiles(cut_mol, isomericSmiles=True)
>
> core_smiles = max(cut_smiles.split("."), key=len)
> print("found core SMILES:\n", core_smiles)
> # This prints:
> #   [*][C@@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
>
> # Compare it to what I expected.
> # In this case I can simply replace the terminal methyl in the SMILES.
> noncanonical_expected_core_smiles = smiles[:-1] + '[*]'
>
> expected_mol = Chem.MolFromSmiles(noncanonical_expected_core_smiles)
> expected_core_smiles = Chem.MolToSmiles(expected_mol, isomericSmiles=True)
> print("expected core SMILES:\n", expected_core_smiles)
> # This prints:
> #   [*][C@]12C=CCC=C1CC[C@H]1[C@@H]3C[C@@H](C)[C@](O)(CO)[C@@]3(C)C[C@H](O)[C@@]12F
>
>
> if core_smiles != expected_core_smiles:
>     print(" Not the same !!!")
> else:
>     print("Identical")
>     raise AssertionError("did not expect that")
>
> ==
>
> Here is what I think is happening.
>
> The problem occurs because the atom's chiral tag is connected to the
> permutation

[Rdkit-discuss] Latest version

2015-12-10 Thread David Cosgrove

Hi All,

I'm sorry to trouble you all with this one, as I feel I should be able to
do better.  I'm trying to install the latest version, 2015.09.1, but I
can't find it on sourceforge.  The latest one I can find there is
2015.03.1.  I've managed to get the ubuntu installation installed via
apt-get, and my python interpreter can find it. However, I can't find the
include files or object libraries for C++ development, which is what I'm
after.

Can someone please point me in the right direction?  Be as rude as you like
in the process, as I feel I must be being very dim!

Thanks,

Dave

Re: [Rdkit-discuss] Latest version

2015-12-10 Thread David Cosgrove

Hi All,

Many thanks for the speedy replies.  I obviously haven't been paying close
enough attention to the project, for which I apologise.  Would it be
possible to update the installation docs at www.rdkit.org/docs/Install.html,
which is what Google serves up when you ask for 'rdkit installation'?

Thanks,
Dave

On Thu, Dec 10, 2015 at 10:21 AM, Gianluca Sforna <gia...@gmail.com> wrote:

> On Thu, Dec 10, 2015 at 11:12 AM, David Cosgrove
> <davidacosgrov...@gmail.com> wrote:
> > I've managed to get the ubuntu installation installed via apt-get, and my
> > python interpreter can find it. However, I can't find the include files or
> > object libraries for C++ development, which is what I'm after.
>
> In this case, you probably want to install the librdkit-dev package
>
>
> --
> Gianluca Sforna
>
> http://morefedora.blogspot.com
> http://plus.google.com/+gianlucasforna - http://twitter.com/giallu
>

Re: [Rdkit-discuss] Python GetShortestPath()?

2015-04-22 Thread David Cosgrove

Just for information, if you want the full matrix of shortest-path
distances for a molecule, try the Floyd-Warshall algorithm:
http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm.  It's
O(n^3), and about 10 lines of code.  For molecules, you initialise the
input matrix so that dist[i][i] = 0, dist[i][j] = 1 if i and j are bonded,
and something large (natoms will do, since no shortest path can be longer
than natoms - 1 bonds) if they're not.  The Wikipedia page also gives an
efficient algorithm for reconstructing the shortest path between any two
atoms, using a successor matrix kept alongside the distance matrix.

Dave
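A short sketch of that recipe in Python, for illustration: floyd_warshall and reconstruct_path are hypothetical helper names, the "large" initial value is the atom count as suggested above, and the nxt matrix is the successor-matrix path reconstruction described on the Wikipedia page. RDKit's own Chem.GetDistanceMatrix gives the same topological distances and is a convenient cross-check.

```python
from rdkit import Chem


def floyd_warshall(mol):
    """All-pairs topological distances for an RDKit molecule, plus a successor
    matrix for path reconstruction. Unreachable pairs keep the value n_atoms."""
    n = mol.GetNumAtoms()
    big = n  # longer than any real path, which has at most n - 1 bonds
    dist = [[0 if i == j else big for j in range(n)] for i in range(n)]
    nxt = [[i if i == j else None for j in range(n)] for i in range(n)]
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        dist[i][j] = dist[j][i] = 1
        nxt[i][j], nxt[j][i] = j, i
    for k in range(n):
        for i in range(n):
            d_ik = dist[i][k]
            for j in range(n):
                if d_ik + dist[k][j] < dist[i][j]:
                    dist[i][j] = d_ik + dist[k][j]
                    nxt[i][j] = nxt[i][k]  # first step now heads towards k
    return dist, nxt


def reconstruct_path(nxt, i, j):
    """Atom indices along one shortest path from i to j ([] if unreachable)."""
    if nxt[i][j] is None:
        return []
    path = [i]
    while i != j:
        i = nxt[i][j]
        path.append(i)
    return path


mol = Chem.MolFromSmiles("c1ccccc1CCO")
dist, nxt = floyd_warshall(mol)
print(dist[0][8], reconstruct_path(nxt, 0, 8))  # 4 [0, 5, 6, 7, 8]
```

Where several shortest paths exist (e.g. across the benzene ring), this returns just one of them; the strict "<" keeps the first one found.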


On Wed, Apr 22, 2015 at 7:23 AM, James Davidson j.david...@vernalis.com
wrote:

 Hi Greg,

 I just built the latest revision - and the functionality is exposed -
 thanks (and, of course, thanks Paolo!).

 Kind regards

 James


Re: [Rdkit-discuss] After successfull installation I face still errors with importing in python.

2014-07-30 Thread David Cosgrove

Hi Jessica,

There's no denying that linux is a steep learning curve, although I expect
that I would face similar problems were I to start working in earnest on
windows.

The problem here, I would expect, is that in a new shell your environment
variables are not set. You should add the lines

export RDBASE=opt/RDKit_2014_03_1/
export LD_LIBRARY_PATH=opt/RDKit_2014_03_1/build/lib/:usr/local/src/boost_1_55_0/libs/
export PYTHONPATH=opt/RDKit_2014_03_1/

that you typed as part of the installation process, into your .bashrc, so
that they are set afresh every time you start a new shell/terminal or log in
again. Incidentally, I am surprised that these aren't

export RDBASE=/opt/RDKit_2014_03_1/
export LD_LIBRARY_PATH=/opt/RDKit_2014_03_1/build/lib/:/usr/local/src/boost_1_55_0/libs/
export PYTHONPATH=/opt/RDKit_2014_03_1/

i.e. with '=/opt' rather than '=opt' (and '/usr/local' rather than
'usr/local'), since I assume you installed RDKit into /opt and Boost under
/usr/local/src.

You might at some point want to install the 'environment-modules' package
http://modules.sourceforge.net/ which makes this sort of thing much easier,
and allows convenient switching between different versions of the same
project.  This is not a standard package on Ubuntu, apparently.

Regards,
Dave
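The relative-path slip is easy to miss by eye. A small, hypothetical Python check (relative_path_entries is not part of RDKit) flags any entry in the relevant variables that is not an absolute path; running it in a fresh terminal after editing .bashrc shows whether the new settings took.

```python
import os


def relative_path_entries(env=None, names=("RDBASE", "PYTHONPATH", "LD_LIBRARY_PATH")):
    """Return (variable, entry) pairs whose entry is not an absolute path.
    Empty entries (e.g. from a trailing separator) are skipped."""
    env = os.environ if env is None else env
    problems = []
    for name in names:
        for entry in env.get(name, "").split(os.pathsep):
            if entry and not os.path.isabs(entry):
                problems.append((name, entry))
    return problems


if __name__ == "__main__":
    for name, entry in relative_path_entries():
        print(f"{name}: '{entry}' is a relative path")
```

With the settings quoted above, it would report the 'opt/...' entries and the 'usr/local/...' Boost entry.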



On Wed, Jul 30, 2014 at 4:02 PM, Jessica Krause jessica.kra...@tu-bs.de
wrote:

  Dear all,

 After succeeding in installing RDKit a few minutes ago, I now have
 trouble again.

 I have a situation where I need to reinstall RDKit (build from source
 http://www.blopig.com/blog/2013/02/how-to-install-rdkit-on-ubuntu-12-04/)
 in order to work with the Python module.

 I get the following error when I start Python in a new terminal:

 >>> from rdkit import Chem
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 ImportError: No module named rdkit

 Note:

 the error doesn't occur when I build from source again and work in the
 same terminal. Does it mean that I have to build from source and run ctest
 every time I start a terminal or the PC?

 Thanks in advance.

 Best Regards,

 Jessica Krause.











 Dear Jean-Paul,

 I would like to thank you for your help. Now I have installed the
 RDKit_2014_03_1 on Ubuntu 14.04 with the link you mentioned above. Thanks a
 lot! It saved me from spending more hours on installing RDKit on
 Ubuntu. I have to mention that I reinstalled Ubuntu again to avoid the
 errors I got all the time before. I am new to working with Ubuntu and RDKit.
 Thanks again!

 Jessica


 On 26.07.2014 17:44, JP wrote:

  Not a direct solution to your problem, but have you tried the Ubuntu
 specific instructions at:
 http://www.blopig.com/blog/2013/02/how-to-install-rdkit-on-ubuntu-12-04/

  I have installed it successfully on 14.04.


 -
 Jean-Paul Ebejer
 Early Stage Researcher


 On 24 July 2014 15:40, Jessica Krause jessica.kra...@tu-bs.de wrote:

 Dear all,

 I tried to install RDKit 2014 on Ubuntu 14.04 but I did
 not succeed!


 While executing the make command in the RDKit_2014_03_1/build directory,
 I received the following error:

 [  0%] Built target inchi_support
 [  1%] Built target RDGeneral
 [  3%] Built target RDGeneral_static
 [  3%] Built target testDict
 Linking CXX shared library ../../lib/libRDBoost.so
 /usr/bin/ld: /usr/local/lib/libpython2.7.a(exceptions.o): relocation
 R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared
 object; recompile with -fPIC
 /usr/local/lib/libpython2.7.a: error adding symbols: Bad value
 collect2: error: ld returned 1 exit status
 make[2]: *** [lib/libRDBoost.so.1.2014.03.1] Error 1
 make[1]: *** [Code/RDBoost/CMakeFiles/RDBoost.dir/all] Error 2
 make: *** [all] Error 2




 the environmental variables that I have used are:

 export RDBASE=opt/RDKit_2014_03_1/
 export LD_LIBRARY_PATH=opt/RDKit_2014_03_1/build/lib/:usr/local/src/boost_1_55_0/libs/
 export PYTHONPATH=opt/RDKit_2014_03_1/


 Please help me with this problem.

 Thanks in advance.

 Regards,
 Jessica Krause




Re: [Rdkit-discuss] MaxMin Picker and Python

2014-07-17 Thread David Cosgrove

If you don't mind writing some extra code, we've had good success with a
Monte Carlo implementation of a maximin diversity picker called BigPicker,
described in Blomberg et al, JCAMD, 23, 513-525 (2009). With this
implementation, you only need to keep the subset distance matrix in memory.
At each step, one of the 2 molecules involved in the shortest subset
distance is swapped out, and a randomly chosen molecule from the pool
replaces it. The relevant row/column of the subset distance matrix is
updated and the new minimum interdistance found. A Monte Carlo criterion
is used to decide whether to accept the swap or not. As the name suggests,
it can be used on very large datasets. Indeed, in our implementation we
allowed for the case where the subset was too large for the subset distance
matrix to be held in memory and the minimum distance was calculated from
fingerprints on the fly at each step. That really was slow, but if
it's the only way of solving the problem... It's worth recognising that
this sort of algorithm spends a lot of time mucking about improving the
interdistance in the 4th or 5th decimal place. It's not clear that a subset
with a minimum interdistance of 0.41567 is definitively better than one of
0.41568, so a fairly loose convergence criterion is usually ok. In our
experience a larger number of shorter runs, to avoid convergence on a bad
local minimum, is more reliable.
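A compact sketch of the swap-based search described above, under stated assumptions: it works on pool indices with a caller-supplied dist(i, j) function (for fingerprints this could be e.g. 1 - DataStructs.TanimotoSimilarity(fps[i], fps[j])), uses a Metropolis acceptance rule at a fixed temperature as the "Monte Carlo criterion" (the paper's exact criterion and schedule may differ), and rescans the k x k subset matrix for the minimum at each step rather than maintaining it incrementally. big_picker_sketch is a hypothetical name; this is not the published BigPicker code.

```python
import math
import random


def big_picker_sketch(pool_size, k, dist, n_steps=1000, temperature=0.05, seed=None):
    """Monte Carlo maximin subset selection (sketch). Returns (subset, min_dist)."""
    if not 2 <= k < pool_size:
        raise ValueError("need 2 <= k < pool_size")
    rng = random.Random(seed)
    subset = rng.sample(range(pool_size), k)
    # k x k subset distance matrix; the diagonal is never used.
    dm = [[dist(subset[a], subset[b]) if a != b else math.inf for b in range(k)]
          for a in range(k)]

    def closest_pair():
        best = (math.inf, 0, 1)
        for a in range(k):
            for b in range(a + 1, k):
                if dm[a][b] < best[0]:
                    best = (dm[a][b], a, b)
        return best

    cur_min, a, b = closest_pair()
    best_min, best_subset = cur_min, list(subset)
    for _ in range(n_steps):
        slot = rng.choice((a, b))  # swap out one member of the closest pair
        members = set(subset)
        cand = rng.randrange(pool_size)
        while cand in members:  # random pool member not already picked
            cand = rng.randrange(pool_size)
        old_member, old_row = subset[slot], dm[slot][:]
        subset[slot] = cand
        for c in range(k):  # update the affected row/column only
            d = math.inf if c == slot else dist(cand, subset[c])
            dm[slot][c] = dm[c][slot] = d
        new_min, na, nb = closest_pair()
        delta = new_min - cur_min
        if delta >= 0 or (temperature > 0 and rng.random() < math.exp(delta / temperature)):
            cur_min, a, b = new_min, na, nb  # accept (Metropolis rule)
            if cur_min > best_min:
                best_min, best_subset = cur_min, list(subset)
        else:  # reject: restore the previous member and row/column
            subset[slot] = old_member
            for c in range(k):
                dm[slot][c] = dm[c][slot] = old_row[c]
    return best_subset, best_min
```

Only the subset matrix (k x k) is held in memory; the pool is touched through dist() alone, which is the property that makes this style of picker usable on very large datasets.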

Having said all that, I'd be inclined to agree with Greg that if you're
only picking 200 compounds from 26000 you're probably going to do just as
well picking at random (with a pin). You could be slightly cleverer by only
accepting the next random selection if it's above a threshold distance from
anything you've already selected to avoid the pathological case he describes.
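The threshold variant of random picking is only a few lines. A hypothetical sketch (threshold_random_pick is an illustrative name, and dist is again a caller-supplied distance function): shuffle the pool once and keep a candidate only if it is at least threshold away from everything already picked.

```python
import random


def threshold_random_pick(pool_size, n_pick, dist, threshold, seed=None):
    """Random picking that skips anything closer than `threshold` to a pick."""
    rng = random.Random(seed)
    order = list(range(pool_size))
    rng.shuffle(order)
    picked = []
    for cand in order:
        if all(dist(cand, p) >= threshold for p in picked):
            picked.append(cand)
            if len(picked) == n_pick:
                break
    return picked  # may be shorter than n_pick if the threshold is too strict
```

Each candidate is compared only against the picks so far, so the cost is roughly n_pick distance evaluations per candidate examined, with no full distance matrix.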

Dave

On Thu, Jul 17, 2014 at 5:19 AM, Greg Landrum greg.land...@gmail.com
wrote:

one other short thing.
If this is the code you are using for the distance matrix:

On Thu, Jul 17, 2014 at 12:18 AM, Matthew Lardy mla...@gmail.com wrote:

dm=[]
for i,fp in enumerate(zims_fps[:26000]): # only 1000 in the demo (in the interest of time)
    dm.extend(DataStructs.BulkTanimotoSimilarity(fp,zims_fps[1+1:26000],returnDistance=True))
dm = array(dm)

Then at least part of the problem is that you are generating the full
matrix. I think you intend to have:

dm.extend(DataStructs.BulkTanimotoSimilarity(fp,zims_fps[i+1:26000],returnDistance=True))
in there.

That typo was in the original notebook that you used; I'm going to have to
figure out how to fix that.

-greg
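To make the size difference concrete: with the i+1 slice each pair is computed once, giving n(n-1)/2 distances laid out row by row (row i holds the distances from i to i+1, ..., n-1), roughly half of what the 1+1 typo produces. A small sketch with hypothetical helper names; DataStructs.BulkTanimotoSimilarity(..., returnDistance=True) is the RDKit call used in the snippet above, and Chem.RDKFingerprint stands in for whatever fingerprints are actually used.

```python
from rdkit import Chem, DataStructs


def condensed_distance_matrix(fps):
    """Upper-triangle Tanimoto distances, one row per fingerprint, no repeats."""
    dm = []
    for i in range(len(fps) - 1):
        dm.extend(DataStructs.BulkTanimotoSimilarity(fps[i], fps[i + 1:],
                                                     returnDistance=True))
    return dm


def condensed_index(i, j, n):
    """Position of the (i, j) distance, i != j, in the list built above."""
    if i == j:
        raise ValueError("no self-distances are stored")
    if i > j:
        i, j = j, i
    return i * n - i * (i + 1) // 2 + (j - i - 1)


smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
fps = [Chem.RDKFingerprint(Chem.MolFromSmiles(s)) for s in smiles]
dm = condensed_distance_matrix(fps)
print(len(dm))  # 6 == 4 * 3 // 2
```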


Re: [Rdkit-discuss] SmilesToMol runtime error

2012-08-03 Thread DAvid Cosgrove

Hi,



It’s a minor point in the discussion, maybe, but Meyers’ ‘More Effective
C++’ recommends always catching exceptions by reference.  It’s quicker than
catching by value because it avoids copy construction, and, more
importantly, it avoids the problem of class slicing if a derived class
exception object is caught as a base exception.



Paul’s code snippet becomes



try {
   rwmol = SmilesToMol(smiles_strings[i]);
   // cool stuff here...
}
catch (const RDKit::MolSanitizeException &msg) {
   // do something (or nothing)...
   std::cout << msg.what() << std::endl;
}

If you were to write



try {
   rwmol = SmilesToMol(smiles_strings[i]);
   // cool stuff here...
}
catch (std::exception msg) {
   // do something (or nothing)...
   std::cout << msg.what() << std::endl;
}

you would get std::exception::what() rather than
RDKit::MolSanitizeException::what(), because copying the RDKit exception into
the std one has sliced away the extra.  Catching by reference retains the
virtual call to the expected RDKit object. You should also note that whatever
is pointed to by rwmol is undefined.  Throwing an exception is not just
another way of passing a return value. After the try/catch block, it will be
as if SmilesToMol was never called in the first place.  Also, Meyers
recommends only using try blocks when you really need to, as they add an
overhead to both code size and runtime.  If you want your code to be as fast
as possible, and the thing just blowing up with a strange-looking error
message about exceptions is an adequate error message for you, then don't
bother.


Dave