[Rdkit-discuss] mmpdb 3.1

2024-01-10 Thread Andrew Dalke
Hi everyone, We have released mmpdb 3.1, which you can get from https://github.com/rdkit/mmpdb . mmpdb 3.0, released May 2023, merged three development tracks: - create and query 1-cut med chem transformations as described in Awale et al., The Playbooks of Medicinal Chemistry Design Moves,

Re: [Rdkit-discuss] SDMolSupplier warning 2023.9.2

2023-12-12 Thread Andrew Dalke
Hi Mandar, > On Dec 13, 2023, at 03:39, Mandar Kulkarni > wrote: > I could not figure out how Rdkit is guessing it as 2D structure, as there is > no such information in SDF. Line 2 of the SDF record looks something like: RDKit 2D This line has the format (quoting from the
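
For readers who want to check this on their own files, here is a minimal sketch of where that "2D" sits. It assumes the standard fixed-width molfile header layout for line 2: user initials (2 characters), program name (8), date/time (10), then the 2-character dimensional code. The helper name and the sample record are made up for illustration and are not taken from Mandar's file.

```python
def dimension_code(record: str) -> str:
    """Return the dimensional code ("2D", "3D", or "") from line 2
    of a molfile/SDF record.

    Line 2 is fixed width: initials (2), program name (8),
    date/time (10), then the dimensional code in columns 21-22.
    """
    lines = record.splitlines()
    if len(lines) < 2:
        return ""
    # Columns 21-22 (1-based) are Python slice [20:22].
    return lines[1][20:22].strip()


# Hand-built header line in the style RDKit writes:
# 5 spaces + "RDKit" fills initials + program name, then 10 blanks
# where the date/time would go, then the dimensional code.
example = "methane\n" + "     RDKit" + " " * 10 + "2D" + "\n\n"
print(dimension_code(example))
```

A blank or short line 2 gives an empty string, which is one reason a reader might fall back to some other way of deciding between 2D and 3D.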

Re: [Rdkit-discuss] mol properties in SDWriter

2023-09-29 Thread Andrew Dalke
On Sep 26, 2023, at 01:17, Ling Chan wrote: > >(1) > 4.099 .. > Just wonder what was the rationale behind this extra "(1)" on the property > field lines (pKa and logP in the above example)? > > And is there a way to get rid of these? I am not sure if this extra "(1)" is > part of

Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-16 Thread Andrew Dalke
On Jun 16, 2023, at 03:15, S Joshua Swamidass wrote: > In graph theory, a planar graph is a graph that can be embedded in the plane, > i.e., it can be drawn on the plane in such a way that its edges intersect > only at their endpoints. In other words, it can be drawn in such a way that > no

Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread Andrew Dalke
On Jun 15, 2023, at 20:49, S Joshua Swamidass wrote: > > And what (generally speaking) is the algorithm used by rdkit? Do we know it's > complexity? https://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00543 "Get Your Atoms in Order—An Open-Source Implementation of a Novel and Robust Molecular

Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread Andrew Dalke
On Jun 15, 2023, at 18:20, S Joshua Swamidass wrote: > It's well known that the graph-isomorphism problem is NP While P is contained in NP, I don't think that's the NP you mean. I suspect you may be thinking of subgraph isomorphism, which is NP-hard. Graph isomorphism may be quasi-polynomial

[Rdkit-discuss] ANN: chemfp 4.1

2023-05-17 Thread Andrew Dalke
Hi everyone, I've just released chemfp 4.1. To install the pre-compiled package for Linux-based OSes do: python -m pip install chemfp -i https://chemfp.com/packages/ For a detailed description of what's new, see: https://chemfp.readthedocs.io/en/latest/whats_new_in_41.html As a summary,

Re: [Rdkit-discuss] Can a bond index be associated with order in explicit SMILES?

2023-05-17 Thread Andrew Dalke
On May 17, 2023, at 02:31, Vincent Scalfani wrote: > I thought that this might also be the case for bond indices, but that does > not appear to be correct (see example below). Is it possible to get a bond > index in the order of the SMILES? This may help you understand why that's a difficult

Re: [Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-09 Thread Andrew Dalke
On May 9, 2023, at 07:55, Haijun Feng wrote: > Can anyone help me figure out how to get each atom with H from the smiles as > above. Thanks so much! Try using Chem.MolFragmentToSmiles to get the SMILES for each atom, with all hydrogens explicit, then strip off the leading and trailing []s.

Re: [Rdkit-discuss] Permutation of multiple enumeration

2022-07-06 Thread Andrew Dalke
ildcards, like: [*:1]c1ncc([*:2])cn1 where [*:1] is the attachment point for R1, [*:2] is the attachment point for R2, and [*:3] is the attachment point for R3. The R-group SMILES must have a single unlabeled "*" wildcard, like: CO* -or- C(*)CO The program is used like this: python enu
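
The program itself is cut off in this listing. As a rough sketch of just the bookkeeping side (not the program from the post), enumerating every combination of R-groups is a Cartesian product over the per-position lists. The function names and the example R-group lists are assumptions for illustration, and the chemistry step of joining each R-group onto the core is deliberately left out.

```python
import itertools


def check_rgroup(smiles: str) -> None:
    """Enforce the rule that an R-group SMILES has exactly one "*"."""
    if smiles.count("*") != 1:
        raise ValueError("R-group %r must contain exactly one '*'" % (smiles,))


def enumerate_rgroup_choices(rgroups_by_label):
    """Yield one {label: R-group SMILES} dict per combination.

    rgroups_by_label maps an attachment label (1, 2, ...) to the
    list of candidate R-group SMILES for that position.
    """
    labels = sorted(rgroups_by_label)
    pools = [rgroups_by_label[label] for label in labels]
    for pool in pools:
        for smiles in pool:
            check_rgroup(smiles)
    for combo in itertools.product(*pools):
        yield dict(zip(labels, combo))


# Hypothetical input: two R1 candidates and two R2 candidates.
choices = {1: ["CO*", "C(*)CO"], 2: ["F*", "Cl*"]}
combos = list(enumerate_rgroup_choices(choices))
print(len(combos))
```

The number of products is the product of the list lengths, so it grows quickly as positions are added.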

[Rdkit-discuss] chemfp 4.0

2022-07-04 Thread Andrew Dalke
Hi all, I've recently released chemfp 4.0, with support for several diversity selection algorithms, and an improved API for interactive use in a notebook environment. Chemfp is an analytics package for cheminformatics fingerprints. It contains command-line tools and an extensive Python

Re: [Rdkit-discuss] how to report SDF records for which Chem.ForwardSDMolSupplier returns None?

2022-04-14 Thread Andrew Dalke
On Apr 14, 2022, at 12:57, Ivan Tubert-Brohman wrote: > How about splitting the file on lines consisting of "$$$$", and then parsing > each record? If the parsing fails, you can write out the bad record for > future inspection. (This addresses the basic use case, but not the "even > better"
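
A minimal sketch of the splitting idea Ivan describes, assuming each SDF record ends with a line containing only "$$$$". The function names are invented here, and `parse` is any caller-supplied function that returns None on failure (with RDKit that could be something like Chem.MolFromMolBlock, though that choice is an assumption and not something stated in the thread).

```python
import io


def iter_sdf_records(infile):
    """Yield each SDF record as text, including its "$$$$" line."""
    lines = []
    for line in infile:
        lines.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(lines)
            lines = []
    if any(text.strip() for text in lines):
        # Trailing text with no terminator: report it rather than drop it.
        yield "".join(lines)


def split_good_and_bad(infile, parse):
    """Return (good_records, bad_records) according to parse(record)."""
    good, bad = [], []
    for record in iter_sdf_records(infile):
        if parse(record) is None:
            bad.append(record)
        else:
            good.append(record)
    return good, bad


# Stand-in parser for the demo: treats records starting with "BAD" as failures.
def fake_parse(record):
    return None if record.startswith("BAD") else record


sample = "ok\n\n\nM  END\n$$$$\nBAD\n\n\nM  END\n$$$$\n"
good, bad = split_good_and_bad(io.StringIO(sample), fake_parse)
print(len(good), len(bad))
```

One caveat with this kind of line-based splitting is a data field whose value happens to contain a "$$$$" line, which would be cut in the wrong place.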

Re: [Rdkit-discuss] how to report SDF records for which Chem.ForwardSDMolSupplier returns None?

2022-04-14 Thread Andrew Dalke
On Apr 14, 2022, at 09:16, Gyro Funch wrote: > I don't know the sdf format well, so please excuse my ignorance, but instead > of a custom parser, would it be possible to write a preprocessor to eliminate > the offending information? Perhaps something using regular expressions in > python,

[Rdkit-discuss] mmpdb 3.0b1

2022-02-09 Thread Andrew Dalke
Hi all, The combination of crowd-funding and contract work for me, and methods + software development by Mahendra Awale, has resulted in a new version of mmpdb. More specifically, version 3.0 beta 1 is available on GitHub at: https://github.com/adalke/mmpdb/tree/v3-dev The CHANGELOG

Re: [Rdkit-discuss] generating smiles using RDKit

2021-12-08 Thread Andrew Dalke
Hi Gyro, > On Dec 8, 2021, at 11:02, Gyro Funch wrote: > > My work is in the area of toxicology and I am interested in generating SMILES > for molecules referred to as 'short chain chlorinated paraffins' (SCCP). > > A general definition that is sometimes used is that an SCCP is given by the

Re: [Rdkit-discuss] Reading text records from SDF from gzipped files

2021-11-04 Thread Andrew Dalke
Hi Tim, You might also consider using chemfp, which has this sort of functionality available through its toolkit wrapper API: from chemfp import rdkit_toolkit as T import itertools with T.read_ids_and_molecules("chembl_28.sdf.gz") as reader: loc = reader.location for id, mol in

Re: [Rdkit-discuss] MolToSmiles atom ordering

2021-11-02 Thread Andrew Dalke
Hi Ling, If there are symmetries then a substructure search like will only give you one mapping, and that might not be the canonical mapping. What you're looking for is the special property _smilesAtomOutputOrder >>> from rdkit import Chem >>> mol =

Re: [Rdkit-discuss] MolToSmiles

2021-10-21 Thread Andrew Dalke
> On Oct 21, 2021, at 04:50, Ling Chan wrote: > > I got the attached sdf. When I did a MolToSmiles, it gives me the following. > > >>> for m in Chem.SDMolSupplier("pdb_structures/1q6k_ligand.sdf"): > ... print (Chem.MolToSmiles(m)) > ... >

Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-23 Thread Andrew Dalke
On Jul 23, 2021, at 06:42, Andrew Dalke wrote: > > No, there's no way to do that. > > The best I can suggest is to go back to the original Python implementation > and change the code leading up to Alternatively, since your template is small, you can brute-force enumerat

Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Andrew Dalke
On Jul 23, 2021, at 01:01, Gustavo Seabra wrote: > I actually want the sulfone to be found, if it is there. My problem is that I > also want flexibility to change the ring atoms and still find the ring as a > match, while considering a match on the sulfone only if it really is there. > (e.g.,

Re: [Rdkit-discuss] Maximum Common Substructure using SMARTS

2021-07-22 Thread Andrew Dalke
Hi Gustavo, > template = > Chem.MolFromSmarts('[a]1(-[S](-*)(=[O])=[O]):[a]:[a]:[a]:[a]:[a]:1') Unless things have changed since I last looked at the algorithm, you can't meaningfully pass a SMARTS-based query molecule into the MCS program, outside of a few simple cases. It generates a

Re: [Rdkit-discuss] Shape Tanimoto distance question

2021-06-30 Thread Andrew Dalke
> On Jun 30, 2021, at 04:20, Francois Berenger wrote: > > On 29/06/2021 12:26, Greg Landrum wrote: >> Hi Leon, >> You can convert the tanimoto distance to similarity, but the formula >> is: >> Similarity = 1 - Distance > > In other words: > > Tanimoto_distance = 1.0 - Tanimoto_score As a

[Rdkit-discuss] off_coverage, Z3, and test set reduction

2021-06-08 Thread Andrew Dalke
Hi all, I'm excited about a tool I developed for the Open Force Field Initiative and thought to share a bit about it here. It's called "off_coverage", currently in a pull-request at https://github.com/openforcefield/cheminformatics-toolkit-equivalence and also available from my Sourcehut

Re: [Rdkit-discuss] Are the path-based fingerprints formally described in the scientific literature?

2021-05-21 Thread Andrew Dalke
On May 20, 2021, at 03:17, Francois Berenger wrote: > Weren't the path-based FPs formally described somewhere? What does "formally" mean? Daylight rarely participated in the academic literature tradition. They instead preferred to publish their information directly, as Pat mentions: On

Re: [Rdkit-discuss] How to prevent a SMILES from starting with a specific atom?

2021-05-11 Thread Andrew Dalke
On May 12, 2021, at 05:08, Francois Berenger wrote: > Or, more generally, flag a given atom in a molecule > and ask rdkit to not start the corresponding SMILES with > this atom, any unflagged atom being fine. Perhaps do the opposite and use rootedAtAtom to have RDKit start with a specific atom

Re: [Rdkit-discuss] rejoining pairs of fragments after fragmenting a molecule

2021-04-02 Thread Andrew Dalke
Hi Ling, > On Apr 2, 2021, at 16:23, Ling Chan wrote: > > Thank you Francois, I took a look at your code and borrowed parts of it to > rejoin two molecules. It seems like my problem is solved. I eventually > arrived at something like example 4 in >

Re: [Rdkit-discuss] rejoining pairs of fragments after fragmenting a molecule

2021-04-02 Thread Andrew Dalke
On Mar 31, 2021, at 21:55, Ling Chan wrote: > I am trying to do something that I think is quite simple, but I have not > figured out a simple way. Don't know if I am missing something. I am sure > that ultimately I can figure it out, but I wonder if there is a good way. If you can work in

Re: [Rdkit-discuss] inter-classes Tanimoto similarity

2021-03-14 Thread Andrew Dalke
> On Mar 13, 2021, at 20:29, Marawan Hussien via Rdkit-discuss > wrote: > my question is if this is the valid approach of comparison, particularly if > the class sizes vary widely and the average similarity will be inevitably > affected by the size of each item in each pair. As a check, it

[Rdkit-discuss] chemfp 3.5.1 and ChEMBL 27 FPB distributions

2021-02-05 Thread Andrew Dalke
Hi all, I've just released chemfp 3.5.1 with support for "licensed FPB files". These are fingerprint datasets which can be used under the terms of chemfp's base license agreement even without a chemfp license key or source code distribution. As the first (and so far only) data set, I've

Re: [Rdkit-discuss] Partial substructure match?

2020-11-23 Thread Andrew Dalke
625 0.640 0.500 CHEBI:1895 9 9 9 8 0.409 0.333 0.409 0.320 ... Finally, nearly all of the MCS parameters can be configured on the command-line. This program was written by Andrew Dalke . """ import sys import argparse from rdkit import Chem from rdkit.Chem import rdF

Re: [Rdkit-discuss] Morgan FP atom numbering

2020-10-27 Thread Andrew Dalke
On Oct 26, 2020, at 17:41, Cyrus Maher wrote: > I’m wondering if there is an easy way to retrieve the atom numbers that the > morgan fingerprints algorithm assigns as its first step. Many of the fingerprint functions support an optional "bitInfo" parameter. If it's a dictionary then the keys

Re: [Rdkit-discuss] sd file format question

2020-10-02 Thread Andrew Dalke
S strings before and after the conversion. Best regards, Andrew da...@dalkescientific.com # Copy charges from the "M CHG" data lines to the atom block # Written by Andrew Dalke, 2 October 2020 import argparse import sys import gzip # This requires
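
The script is cut off in this listing. As a rough illustration only (this is not Andrew's code), the two pieces such a conversion needs are a parser for the "M  CHG" property lines and the table that maps a formal charge to the V2000 atom-block charge code. The function name and the whitespace-splitting shortcut are assumptions made for this sketch.

```python
# V2000 atom-block charge codes, keyed by formal charge.
# (Code 4 means "doublet radical" and has no formal charge, so it is omitted.)
CHARGE_TO_CODE = {0: 0, 3: 1, 2: 2, 1: 3, -1: 5, -2: 6, -3: 7}


def parse_chg_line(line):
    """Parse one "M  CHG" line into {atom_number: formal_charge}.

    Atom numbers are 1-based, as in the file. The line holds a count
    followed by that many (atom number, charge) pairs.
    """
    fields = line.split()
    if fields[:2] != ["M", "CHG"]:
        raise ValueError("not an M  CHG line: %r" % (line,))
    count = int(fields[2])
    values = [int(x) for x in fields[3:3 + 2 * count]]
    if len(values) != 2 * count:
        raise ValueError("expected %d atom/charge pairs: %r" % (count, line))
    return dict(zip(values[0::2], values[1::2]))


charges = parse_chg_line("M  CHG  2   1   1   5  -1")
codes = {atom: CHARGE_TO_CODE.get(charge) for atom, charge in charges.items()}
print(charges, codes)
```

Charges outside the range -3 to +3 have no atom-block code, which is why the lookup uses .get() and can return None.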

Re: [Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Andrew Dalke
On Sep 9, 2020, at 04:00, Lewis Martin wrote: > I'd like to keep it FOSS since it's for academic publication and hopefully to > be re-used. Chemfp is amazing but brute-forcing 100million by 100million > would surely still take a long time compared with an approximate nearest > neighbor
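
To put rough numbers on the sizes in this thread, here is a back-of-the-envelope sketch. The fingerprint lengths (1024 and 2048 bits) and the comparison rate are assumptions picked for illustration; they are not figures from the thread or measured chemfp performance.

```python
NUM_FPS = 100 * 10**6  # the "100*1e6" fingerprints from the subject line


def fingerprint_storage_bytes(num_fps, num_bits):
    """Bytes needed for the raw bits alone (no ids, no index)."""
    return num_fps * num_bits // 8


def all_by_all_comparisons(num_fps):
    """Comparisons for a full N x N search."""
    return num_fps * num_fps


# Assumed fingerprint sizes.
for num_bits in (1024, 2048):
    gigabytes = fingerprint_storage_bytes(NUM_FPS, num_bits) / 1e9
    print("%d bits: %.1f GB" % (num_bits, gigabytes))

# Assumed rate: one billion Tanimoto comparisons per second.
ASSUMED_RATE_PER_SECOND = 1e9
seconds = all_by_all_comparisons(NUM_FPS) / ASSUMED_RATE_PER_SECOND
print("%.1f days at the assumed rate" % (seconds / 86400,))
```

Exploiting symmetry roughly halves the work, and threshold-based pruning can cut it further, but the quadratic growth is what makes approximate methods attractive at this scale.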

Re: [Rdkit-discuss] Rdkit-discuss] MACCS keys - revisited

2020-09-08 Thread Andrew Dalke
On Sep 8, 2020, at 14:30, Mike Mazanetz wrote: > Does anyone know whether it’s possible to obtain not just a fingerprint keys > for MACCS (binary values) but the number of occurrences of the keys, > particularly these details: The SMARTS patterns for most of the MACCS keys are available by:

Re: [Rdkit-discuss] chemfp 1.6 and 3.4 releases

2020-06-25 Thread Andrew Dalke
On Jun 25, 2020, at 16:27, Andrew Dalke wrote: > > See https://chemfp.com/license/ for details, or to get started: > > python -m pip install chemfp -i https://chemp.com/packages/ --upgrade That should be python -m pip install chemfp -i https://chemfp.com/packages/ --u

[Rdkit-discuss] chemfp 1.6 and 3.4 releases

2020-06-25 Thread Andrew Dalke
Hi RDKit'ers, I've just released new versions of chemfp. Version 1.6 is the no-cost/open source version, and 3.4 is the commercial version. The goal of chemfp 1.6 is to provide a good performance baseline for evaluating new Tanimoto search programs. This release is about 10-20% faster than

Re: [Rdkit-discuss] Number of sp3 atoms

2020-05-31 Thread Andrew Dalke
On May 31, 2020, at 15:23, Chris Swain via Rdkit-discuss wrote: > I’d like to include the number of sp3 atoms, is there an easy way to do this? I don't easily see a function for that. There's rdMolDescriptors.CalcFractionCSP3() which "returns the fraction of C atoms that are SP3 hybridized".

Re: [Rdkit-discuss] SMILES/SMARTS codes that match multiple atoms

2020-02-08 Thread Andrew Dalke
On Feb 8, 2020, at 17:55, Janusz Petkowski wrote: > > If not how can I match cases where in a given position there can be C or H > with rdkit? I believe you should use #1 instead of H. >>> from rdkit import Chem >>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]

[Rdkit-discuss] last call for mmpdb funding

2020-01-22 Thread Andrew Dalke
Hi all, This is the last email I'll send asking for people and organizations to join the current mmpdb crowdsourcing effort. I've discussed it several times before here. In summary, I'm looking for crowdfunding for the matched molecular pair program 'mmpdb'. This is part of a test to find

Re: [Rdkit-discuss] acepentalene aromaticity perception

2020-01-22 Thread Andrew Dalke
On Jan 22, 2020, at 14:12, Greg Landrum wrote: > As an aside: it's not particularly relevant to this discussion, but I don't > understand why the wikipedia page says that the compound is anti-aromatic. I > think the standard definition of anti-aromaticity (agrees with the one linked > to from

[Rdkit-discuss] acepentalene aromaticity perception

2020-01-09 Thread Andrew Dalke
Hi all, Could someone explain the following, which uses the SMILES from https://en.wikipedia.org/wiki/Acepentalene : >>> from rdkit import Chem >>> Chem.CanonSmiles("C1=CC2=CC=C3C2=C1C=C3") 'c1cc2ccc3ccc1-c=3-2' >>> import rdkit >>> rdkit.__version__ '2019.09.1' I don't understand the aromatic

Re: [Rdkit-discuss] Clearing isotope info

2019-12-12 Thread Andrew Dalke
On Dec 12, 2019, at 17:39, Rafal Roszak wrote: > I also had situation when I need to generate smiles with either > isotopes or stereochemistry but not both. Maybe it is worth to add two > options to ChemMolToSmiles function: > > dontIncludeStereochemistry=True/False >

[Rdkit-discuss] assign all bond directions in SMILES

2019-11-19 Thread Andrew Dalke
Hi all, Is there any way to assign all bond directions (E/Z stereochemistry) to the output SMILES string? For example, here's a structure: >>> mol = Chem.MolFromSmiles(r"F/C(Cl)=C(O)/N") >>> Chem.MolToSmiles(mol) 'N/C(O)=C(/F)Cl' It's a minimal definition, in that I could have specified the

Re: [Rdkit-discuss] MolToSmiles preserve atom order

2019-11-18 Thread Andrew Dalke
On Nov 18, 2019, at 17:40, David Cosgrove wrote: > > Point taken. I don’t think you’d be able to get RDKit to spit such SMILES > strings out unless you tortured it pretty hard, however. Did someone mention one of my favorite things to do? :) See:

[Rdkit-discuss] Second call for mmpdb funding

2019-11-15 Thread Andrew Dalke
Hi all, The end of the year is coming up. Perhaps there's extra money in your budget which can go to support open source development in cheminformatics? As many of you know, I started a crowdfunding effort to fund improvements to the matched molecular pair program "mmpdb". I want to see if

Re: [Rdkit-discuss] missing MolFromSmiles error output in Jupyter

2019-10-16 Thread Andrew Dalke
Dear Stéphane, > On Oct 16, 2019, at 19:39, Téletchéa Stéphane > wrote: > Did you 'by chance' transmit your presentation in PDF? Yes, I exported my Keynote.app presentation to PDF. However, I also sent the specific commands in email as plain text, as part of the process of trying to

[Rdkit-discuss] missing MolFromSmiles error output in Jupyter

2019-10-16 Thread Andrew Dalke
Hi all, I wasn't able to give my RDKit training session at the last UGM, so I passed out the presentation materials to the students who signed up. One of them wrote to me asking why the following didn't display an error message in the notebook. from rdkit import Chem from rdkit.Chem.Draw

Re: [Rdkit-discuss] SubstructMatch of identical Mols returns different results

2019-10-03 Thread Andrew Dalke
On Oct 3, 2019, at 20:34, Ondrej Gutten via Rdkit-discuss wrote: > # MCS is a benzene > my_mcs = Chem.MolFromSmiles(res.smarts) The res.smarts (or res.smartsString if you use the rdFMCS module) returns a SMARTS string, not a SMILES string. You should be using Chem.MolFromSmarts() in the

[Rdkit-discuss] mmpdb crowdfunding project has started

2019-09-23 Thread Andrew Dalke
Hi all, In August I sent a pre-announcement email about my mmpdb crowdfunding project. The project is now live, at http://mmpdb.dalkescientific.com/ . The basic idea is that I can commit to developing a few features for mmpdb. • Postgres support, as an alternative to the existing SQLite

Re: [Rdkit-discuss] Tanimoto and fingerprint representation

2019-09-14 Thread Andrew Dalke
Hi Jan, The GetMorganFingerprint() returns count fingerprints, and the Tanimoto calculation does the full Jaccard similarity, including the counts. The GetMorganFingerprintAsBitVect() version only uses the keys (that is, it treats all non-zero values as being 1) when computing the Tanimoto.
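
The difference between the two calculations can be sketched in plain Python, with count fingerprints modelled as {feature_id: count} dictionaries. This is an illustration of the arithmetic, not RDKit's implementation; the function names are invented, and the folding to a fixed number of bits that the bit-vector variant also does is ignored here.

```python
def count_tanimoto(fp1, fp2):
    """Tanimoto on counts: sum of per-key minima over sum of maxima."""
    keys = set(fp1) | set(fp2)
    shared = sum(min(fp1.get(k, 0), fp2.get(k, 0)) for k in keys)
    total = sum(max(fp1.get(k, 0), fp2.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


def binary_tanimoto(fp1, fp2):
    """Tanimoto on keys only: every non-zero count is treated as 1."""
    on1 = {k for k, v in fp1.items() if v}
    on2 = {k for k, v in fp2.items() if v}
    union = on1 | on2
    return len(on1 & on2) / len(union) if union else 0.0


# Made-up fingerprints: feature 10 occurs three times in the first one.
fp_a = {10: 3, 20: 1, 30: 2}
fp_b = {10: 1, 20: 1, 40: 1}
print(count_tanimoto(fp_a, fp_b))   # 2/7
print(binary_tanimoto(fp_a, fp_b))  # 2/4
```

The same pair of molecules can therefore score quite differently depending on which representation is used.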

Re: [Rdkit-discuss] aromatic bonds and graph edit distance

2019-08-21 Thread Andrew Dalke
Hi Jameed, I don't think your approach will work, which means I likely didn't explain myself well enough. Let's say I start with: Cc1cc2c2c(=O)o1 - https://cactus.nci.nih.gov/chemical/structure/Cc1cc2c2c(=O)o1/image I want to break the aromatic bond between the aromatic 'c'

Re: [Rdkit-discuss] aromatic bonds and graph edit distance

2019-08-21 Thread Andrew Dalke
On Aug 21, 2019, at 03:42, Francois Berenger wrote: > Unless rdkit has something, I think graph edit distance is the kind > of things for which you have to rely on a good graph library. Do you know of any (non-chemical) graph library which can handle edits involving the breaking of aromatic

[Rdkit-discuss] aromatic bonds and graph edit distance

2019-08-20 Thread Andrew Dalke
Hi all, Someone asked me recently about finding the graph edit distance of two small (<= 14 atom) fragments. I figured this was something that could be brute forced. Following SmallWorld's example at https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a fragment, incrementally delete

Re: [Rdkit-discuss] GetSubstructMatches() as smiles

2019-08-07 Thread Andrew Dalke
On Aug 7, 2019, at 13:08, Paolo Tosco wrote: > You can use > > Chem.MolFragmentToSmiles(mol, match) > > where match is a tuple of atom indices returned by GetSubstructMatch(). Note however that if only the atom indices are given then Chem.MolFragmentToSmiles() will include all bonds which

[Rdkit-discuss] planning a crowdsourcing project for mmpdb development

2019-07-31 Thread Andrew Dalke
or consulting work. 3) Where should I send questions and suggestions? Right now, private email to me is the best. I'll set up a mailing list and project web page if I get preliminary feedback that it's worth my time to go further with this trial. Thanks for reading to the end!

Re: [Rdkit-discuss] Open-source business models and the RDKit (Greg Landrum)

2019-04-01 Thread Andrew Dalke
On Mar 27, 2019, at 13:26, Chris Swain via Rdkit-discuss wrote: > This is an interesting discussion and suspect this does not only apply to > open-source software developers, there are similar challenges for small > independent software companies. My points were focused on the disadvantages

Re: [Rdkit-discuss] Open-source business models and the RDKit

2019-04-01 Thread Andrew Dalke
On Mar 27, 2019, at 16:44, Bennion, Brian via Rdkit-discuss wrote: > One of the goals of ATOM is to fund work that will be open sourced. I think > any of the partners can choose to hire consultants for the work. > > https://atomscience.org/ > Atom > atomscience.org I think there are only

Re: [Rdkit-discuss] Open-source business models and the RDKit

2019-03-27 Thread Andrew Dalke
On Mar 27, 2019, at 08:24, Francois Berenger wrote: > As an open-source project, I feel rdkit is quite successful. > So, the user community is not so small. > Some people who cannot contribute time could contribute money to the project > (especially if it is tax-deductible, I guess). I think the

Re: [Rdkit-discuss] chemfp preprint

2019-03-26 Thread Andrew Dalke
On Mar 25, 2019, at 04:05, Francois Berenger wrote: > Sometimes, I wish there was a rdkit consortium/NPO (so that donations are tax > deductible), so that rdkit could be massively funded by all its commercial > users, and even accepting individual donations. Setting up such an organization is

[Rdkit-discuss] chemfp preprint

2019-03-22 Thread Andrew Dalke
Hi RDKit users, This week I submitted a paper about chemfp for publication. I also submitted a preprint on ChemRxiv, which was just accepted. For those interested, it's at https://chemrxiv.org/articles/The_Chemfp_Project/7877846 . It's a rather long paper as it covers many aspects about the

Re: [Rdkit-discuss] contrib code not compiling

2018-11-20 Thread Andrew Dalke
On Nov 19, 2018, at 04:17, Rajarshi Guha wrote: > Hi, I check out the latest RDKit sources from master and I'm trying to > compile the PBF. However, the compilation fails reporting that > RDGeneral/export.h is missing: While this doesn't answer the question, it seems to be coupled to

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-26 Thread Andrew Dalke
On Sep 26, 2018, at 20:26, Peter S. Shenkin wrote: > Ah, David, but how do you define a "real" singleton? There can be many different definitions of what a '"real" singleton' might be, but we are specifically talking about Butina clustering. The Butina paper defines the term "false singleton",

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Andrew Dalke
On Sep 25, 2018, at 17:13, Peter S. Shenkin wrote: > FWIW, in work on conformational clustering, I used the “most representative” > molecule; that is, the real molecule closest to the mathematical centroid. > This would probably be the best way of displaying a single molecule that > typifies

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Andrew Dalke
On Sep 21, 2018, at 14:53, Philipp Thiel wrote: > you probably read about the Tanimoto being a proper metric in case of having > binary data > in Leach and Gillet 'Introduction to Chemoinformatics' chapter 5.3.1 in the > revised edition. What we call Tanimoto is more broadly known as the
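
The claim that 1 - Tanimoto is a proper metric for binary data can be spot-checked numerically. The sketch below is a brute-force check, not a proof: it tests the triangle inequality over every triple of subsets of a tiny bit universe. The function names are invented, and treating two empty fingerprints as being at distance 0 is a convention chosen for this example.

```python
import itertools


def tanimoto_distance(a, b):
    """1 - Tanimoto for two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention here: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)


def all_subsets(universe):
    """Yield every subset of the universe as a frozenset."""
    items = list(universe)
    for size in range(len(items) + 1):
        for combo in itertools.combinations(items, size):
            yield frozenset(combo)


def triangle_inequality_holds(universe, tol=1e-12):
    """Check d(x, z) <= d(x, y) + d(y, z) for every triple of subsets."""
    subsets = list(all_subsets(universe))
    for x, y, z in itertools.product(subsets, repeat=3):
        direct = tanimoto_distance(x, z)
        detour = tanimoto_distance(x, y) + tanimoto_distance(y, z)
        if direct > detour + tol:
            return False
    return True


print(triangle_inequality_holds(range(4)))
```

With 4 bits there are 16 subsets and 4096 triples, so the check finishes instantly; the cost grows too fast for this to work on real fingerprint lengths.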

Re: [Rdkit-discuss] Equivalent atom neighbours

2018-09-07 Thread Andrew Dalke
On Sep 7, 2018, at 22:22, Alexey Orlov wrote: > I'm trying to calculate the number of equivalent/nonequivalent neighbor > heteroatoms for each atom i of molecule m. > > For examples, the third carbon atom of molecule CC(OH)CC has two > nonequivalent neighbors: one carbon atom connected to OH

[Rdkit-discuss] available Python/RDKit training slots for the 2018 UGM

2018-09-04 Thread Andrew Dalke
Hi all, As you may know, I will be offering a free Python/RDKit training session before the UGM in a couple of weeks. This is a beginner level course for people with some programming experience. It will cover the basics of Python, RDKit, JupyterLab, Pandas, Scikit-Learn and more. The goal is

Re: [Rdkit-discuss] no structure depiction in Jupyter notebook

2018-08-31 Thread Andrew Dalke
On Aug 31, 2018, at 15:27, Axel Pahl wrote: > on Linux, using Anaconda, RDKit and Python 3.6, I always need to additionally > install cairocffi via pip: Thanks Axel. When I tried it out, I figured out the more likely problem - I was using jupyter from a non-conda virtualenv. My problem

Re: [Rdkit-discuss] no structure depiction in Jupyter notebook

2018-08-31 Thread Andrew Dalke
On Aug 31, 2018, at 11:58, Andrew Dalke wrote: > I am unable to see an inline structure depiction in the Jupyter notebook, nor > in the JupyterLab notebook, tested with both the Python 2 and Python 3 > kernels, and rdkit.__version__ '2018.03.1'. I've narrowed it down to the Cairo code

[Rdkit-discuss] no structure depiction in Jupyter notebook

2018-08-31 Thread Andrew Dalke
Hi all, I am unable to see an inline structure depiction in the Jupyter notebook, nor in the JupyterLab notebook, tested with both the Python 2 and Python 3 kernels, and rdkit.__version__ '2018.03.1'. I installed miniconda and RDKit on my Mac using: curl -O

Re: [Rdkit-discuss] descriptors beyond rotatable bond count and possible correlations with entropy

2018-08-31 Thread Andrew Dalke
On Aug 31, 2018, at 07:41, Paolo Tosco wrote: > this gist should do what you need: Unless I misinterpreted what Jim is looking for, I don't think that returns the contiguous rotatable bonds in a small molecule. In the following there are only two rotatable bonds: >>> mol =

Re: [Rdkit-discuss] want advice for good teaching data set

2018-08-30 Thread Andrew Dalke
Thanks for the responses. I'll merge them into one reply: On Aug 29, 2018, at 16:56, Eloy Félix wrote: > If you want to build model I guess that what you want is to get experimental > logp values. > > This should give you something to start with: > > select ACTIVITY_ID, MOLREGNO,

[Rdkit-discuss] want advice for good teaching data set

2018-08-29 Thread Andrew Dalke
Hi all, I am starting to put together materials for the Python/RDKit training course I'm giving just before the RDKit UGM next month. I would like to structure part of it around the SQLite release of the ChEMBL data set. More specifically, I plan to include examples of machine learning with

Re: [Rdkit-discuss] cartridge license?

2018-08-23 Thread Andrew Dalke
On Aug 23, 2018, at 07:18, Roman Bolzern wrote: > Dear RDKittens, I would prefer to not be called a 'kitten'. > https://www.rdkit.org/docs/Cartridge.html#license, and at the bottom it says > “This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 > License”, ... > Is

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Thread Andrew Dalke
On Jun 29, 2018, at 02:43, 藤秀義 wrote: > Although not strictly based on the number of atoms, but on the length of > SMILES string, the simplest way is using Python built-in functions as follows: > > smiles = 'CCC.CC' > fragment = max(smiles.split('.'), key=len) > print (fragment) The mmpdb
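
The caveat in the quoted text ("not strictly based on the number of atoms") is easy to see with a small example. The sketch below wraps the quoted one-liner in a function (the name is invented) and runs it on a made-up salt where the longest SMILES string is not the fragment with the most atoms.

```python
def largest_by_smiles_length(smiles: str) -> str:
    """Pick the dot-separated fragment with the longest SMILES string."""
    return max(smiles.split("."), key=len)


# Works as hoped when string length tracks atom count.
print(largest_by_smiles_length("CCC.CC"))

# Bracket atoms inflate the string length: "[Na+]" is one atom but
# five characters, so it beats the four-atom "CCCC".
print(largest_by_smiles_length("[Na+].CCCC"))
```

Counting atoms with a toolkit avoids this, at the cost of parsing each fragment.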

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Thread Andrew Dalke
On Jun 28, 2018, at 22:08, Paolo Tosco wrote: > if you wish to keep only the largest disconnected fragment you may try the > following: > > mols = list(rdmolops.GetMolFrags(mol, asMols = True)) > if (mols): > mols.sort(reverse = True, key = lambda m: m.GetNumAtoms()) > mol = mols[0] A

Re: [Rdkit-discuss] (Morgan) fingerprints for specific atom?

2018-06-18 Thread Andrew Dalke
Hi Андрей, The GetMorganFingerprint function takes additional parameters. From http://rdkit.org/Python_Docs/rdkit.Chem.rdMolDescriptors-module.html#GetMorganFingerprint GetMorganFingerprint( (Mol)mol, (int)radius [, (AtomPairsParameters)invariants=[] [, (AtomPairsParameters)fromAtoms=[]

Re: [Rdkit-discuss] converting from ToBitString() to SMILES

2018-06-17 Thread Andrew Dalke
> On Jun 17, 2018, at 21:04, Raghuram Srinivas > wrote: > Is there a way to convert a bit string of 2048 bits back to the SMILES / > BitVector representation of the molecule? Any help /pointers in this > direction will be much appreciated . That topic came up on this list in April of this

Re: [Rdkit-discuss] boron atom/element support in RDkit

2018-06-12 Thread Andrew Dalke
On Jun 12, 2018, at 18:00, Bennion, Brian via Rdkit-discuss wrote: > Does RDkit support boron in SMILES strings? We have a number of compounds > for which rdkit parsing is not successful. The commonality is that there is > a B or b listed in the string. RDKit supports boron, including

Re: [Rdkit-discuss] Calculating MorganFingerprint Counts for large number of Molecules

2018-05-18 Thread Andrew Dalke
On May 18, 2018, at 17:48, Jennifer Hemmerich wrote: > I really liked the idea and I implemented it as follows: > df = pd.DataFrame(columns=counts.keys()) > for i,fp in enumerate(allfps): > logger.debug('appending %s', str(i)) >

Re: [Rdkit-discuss] RDKIT 2018.3 and MMPDB problem

2018-05-09 Thread Andrew Dalke
And I have uploaded a source tar.gz and a binary wheel to PyPI. That means you can do "pip install mmpdb" to install this most recent version. Andrew da...@dalkescientific.com > On May 9, 2018, at 18:04, Kramer, Christian

Re: [Rdkit-discuss] RDKIT 2018.3 and MMPDB problem

2018-05-08 Thread Andrew Dalke
Dear Marco, > On May 7, 2018, at 23:59, Marco Stenta wrote: > I had some time to set an environment for it and test it: it works fine, as > far as my tests go. I will switch to this version and to the latest RDKIT now. Thanks for the feedback. Someone else sent me a

Re: [Rdkit-discuss] RDKIT 2018.3 and MMPDB problem

2018-05-06 Thread Andrew Dalke
On Apr 27, 2018, at 00:20, Andrew Dalke <da...@dalkescientific.com> wrote: > Please try out: > http://dalkescientific.com/mmpdb-2.1b1.tar.gz > > or my fork at: > https://github.com/adalke/mmpdb > > and let me know of any problems. Has anyone downloaded and tes

Re: [Rdkit-discuss] RDKIT 2018.3 and MMPDB problem

2018-04-27 Thread Andrew Dalke
On Apr 27, 2018, at 00:20, Andrew Dalke <da...@dalkescientific.com> wrote: > It does not appear that the .fragment files also need to be redone, so > rebuilding the .mmpdb file is mostly a matter of re-running the index step. I no longer think that is correct. While indexi

Re: [Rdkit-discuss] RDKIT 2018.3 and MMPDB problem

2018-04-26 Thread Andrew Dalke
On Apr 26, 2018, at 12:38, Andrew Dalke <da...@dalkescientific.com> wrote: > The automated mmpdb test suite isn't that good, so I still need to do some > manual testing. I won't be able to get to this until (hopefully) this evening. I did that, and tracked down one more bug. Pl

Re: [Rdkit-discuss] RDKIT 2018.3 and MMPDB problem

2018-04-26 Thread Andrew Dalke
On Apr 26, 2018, at 10:09, Marco Stenta wrote: > > Dear Colleagues, > I just installed on conda env the new rdkit version > and wanted to try mmpdb but upon testing I got the error below > reverting back to rdkit=2017.09.3.0 it works fine (I still get some errors > but

Re: [Rdkit-discuss] 2018.03.1 RDKit release

2018-04-25 Thread Andrew Dalke
On Apr 25, 2018, at 01:31, David Hall wrote: > You need to turn off RDK_INSTALL_INTREE Thanks! I've put that in my build notes for the next time I compile RDKit. BTW, a quick benchmark of the new release shows that it's almost 15% faster at parsing SMILES strings than

Re: [Rdkit-discuss] 2018.03.1 RDKit release

2018-04-24 Thread Andrew Dalke
> On Apr 23, 2018, at 10:43, Greg Landrum wrote: > > I'm pleased to announce that the next version of the RDKit - 2018.03 - is > released. The release notes are below. ... > Please let me know if you find any problems with the release or have > suggestions for the

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-23 Thread Andrew Dalke
On Apr 23, 2018, at 14:54, Brian Cole wrote: > Unfortunately it doesn't work on circular/ECFP-like fingerprints. To be fair, you didn't mention that was a requirement. ;) > It has the requirement that the fingerprint be a substructure fingerprint as > you described. Could

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-22 Thread Andrew Dalke
On Apr 22, 2018, at 20:22, Nils Weskamp wrote: > Actually, I *was* also thinking about your use cases 2 and 3 since you > also need some form of hash function to map substructures to bit > numbers. This is normally a rather simple function / pseudo random > generator,

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-22 Thread Andrew Dalke
On Apr 22, 2018, at 08:42, Nils Weskamp wrote: > Nice work. If brute-force approaches like this (or methods based on > genetic algorithms etc.) are the only way to reverse a fingerprint, one > could probably come up with a fingerprint that allows for pretty secure >

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-21 Thread Andrew Dalke
On Apr 21, 2018, at 01:55, Andrew Dalke <da...@dalkescientific.com> wrote: > Hand-waving sketch: start with a carbon. Generate fingerprint. It should pass > the screening test. If not, the structure contains no carbons, so repeat with > other elements until you find an at
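The hand-waving sketch quoted above (grow a candidate, regenerate its fingerprint, keep it only if it still passes the screening test) can be illustrated with a toy. This is not code from the thread and it does not use RDKit: the "molecules" are plain strings standing in for linear chains, and `toy_fingerprint`, `passes_screen` and `reverse_fingerprint` are hypothetical names invented for this illustration. The only idea carried over is that every bit set by a partial structure must also be set in the target fingerprint.

```python
import zlib


def toy_fingerprint(chain, nbits=64, max_path=3):
    """Toy substructure fingerprint: one bit per linear path of up to max_path atoms.

    Assumption: a "molecule" is just a string such as "CCNCO" (an unbranched chain).
    """
    bits = 0
    for size in range(1, max_path + 1):
        for i in range(len(chain) - size + 1):
            path = chain[i:i + size]
            # Paths are undirected, so use one canonical direction.
            path = min(path, path[::-1])
            # crc32 is deterministic across runs, unlike the built-in hash().
            bits |= 1 << (zlib.crc32(path.encode()) % nbits)
    return bits


def passes_screen(candidate_fp, target_fp):
    """Screening test: the candidate may not set any bit the target lacks."""
    return (candidate_fp & ~target_fp) == 0


def reverse_fingerprint(target_fp, alphabet="CNO", max_len=8):
    """Depth-first growth of chains, pruning any chain that fails the screen.

    Returns a chain whose fingerprint equals target_fp, or None if none is
    found within max_len atoms. The result need not be the original chain.
    """
    stack = list(alphabet)
    while stack:
        candidate = stack.pop()
        fp = toy_fingerprint(candidate)
        if not passes_screen(fp, target_fp):
            continue  # this element or extension cannot be part of the target
        if fp == target_fp:
            return candidate
        if len(candidate) < max_len:
            stack.extend(candidate + atom for atom in alphabet)
    return None


target = toy_fingerprint("CCNCO")
found = reverse_fingerprint(target)
print(found, toy_fingerprint(found) == target)
```

The search only recovers some chain with the same fingerprint. Reversed chains and hash collisions are indistinguishable from the original, which is part of why the recovered structure is a guess rather than a proof.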

Re: [Rdkit-discuss] Any known papers on reverse engineering fingerprints into structures?

2018-04-20 Thread Andrew Dalke
On Apr 20, 2018, at 19:03, jeff godden wrote: > > Long ago molecular fingerprints were referred to in the literature as > molecular hash functions. (y'know, those crazy mathematical algorithms which > permitted rapid lookup of some string in a lookup table) Do you have a

Re: [Rdkit-discuss] issue during parsing a smile

2018-04-16 Thread Andrew Dalke
On Apr 16, 2018, at 16:29, Guillaume GODIN wrote: > And for this one C[C@@]12CC[C@@](C)(CC1)O2O any idea > > Cause your tool failed too. It's true that smiview failed, in the sense that it shouldn't have tried to do further analysis with a molecule that RDKit

Re: [Rdkit-discuss] issue during parsing a smile

2018-04-16 Thread Andrew Dalke
If you try this out with my smiview package, available from https://bitbucket.org/dalke/smiview/downloads/ , it reports: % smiview 'C\(C(C)C)=N/O' Cannot parse --smiles: Unexpected term C\(C(C)C)=N/O ^ Tokenizing stopped here A bond must be followed by an atom, closure. That is, the bond

Re: [Rdkit-discuss] reassembling a molecule from R-groups

2018-04-16 Thread Andrew Dalke
On Apr 16, 2018, at 05:37, Patrick Walters wrote: > > Thanks Andrew, the SMILES approach seemed to have quite a few edge cases so I > wrote something to work directly on a molecule. That's the approach I started with, until I figured out that it doesn't preserve

Re: [Rdkit-discuss] reassembling a molecule from R-groups

2018-04-15 Thread Andrew Dalke
Hi Pat, I wrote something like this for mmpdb, which is the MMPA code I helped develop, at https://github.com/rdkit/mmpdb . It has one restriction, which I'll get to in a moment. The general idea is to convert the attachment points to closures, join them with a ".", and canonicalize: >>>
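The message preview is cut off before its own example, so here is a minimal sketch of the general idea it names (attachment points become ring closures, the pieces are joined with a "."). It is not the mmpdb code. The function names, the `[*:n]` labelling convention and the choice of closure numbers `%90` to `%99` are assumptions made for this illustration. It is pure string handling, accepts only attachment points at the very start of a fragment or directly after an atom at the end of a chain or branch, and refuses anything with stereochemistry, because moving a closure can change the neighbour order around a chiral atom. It also assumes the fragments do not already use closures `%90` to `%99`.

```python
import re

# A labelled attachment point with a single-digit label, e.g. [*:1]
_ATTACH = re.compile(r"\[\*:(\d)\]")
# An atom token: a bracket atom or an organic-subset symbol
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def attachments_to_closures(smiles):
    """Rewrite each [*:n] in one fragment as the ring closure %9n."""
    if any(ch in smiles for ch in "@/\\"):
        raise ValueError("stereochemistry is not handled by this sketch")

    # Leading attachment point: put the closure behind the first real atom.
    lead = _ATTACH.match(smiles)
    if lead:
        atom = _ATOM.match(smiles, lead.end())
        if atom is None or atom.group().startswith("[*"):
            raise ValueError("leading attachment point must be followed by an atom")
        smiles = atom.group() + "%9" + lead.group(1) + smiles[atom.end():]

    # Remaining attachment points: only accept ones that directly follow an
    # atom (or its ring-closure digits) and end the chain or branch.
    pieces, pos = [], 0
    for m in _ATTACH.finditer(smiles):
        before = smiles[m.start() - 1]
        after = smiles[m.end():m.end() + 1]
        if not (before.isalnum() or before == "]") or after not in ("", ")"):
            raise ValueError("unsupported attachment point placement: " + smiles)
        pieces.append(smiles[pos:m.start()])
        pieces.append("%9" + m.group(1))
        pos = m.end()
    pieces.append(smiles[pos:])

    result = "".join(pieces)
    if "*" in result:
        raise ValueError("unlabelled or multi-digit attachment point: " + smiles)
    return result


def reassemble(core, rgroups):
    """Join the rewritten core and R-groups into one dot-separated SMILES."""
    return ".".join(attachments_to_closures(s) for s in [core, *rgroups])


joined = reassemble("[*:1]c1ccccc1[*:2]", ["[*:1]O", "CC[*:2]"])
print(joined)
```

The joined string still contains the `%9n` closures. The last step named in the message, canonicalization, would be done by a toolkit, for example `Chem.CanonSmiles(joined)` in RDKit, which is not run here. Cases such as `C([*:1])C`, where the attachment point sits alone in a branch, raise `ValueError` rather than produce a questionable SMILES.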

Re: [Rdkit-discuss] [Rdkit-announce] [Announcement] 7th RDKit UGM in Cambridge UK

2018-04-11 Thread Andrew Dalke
On Apr 7, 2018, at 07:13, Greg Landrum <greg.land...@gmail.com> wrote: > Andrew Dalke (Dalke Scientific) will offer a course on Python and the RDKit I need to finalize what I'm going to cover. I've been going between two approaches. 1) Python programming for cheminformatics This

[Rdkit-discuss] smiview 1.2

2018-04-03 Thread Andrew Dalke
About 10 days ago I posted a prototype program called 'smiview', which displays information about the structure of a SMILES string. Thanks to feedback from a couple of users, and a deep urge to explore the idea, I've just released smiview 1.2, available from

[Rdkit-discuss] smiview 1.1 - a console tool to view SMILES strings

2018-03-24 Thread Andrew Dalke
Over the last few days I've developed a command-line tool that I call "smiview". It's a SMILES viewer. It isn't a depiction tool where the input is in SMILES but rather a tool to highlight different aspects of the SMILES string. I'll put some examples at the end. If you want to try it out you
