Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Peter S. Shenkin
Well, I'm not really familiar with the Taylor-Butina clustering method, so I'm proposing a methodology based on generalizing something that I found to be useful in a somewhat different clustering context. Presuming that what you are clustering is the fingerprints of structures, and that you know

[Rdkit-discuss] Docker image for GSOC2018_MolVS_Integration

2018-09-25 Thread Tim Dudgeon
I was very happy to hear about the integration of MolVS into RDKit core in the talk by Susan Leung at the recent UGM. https://github.com/rdkit/UGM_2018/blob/master/Presentations/Leung_GSoC_RDKit-MolVS_Integration.pdf This is going to be incredibly useful once it gets released. To help with

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Andrew Dalke
On Sep 21, 2018, at 14:53, Philipp Thiel wrote: > you probably read about the Tanimoto being a proper metric in case of having > binary data > in Leach and Gillet 'Introduction to Chemoinformatics' chapter 5.3.1 in the > revised edition. What we call Tanimoto is more broadly known as the
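For readers following the metric discussion, here is a minimal sketch of the Tanimoto similarity on binary fingerprints, with the corresponding distance taken as 1 minus the similarity. The fingerprints are represented as plain Python sets of on-bit indices, and the names `tanimoto`, `tanimoto_distance`, and `fps` are made up for this illustration (they are not RDKit calls). The triangle-inequality check only spot-checks a few toy fingerprints; it is not a proof.

```python
from itertools import permutations
from typing import FrozenSet


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on bits."""
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / union


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Distance derived from the similarity: 1 - Tanimoto."""
    return 1.0 - tanimoto(a, b)


# Toy fingerprints (hypothetical data, not taken from the thread).
fps = [
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({4, 5}),
    frozenset({1, 5, 6}),
]

# Spot-check d(x, z) <= d(x, y) + d(y, z) for every ordered triple.
triangle_ok = all(
    tanimoto_distance(x, z)
    <= tanimoto_distance(x, y) + tanimoto_distance(y, z) + 1e-12
    for x, y, z in permutations(fps, 3)
)
```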

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Peter S. Shenkin
(I see that I accidentally responded to Andrew, only, earlier; I'm copying to the group this time.) FWIW, in work on conformational clustering, I used the “most representative” molecule; that is, the real molecule closest to the mathematical centroid. This would probably be the best way of
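A rough sketch of the "most representative" idea described above: compute the mathematical centroid of the cluster members, then return the real member that lies closest to it. This assumes each member is already a numeric vector (coordinates, descriptors, or fingerprint bits as 0/1) and uses squared Euclidean distance; the function name `most_representative` and the sample data are hypothetical, and the original work may well have used a different distance.

```python
from typing import Sequence


def most_representative(points: Sequence[Sequence[float]]) -> int:
    """Return the index of the real point closest to the cluster centroid."""
    if not points:
        raise ValueError("cluster is empty")
    n = len(points)
    dim = len(points[0])
    # Mathematical centroid: the per-dimension mean over all members.
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]

    def sq_dist(p: Sequence[float]) -> float:
        # Squared Euclidean distance to the centroid (assumed metric).
        return sum((x - c) ** 2 for x, c in zip(p, centroid))

    return min(range(n), key=lambda i: sq_dist(points[i]))


# Toy cluster (hypothetical data): the last point sits nearest the centroid.
cluster = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.4, 0.4]]
rep = most_representative(cluster)
```

Because the result is an index into the cluster, it always points at an actual member rather than an averaged, possibly non-existent structure.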

Re: [Rdkit-discuss] Saving mol file

2018-09-25 Thread Greg Landrum
Hi Colin, The RDKit outputs charge information to mol blocks using the CHG line: In [3]: m = Chem.MolFromSmiles('C[NH3+]') In [4]: print(Chem.MolToMolBlock(m)) RDKit 2D 2 1 0 0 0 0 0 0 0 0999 V2000 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
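To illustrate what the CHG line carries, here is a small standard-library sketch that pulls atom charges out of `M  CHG` property lines in a V2000 mol block. The helper name `parse_chg_lines` and the two-line sample are hand-written for this example (the sample is not verbatim RDKit output), and the parser splits on whitespace rather than honoring the fixed-width columns, which is a simplification.

```python
from typing import Dict


def parse_chg_lines(mol_block: str) -> Dict[int, int]:
    """Map 1-based atom index -> formal charge from 'M  CHG' lines."""
    charges: Dict[int, int] = {}
    for line in mol_block.splitlines():
        if not line.startswith("M  CHG"):
            continue
        fields = line[6:].split()
        count = int(fields[0])             # number of (atom, charge) pairs
        pairs = fields[1:1 + 2 * count]
        for atom_idx, charge in zip(pairs[0::2], pairs[1::2]):
            charges[int(atom_idx)] = int(charge)
    return charges


# Hand-written fragment: one entry, atom 2 carries a +1 charge.
sample = "M  CHG  1   2   1\nM  END"
charges = parse_chg_lines(sample)
```

A reader that only looks at the charge column of the atom block and ignores `M  CHG` lines would see such an atom as neutral.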

Re: [Rdkit-discuss] Saving mol file

2018-09-25 Thread Colin Bournez
Well yes, I do have this line; I did not include the whole file, for clarity. The thing is that tools such as MOE and PyMOL read it without problem, but RDock, for example, can't read it properly and returns a neutral N, which is not the case. And if I open it with PyMOL and save it back in mol format,

[Rdkit-discuss] boron compound not recognized by RDkit

2018-09-25 Thread Bennion, Brian via Rdkit-discuss
Hello, A while back I noticed that RDKit has issues with boron-containing compounds. One is below, and I admit it is a strange one. I read in an sdf file and write it out after calculating a formal charge on the molecule. It seems to be read into RDKit OK, but writing errored out with

[Rdkit-discuss] Saving mol file

2018-09-25 Thread Colin Bournez
Hey everyone, I have a question concerning the Chem.MolToMolFile() function. When I open this file containing an N+ (here is the corresponding line in the mol file): 11.3700 3.4360 -11.8300 N 0 3 0 0 0 0 0 0 0 0 0 0 And I just save it back without any modification, the
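For context on the atom line quoted above: in a V2000 atom block, the field after the element symbol is the mass difference and the next one is a charge code, where 3 stands for +1. The sketch below decodes that code with a lookup table. The helper name `atom_line_charge` is made up for this example, and it splits on whitespace, which is an assumption that breaks when wide coordinates run together in the fixed-width columns.

```python
from typing import Tuple

# V2000 atom-block charge codes -> formal charge.
# Code 4 denotes a doublet radical, which carries no formal charge.
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}


def atom_line_charge(line: str) -> Tuple[str, int]:
    """Return (element symbol, formal charge) for one V2000 atom line."""
    fields = line.split()
    symbol = fields[3]          # x, y, z come first, then the symbol
    code = int(fields[5])       # fields[4] is the mass difference
    return symbol, CHARGE_CODES[code]


# The atom line from the message, with whitespace-separated fields.
atom_line = "11.3700 3.4360 -11.8300 N 0 3 0 0 0 0 0 0 0 0 0 0"
symbol, charge = atom_line_charge(atom_line)
```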

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-25 Thread Andrew Dalke
On Sep 25, 2018, at 17:13, Peter S. Shenkin wrote: > FWIW, in work on conformational clustering, I used the “most representative” > molecule; that is, the real molecule closest to the mathematical centroid. > This would probably be the best way of displaying a single molecule that > typifies