On 11/28/2016 10:25 AM, Stephen O'hagan wrote:
> Has anyone come up with fool-proof way of matching structurally equivalent
> molecules?
This is somewhat convoluted, and there is no proof that it's fool-proof.

A few years ago we had good results from running the graphpowerhash() function from madIS (http://madgik.github.io/madis/aggregate.html#module-functions.aggregate.graph) on the PDB ligand database. The parameters were:

- atom1, atom2 IDs (names) as node1, node2.
- Atom stereo (R, S, N), aromatic (y/n), and "leaving atom" (y/n) flags as node1_details, node2_details, packed into a single string with the jpack() function (see http://madgik.github.io/madis/row.html). Looking at it now, I don't think the nodeN_details parameter needs to include the atom's "aromatic" flag.
- Massaged bond type and bond stereo (E, Z, N) as edge_details, also packed into a string as above. The PDB chem comp model stores the bond type as SING or DOUB, with a separate yes/no "aromatic" column; we changed the type to AROM for the bonds where that column was yes.

The basic model is a list of bonds with atom1, atom2, and type, and a list of atoms with stereo, aromatic, and "leaving" flags. The last flag is "Y" for atoms that "go away" when the component forms a bond.

The algorithm itself, as far as I know (I am not the author), takes the two "matrices" representing the molecule "graphs", computes their largest eigenvalues/eigenvectors, and compares those. We have no proof that it's 100% correct, but all the duplicates it found in the PDB Ligand Expo at the time were genuine.

Enjoy,

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
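For readers who want to try the labelling scheme without madIS, here is a minimal Python sketch. It is not graphpowerhash() itself, whose internals are only described second-hand above; it swaps in a Weisfeiler-Lehman-style neighbourhood refinement, which, like power iteration, repeatedly propagates information along the bonds. The input layout (an `atoms` dict of per-atom detail tuples and `bonds` tuples of atom1, atom2, order, aromatic flag, stereo) is a hypothetical simplification of the chem comp model, "|".join stands in for jpack(), and the example data also puts the element symbol into each atom's details, which the parameter list above does not mention. Equal hashes are necessary but not sufficient for equivalence: some non-isomorphic graphs (e.g. certain regular graphs) receive the same hash.

```python
import hashlib
from collections import defaultdict


def _digest(text):
    """Short deterministic digest of a string (truncated SHA-1)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:16]


def bond_label(order, aromatic, stereo):
    """Fold the chem comp aromatic flag into the bond type (SING/DOUB + Y -> AROM)."""
    bond_type = "AROM" if aromatic == "Y" else order
    return bond_type + "|" + stereo


def graph_hash(atoms, bonds, steps=3):
    """Weisfeiler-Lehman-style hash of a labelled molecular graph (a sketch).

    atoms: {atom_id: tuple of per-atom detail strings}   (hypothetical layout)
    bonds: [(atom1, atom2, order, aromatic_flag, bond_stereo), ...]
    Atom IDs only wire up the graph; they never enter the hash, so
    renaming atoms does not change the result.
    """
    neighbours = defaultdict(list)
    for a1, a2, order, aromatic, stereo in bonds:
        label = bond_label(order, aromatic, stereo)
        neighbours[a1].append((a2, label))  # bonds are undirected,
        neighbours[a2].append((a1, label))  # so record both directions
    # Start from the per-atom details alone.
    labels = {a: _digest("|".join(details)) for a, details in atoms.items()}
    for _ in range(steps):
        # Each atom's new label combines its old label with the sorted
        # (bond label, neighbour label) pairs around it.
        labels = {
            a: _digest(labels[a] + ":" + ",".join(
                sorted(lbl + ">" + labels[n] for n, lbl in neighbours[a])))
            for a in atoms
        }
    # Sorting the final labels makes the hash independent of atom order.
    return _digest(",".join(sorted(labels.values())))


# Hypothetical data: the same C-C-O fragment under two different atom namings.
# Details are (element, atom stereo, leaving flag).
atoms_a = {"C1": ("C", "N", "N"), "C2": ("C", "N", "N"), "O1": ("O", "N", "N")}
bonds_a = [("C1", "C2", "SING", "N", "N"), ("C2", "O1", "SING", "N", "N")]
atoms_b = {"CA": ("C", "N", "N"), "CB": ("C", "N", "N"), "OX": ("O", "N", "N")}
bonds_b = [("OX", "CA", "SING", "N", "N"), ("CA", "CB", "SING", "N", "N")]

print(graph_hash(atoms_a, bonds_a) == graph_hash(atoms_b, bonds_b))  # True
```

Because the atom IDs never enter the hash, renamed copies of the same component collide, which is what finding duplicates needs; the number of refinement steps is a tunable assumption, and larger molecules may need more.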
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss