Hi all,

  Now that Greg's on vacation ... ;)

For the last 6-8 weeks I've been working on a new algorithm to find the maximum common substructure (MCS) of a set of structures. This work has been funded by Roche, with the explicit hope that it be incorporated into RDKit.

The software is now ready for people to try out. It's available for direct download from

  https://bitbucket.org/dalke/fmcs/downloads/fmcs-1.0b1.tar.gz

or you can see the README from its main BitBucket page

  https://bitbucket.org/dalke/fmcs/

This code installs the single Python file 'fmcs.py' and a thin command-line wrapper called 'fmcs'. The software is distributed under the 2-clause BSD license.

Here's an example of use; see the README for more examples:

  % fmcs sample_files/benzotriazole.sdf
  [#7]:1:[#7]:[#7]:[#6]:2:[#6]:[#6]:[#6]:[#6]:[#6]:1:2 9 atoms 10 bonds (complete search)

The output is a single line to stdout containing a SMARTS description of the matching subgraph, the number of atoms, the number of bonds, and a note saying whether the search ran to completion or timed out, in which case only the best solution found so far is reported.

More technically, this program finds the connected Maximum Common Edge Subgraph, with the option of maximizing the number of atoms or the number of bonds. By default, atoms match based on element and bonds match based on bond type, with several other options available. There are also command-line options to require that ring bonds only match ring bonds and chain bonds only match chain bonds, or the even stronger requirement that a ring bond in the structure must also be a ring bond in the MCS.

Feedback and Funding
====================

The goal now is to get feedback on what's needed in order to pass the MCS code over to RDKit. However, there isn't that much funding left, so I hope the answer is "very little."

Please download the package, try it out on your data sets, and let me know if it works for you.
For example, I'm pretty sure it won't work on Python 2.6 unless you have the argparse package installed, and I probably use a few things which aren't supported on Python 2.5.

And if you're interested in furthering the development of fmcs, then feel free to fund me. The TODO file contains a list of suggestions for future work.

Thanks
======

Greg Landrum and Jérôme Hert provided test data sets, and Greg gave guidelines on the functionality he wants in an MCS algorithm for RDKit. I also thank Asad Rahman for his work on SMSD (Small Molecule Subgraph Detector), which I used during validation, and Alexander Savelyev and the Indigo developers, whose toolkit and scaffold detection code I also used.

Validation is hard. There were several subtle bugs which I found by generating a lot of comparisons between fmcs and other tools, then tracking down the differences. The detective work was made easier with Karen Schomburg's SMARTSviewer and Daylight's online depictmatch.cgi tool.

        Andrew
        da...@dalkescientific.com
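
P.S. If you want to script against the one-line output described above, here is a minimal sketch of a parser. It is not part of fmcs. The helper name parse_fmcs_line and the MCSResult fields are hypothetical, and the pattern is inferred only from the single example line in this message. The wording of the timeout status isn't shown here, so the text in parentheses is captured verbatim rather than compared against a fixed string.

```python
import re
from collections import namedtuple

# Hypothetical container for illustration; the field names are assumptions,
# not anything defined by fmcs itself.
MCSResult = namedtuple("MCSResult", "smarts num_atoms num_bonds status")

# Layout assumed from the example: "<SMARTS> <N> atoms <M> bonds (<status>)".
# The status text is kept as-is because only "complete search" is documented.
_FMCS_LINE = re.compile(r"^(\S+)\s+(\d+)\s+atoms\s+(\d+)\s+bonds\s+\((.*)\)\s*$")


def parse_fmcs_line(line):
    """Split one fmcs output line into SMARTS, atom count, bond count, status."""
    match = _FMCS_LINE.match(line)
    if match is None:
        raise ValueError("unrecognized fmcs output: %r" % (line,))
    smarts, num_atoms, num_bonds, status = match.groups()
    return MCSResult(smarts, int(num_atoms), int(num_bonds), status)


# The example line from the benzotriazole run shown in this message.
example = ("[#7]:1:[#7]:[#7]:[#6]:2:[#6]:[#6]:[#6]:[#6]:[#6]:1:2 "
           "9 atoms 10 bonds (complete search)")
result = parse_fmcs_line(example)
print(result.smarts)
print(result.num_atoms, result.num_bonds, result.status)
```

Raising ValueError on anything that doesn't fit the assumed layout is deliberate: if the real output differs in some case I haven't shown (no common substructure found, say), a script fails loudly instead of silently mis-reading the counts.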