Hi all,

Now that Greg's on vacation ... ;)

For the last 6-8 weeks I've been working on a new algorithm to
find the MCS (Maximum Common Substructure) of a set of structures.
This work has been funded by Roche, with the explicit hope that it
be incorporated into RDKit.

The software is now ready for people to try out. It's available
for direct download from

https://bitbucket.org/dalke/fmcs/downloads/fmcs-1.0b1.tar.gz

or you can see the README from its main BitBucket page

https://bitbucket.org/dalke/fmcs/

This code installs the single Python file 'fmcs.py' and a thin
command-line wrapper called 'fmcs'. The software is distributed
under the 2-clause BSD license.

Here's an example of use; see the README for more examples:

 % fmcs sample_files/benzotriazole.sdf
 [#7]:1:[#7]:[#7]:[#6]:2:[#6]:[#6]:[#6]:[#6]:[#6]:1:2 9 atoms 10 bonds (complete search)

The output is a single line on stdout containing a SMARTS
description of the matching subgraph, the number of atoms, the
number of bonds, and whether the complete search finished or a
timeout occurred and only the best solution found so far was
reported.
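For scripting, that one-line output can be split into its fields. This is a minimal sketch that assumes the layout of the benzotriazole example holds in general. The wording of the status for a timed-out search isn't shown here, so the parenthesized text is captured as-is rather than matched literally, and `parse_fmcs_output` is a hypothetical helper, not part of fmcs.

```python
import re

# Layout inferred from the single example output in this post:
#   <SMARTS> <N> atoms <M> bonds (<status>)
# Singular "atom"/"bond" is accepted too, in case small results use it
# (an assumption). The status text is captured verbatim.
_FMCS_LINE = re.compile(
    r"^(?P<smarts>\S+) (?P<atoms>\d+) atoms? (?P<bonds>\d+) bonds? "
    r"\((?P<status>[^)]*)\)$"
)


def parse_fmcs_output(line):
    """Split one line of fmcs stdout into SMARTS, counts, and status."""
    m = _FMCS_LINE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized fmcs output: %r" % line)
    status = m.group("status")
    return {
        "smarts": m.group("smarts"),
        "num_atoms": int(m.group("atoms")),
        "num_bonds": int(m.group("bonds")),
        "complete": status == "complete search",
        "status": status,
    }
```

A SMARTS string never contains spaces, which is why `\S+` is enough to separate it from the counts that follow.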

More technically, this program finds the connected Maximum Common
Edge Subgraph, with the option of maximizing the number of atoms or
the number of bonds. By default, atoms are matched by element and
bonds by bond type, and several other comparison options are
available. There are also command-line options to require that
ring bonds only match ring bonds and chain bonds only match chain
bonds, or the even stronger requirement that a ring bond in the
structure must also be a ring bond in the MCS.
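To make the definition concrete, here is a brute-force sketch of a connected maximum common edge subgraph for just two tiny molecules, maximizing bonds, with element-based atom matching, bond-type matching, and an optional ring-matches-ring-only rule. It is not the fmcs algorithm: it tries every connected subset of one molecule's bonds, largest first, and runs exponentially. The molecule representation and all function names are hypothetical, and the stronger complete-rings requirement is not implemented.

```python
from itertools import combinations

# Hypothetical molecule representation (not fmcs's internal format):
#   atoms: list of element symbols, indexed by atom number
#   bonds: list of (i, j, bond_type, in_ring) tuples


def is_connected(bonds):
    """True if the bonds form a single connected component."""
    adj = {}
    for i, j, _, _ in bonds:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    if not adj:
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(adj)


def embeds(sub_bonds, atoms_a, atoms_b, bonds_b, ring_matches_ring_only):
    """Backtracking search for an injective map of the atoms in sub_bonds
    (from molecule A) into molecule B that preserves elements and bonds."""
    b_bonds = {}
    for i, j, btype, in_ring in bonds_b:
        b_bonds[(i, j)] = b_bonds[(j, i)] = (btype, in_ring)
    sub_atoms = sorted({x for i, j, _, _ in sub_bonds for x in (i, j)})
    mapping = {}

    def bonds_ok():
        # Each subgraph bond with both atoms mapped must land on a bond of B
        # with the same bond type (and the same ring status, if requested).
        for i, j, btype, in_ring in sub_bonds:
            if i in mapping and j in mapping:
                target = b_bonds.get((mapping[i], mapping[j]))
                if target is None or target[0] != btype:
                    return False
                if ring_matches_ring_only and target[1] != in_ring:
                    return False
        return True

    def extend(k):
        if k == len(sub_atoms):
            return True
        a = sub_atoms[k]
        used = set(mapping.values())
        for b, element in enumerate(atoms_b):
            # Atom comparison by element only.
            if b in used or element != atoms_a[a]:
                continue
            mapping[a] = b
            if bonds_ok() and extend(k + 1):
                return True
            del mapping[a]
        return False

    return extend(0)


def brute_force_mces(atoms_a, bonds_a, atoms_b, bonds_b,
                     ring_matches_ring_only=False):
    """Largest connected set of A's bonds that also occurs in B."""
    for size in range(len(bonds_a), 0, -1):
        for subset in combinations(bonds_a, size):
            if is_connected(subset) and embeds(
                    subset, atoms_a, atoms_b, bonds_b, ring_matches_ring_only):
                return list(subset)
    return []


cyclopropane = (["C", "C", "C"],
                [(0, 1, "single", True), (1, 2, "single", True),
                 (0, 2, "single", True)])
propane = (["C", "C", "C"],
           [(0, 1, "single", False), (1, 2, "single", False)])

print(len(brute_force_mces(*cyclopropane, *propane)))  # 2
print(len(brute_force_mces(*cyclopropane, *propane,
                           ring_matches_ring_only=True)))  # 0
```

The two calls show why the ring option matters: without it, two of cyclopropane's ring bonds match propane's chain, while with it no bond of the ring may match a chain bond, so the common subgraph is empty.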

Feedback and Funding
====================

The goal now is to get feedback on what's needed in order to
pass the MCS code over to RDKit. However, there isn't that
much funding left, so I hope the answer is "very little."

Please download the package, try it out on your data sets, and
let me know if it works for you. For example, I'm pretty sure
it won't work on Python 2.6 unless you have the argparse package
installed, and I probably use a few things which aren't supported
on Python 2.5.

And if you're interested in furthering the development of fmcs,
then feel free to fund me. The TODO file contains a list of
suggestions for future work.


Thanks
======

Greg Landrum and Jérôme Hert provided test data sets, and
Greg gave guidelines on the functionality he wants in an MCS
algorithm for RDKit.

I also thank Asad Rahman for his work on SMSD (Small Molecule
Subgraph Detector), which I used during validation, and
Alexander Savelyev and the Indigo developers, whose toolkit
and scaffold detection code I also used. Validation is hard.
I found several subtle bugs by generating a lot of comparisons
between fmcs and other tools and then tracing down the
differences.

The detective work was made easier with Karen Schomburg's
SMARTSviewer and Daylight's online depictmatch.cgi tool.


                                Andrew
                                da...@dalkescientific.com



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
