Dear all,

I've been making some changes to the SMILES canonicalization code
(more on this later) that have also led to some nice (IMO) performance
improvements. Here are the numbers.

My usual benchmarking operations
(http://code.google.com/p/rdkit/wiki/Benchmarking) don't really help
here: 1000 molecules just aren't enough to see reliable differences.
Here I'm using 25K molecules from the ZINC ZNP subset
(http://zinc.docking.org/subsets/znp). This is a nice test set since
the molecules are of reasonable size and contain plenty of
stereochemistry (double bonds with stereochemistry and chiral
centers).

My tests were:
build1: generate molecules from the SDF file
smiles1: generate canonical SMILES
smiles2: generate non-canonical SMILES
build2: generate molecules from the SMILES
build3: generate molecules from the SMILES without stereochemistry cleanup
build4: generate molecules from the SMILES with very minimal
sanitization (just UpdatePropertyCache() and FastFindRings())
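
For anyone who wants to try something similar, here is a rough Python sketch of the six stages listed above. It is not the actual benchmarking script. The helper names (time_stage, run_benchmark), the sdf_path argument, and the use of isomericSmiles=True are assumptions made for illustration. The build3 step is also a guess: it parses with sanitize=False and then calls Chem.SanitizeMol() explicitly, which should skip the stereochemistry pass that the default parse performs.

```python
import time


def time_stage(func, items):
    """Apply func to every item; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [func(item) for item in items]
    return results, time.perf_counter() - start


def run_benchmark(sdf_path):
    """Time the six stages on the molecules in sdf_path (hypothetical path)."""
    # Imported lazily so the timing helper above works without RDKit installed.
    from rdkit import Chem

    def no_stereo_cleanup_build(smi):
        # build3 (assumption): parse unsanitized, then sanitize explicitly,
        # so the stereochemistry pass done by the default parse is skipped.
        mol = Chem.MolFromSmiles(smi, sanitize=False)
        Chem.SanitizeMol(mol)
        return mol

    def minimal_build(smi):
        # build4: skip sanitization, then do only the two steps named above.
        mol = Chem.MolFromSmiles(smi, sanitize=False)
        mol.UpdatePropertyCache()
        Chem.FastFindRings(mol)
        return mol

    timings = {}

    # build1: iterating the supplier is what parses each SDF record.
    start = time.perf_counter()
    mols = [m for m in Chem.SDMolSupplier(sdf_path) if m is not None]
    timings["build1"] = time.perf_counter() - start

    # smiles1 / smiles2: canonical and non-canonical SMILES generation.
    smis, timings["smiles1"] = time_stage(
        lambda m: Chem.MolToSmiles(m, isomericSmiles=True), mols)
    _, timings["smiles2"] = time_stage(
        lambda m: Chem.MolToSmiles(m, isomericSmiles=True, canonical=False), mols)

    # build2..build4: rebuild molecules from the canonical SMILES.
    _, timings["build2"] = time_stage(Chem.MolFromSmiles, smis)
    _, timings["build3"] = time_stage(no_stereo_cleanup_build, smis)
    _, timings["build4"] = time_stage(minimal_build, smis)
    return timings
```

Calling run_benchmark() on a local copy of the SDF file returns a dict keyed by the test names used above. Absolute numbers will of course depend on the machine and the RDKit build.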

Here's the timing information comparing the new code (still on a
branch) with a couple of previous releases, run on my linux box. This
looks like crap unless you're using a fixed-width font:

|           | build1 | smiles1 | smiles2 | build2 | build3 | build4 |
| 2011_06_1 |   15.4 |     8.1 |     7.0 |   12.5 |        |        |
| 2012_03_1 |   14.6 |     8.0 |     6.9 |    9.9 |    6.9 |    3.8 |
| branch    |   14.3 |     5.9 |     4.4 |    9.7 |    6.6 |    3.5 |
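
To put the differences in relative terms, the rows of the table can be compared directly. The snippet below is a small sketch that copies the 2012_03_1 and branch rows from the table and computes the percentage reduction in time for each test; the function and variable names are illustrative only.

```python
# Timings copied from the 2012_03_1 and branch rows of the table above.
timings = {
    "2012_03_1": {"build1": 14.6, "smiles1": 8.0, "smiles2": 6.9,
                  "build2": 9.9, "build3": 6.9, "build4": 3.8},
    "branch": {"build1": 14.3, "smiles1": 5.9, "smiles2": 4.4,
               "build2": 9.7, "build3": 6.6, "build4": 3.5},
}


def percent_reduction(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old


# Relative improvement of the branch over the 2012_03_1 release, per test.
reductions = {
    test: percent_reduction(timings["2012_03_1"][test], timings["branch"][test])
    for test in timings["branch"]
}

for test, pct in reductions.items():
    print(f"{test}: {pct:.1f}% less time than 2012_03_1")
```

The two SMILES-generation tests show by far the largest relative change; the build tests move by only a few percent.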


I'm pretty happy with the progress that's being made here. Relative to
2012_03_1, canonical SMILES generation takes about 26% less time (5.9
vs 8.0) and non-canonical SMILES generation about 36% less (4.4 vs
6.9). The other operations are showing steady, smaller improvements.

I'll be merging the branch back to the trunk in the next few days.

-greg

_______________________________________________
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel
