Dear all, I've been making some changes to the SMILES canonicalization code (more on this later) that have also led to some nice (IMO) performance improvements. Here are the numbers.
My usual benchmarking operations (http://code.google.com/p/rdkit/wiki/Benchmarking) don't really help here: 1000 molecules just aren't enough to see reliable differences. Here I'm using 25K molecules from the ZINC ZNP subset (http://zinc.docking.org/subsets/znp). This is a nice test set since the molecules are of reasonable size and contain plenty of stereochemistry (stereo double bonds and chiral centers).

My tests were:
  build1:  generate molecules from the sdf
  smiles1: generate canonical smiles
  smiles2: generate non-canonical smiles
  build2:  generate molecules from the smiles
  build3:  generate molecules from the smiles without stereochemistry cleanup
  build4:  generate molecules from the smiles with very minimal sanitization (just UpdatePropertyCache() and FastFindRings())

Here's the timing information comparing the new code (still on a branch) with a couple of previous releases, run on my linux box. This looks like crap unless you're using a fixed-width font:

|           | build1 | smiles1 | smiles2 | build2 | build3 | build4 |
| 2011_06_1 |   15.4 |     8.1 |     7.0 |   12.5 |        |        |
| 2012_03_1 |   14.6 |     8.0 |     6.9 |    9.9 |    6.9 |    3.8 |
| branch    |   14.3 |     5.9 |     4.4 |    9.7 |    6.6 |    3.5 |

I'm pretty happy with the progress that's being made here. Compared with 2012_03_1, canonical SMILES generation takes about 26% less time (8.0 -> 5.9) and non-canonical SMILES generation about 36% less (6.9 -> 4.4); the other operations are showing steady improvement.

I'll be merging the branch back to the trunk in the next few days.

-greg
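
For anyone who wants to try something similar, here is a minimal sketch of how the six stages could be timed from Python. It is not the script behind the numbers above. The helper names (time_stage, run_benchmark) and the input file name are made up for illustration, and the build3 recipe (parse with sanitize=False, then call SanitizeMol() without any stereochemistry assignment) is an assumption about what "without stereochemistry cleanup" means, which may not match the branch exactly.

```python
import time


def time_stage(func, items):
    """Apply func to each item and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [func(item) for item in items]
    return results, time.perf_counter() - start


def run_benchmark(sdf_path):
    """Time the six stages on the molecules in sdf_path; returns a dict."""
    # Imported lazily so time_stage() stays usable without RDKit installed.
    from rdkit import Chem

    def build_no_stereo_cleanup(smi):
        # build3 (assumption): full sanitization, no stereochemistry step.
        mol = Chem.MolFromSmiles(smi, sanitize=False)
        Chem.SanitizeMol(mol)
        return mol

    def build_minimal(smi):
        # build4: only UpdatePropertyCache() and FastFindRings().
        mol = Chem.MolFromSmiles(smi, sanitize=False)
        mol.UpdatePropertyCache()
        Chem.FastFindRings(mol)
        return mol

    timings = {}

    # build1: read the SD file, skipping records that fail to parse.
    start = time.perf_counter()
    mols = [m for m in Chem.SDMolSupplier(sdf_path) if m is not None]
    timings["build1"] = time.perf_counter() - start

    # smiles1 / smiles2: canonical and non-canonical SMILES with stereo.
    smis, timings["smiles1"] = time_stage(
        lambda m: Chem.MolToSmiles(m, isomericSmiles=True), mols)
    _, timings["smiles2"] = time_stage(
        lambda m: Chem.MolToSmiles(m, isomericSmiles=True, canonical=False),
        mols)

    # build2..build4: back from SMILES with decreasing amounts of cleanup.
    _, timings["build2"] = time_stage(Chem.MolFromSmiles, smis)
    _, timings["build3"] = time_stage(build_no_stereo_cleanup, smis)
    _, timings["build4"] = time_stage(build_minimal, smis)
    return timings
```

Calling run_benchmark("znp.sdf") (hypothetical file name) returns a dict keyed by the stage names used in the table. A single pass like this is noisy, so repeating each stage a few times and keeping the best time is probably the safer way to compare builds.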
_______________________________________________
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel