Hi Noel,

Let me provide a more thorough answer than I did previously. I'll move
it back on list too, so that the answers are out there and archived.

On 9/13/07, Noel O'Boyle <baoille...@gmail.com> wrote:
> (off-list due to inquisitive nature of some questions, although feel
> free to move back onto-list)
>
> First of all, Greg, this is pretty expletive impressive. In fact, it's
> unbelievable. Not only have you matched the OpenBabel or Daylight
> toolkits, in many ways (perhaps all?) you have surpassed them. I can't

Thanks for the compliments. I appreciate them, and I'm sure that
Santosh (the other developer) does as well, but I do want to temper
the enthusiasm with a bit of reality: the RDKit stuff isn't nearly as
highly optimized or thoroughly battle-tested as Daylight.

> believe this code has been out for more than a year. You've got 2D and
> 3D coordinate generation, and everything! Something the open source
> world has been crying out for for the last few years. If this code had
> been around when I first heard about the Python interface to OB
> (almost 2y ago now), I probably wouldn't have been involved. (To
> clarify, I'm involved at the Python end of OpenBabel)
>
> So on to the questions:
> (1) Can I believe my eyes? Is this really open source? A lot of the
> Python code has a very restrictive copyright statement right at the
> start (see Windows release, AllChem.py for example)

The "All rights reserved" is, in my opinion, superseded by the
license.txt file (which is BSD, except for the GUI components, which
are GPL due to Qt license restrictions). It's really open source and
it's as open as we could make it without going public domain (I
consider the BSD license to be far more open than the GPL, which is
quite restrictive IMO).

> (2) How does, if at all, being the product of a company affect this
> toolkit? I guess what I'm saying is, are you willing to engage with
> the OS community. Is the code likely to be taken back in house (which
> happened with OEChem, for example)?

The company is no more, so no worries about that.

> (3) What's the story with version numbers and backwards-incompatible
> API changes? That is, do you try to maintain the API across releases?

Yes. API changes would break the unit tests, which would require a lot
of work to fix, so pure developer laziness dictates API stability. We
also put a lot of time into making sure that the various binary
formats stay backwards compatible (i.e. if a file format changes,
newer versions can still read old files).
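
To make the backwards-compatibility point concrete, here is a minimal
sketch of a reader that checks a version tag before parsing. Everything
in it is invented for illustration (the "DEMO" signature, the field
layout, and the names write_record and read_record); it is not the
RDKit's actual binary format.

```python
import io
import struct

MAGIC = b"DEMO"        # hypothetical file signature
CURRENT_VERSION = 2    # version that this writer emits


def write_record(stream, atom_count, charge):
    """Write one record in the current (version 2) layout."""
    stream.write(MAGIC)
    stream.write(struct.pack("<H", CURRENT_VERSION))
    stream.write(struct.pack("<Ih", atom_count, charge))


def read_record(stream):
    """Read a record written by this version or by any older one."""
    if stream.read(4) != MAGIC:
        raise ValueError("not a DEMO file")
    (version,) = struct.unpack("<H", stream.read(2))
    if version == 1:
        # Version 1 stored only the atom count; supply a default charge.
        (atom_count,) = struct.unpack("<I", stream.read(4))
        return {"atom_count": atom_count, "charge": 0}
    if version == 2:
        atom_count, charge = struct.unpack("<Ih", stream.read(6))
        return {"atom_count": atom_count, "charge": charge}
    raise ValueError("file written by a newer version: %d" % version)
```

The old branches are never deleted, only added to, which is what lets a
new reader keep opening old files.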

> (4) Why haven't you publicised RDKit, if you don't mind me asking? For
> example, there is an excellent (if I do say so myself) website called
> Linux4Chemistry which lists the excellent (if you do say so yourself)
> YaEHMOP. Also there's the CCL mailing list. I only found RDKit because
> of trawling through the SF software map. Is this, um, shyness,
> intentional?

There are many components to the answer to this question. Some are:
 1) Promotion isn't something I enjoy or am particularly good at.
 2) I'm kind of afraid of having more users. I do a lot of this as a
free-time project and I'm afraid of spending all my time answering
questions. This is, of course, a bit stupid because if the whole open
source thing works then other people will pitch in and help with those
questions. For that to happen I need those other people as users,
which requires that I find them, which... it's a Catch-22.

> (5) You may/not be aware but Numeric is deprecated to the extent that
> it is not available for Python 2.5 on Windows. I had to replace a
> couple of "import Numeric"s with "from numpy import oldnumeric as
> Numeric", but this is only a temporary solution.

The Numeric thing is a definite problem (though it works fine for me
with Python 2.5 under Windows). I made an attempt a while ago to port
the code to use numpy, but was immediately frustrated by the lack of
documentation available (unless you buy the book) and the very
aggressive response of the community when I complained about this.
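
Until a real port happens, the stopgap from the question can be wrapped
in a small helper that tries each candidate module in turn. This is
only a sketch: import_first_available is a made-up name, and whether
either Numeric or numpy.oldnumeric is importable depends entirely on
what is installed.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %s"
                      % ", ".join(module_names))


# Intended use (left commented out because neither module may be present):
# Numeric = import_first_available(["Numeric", "numpy.oldnumeric"])
```

This keeps the fallback in one place instead of scattering try/except
blocks over every file that imports Numeric.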

> (6) It'd be nice to have an installer for the Python stuff...I've done
> this for OpenBabel. It's pretty easy.

If you care to share how you did this, I'd be happy to learn. It's a nice idea.

> (7) Conceptually is this a C++ toolkit, or a Python toolkit with a C++
> backend? It seems that a lot of the work is done in Python...

It's both. The core data structures and algorithms are almost entirely
in C++; a lot of the "end-user" functionality is written in Python.
The model has been that new algorithms get coded first in Python and
then ported into C++ if it's needed for speed. The two APIs are
similar enough (to me at least) that this usually ends up being fairly
straightforward.

> (8) Are there any particular reasons you didn't base your code on OpenBabel?

Again, a complicated question. The short answer is:
1) at the time we started the RDKit development OpenBabel was still
OELib (or close to it) and didn't do what we wanted.
2) we were doing this at a software/services company, so the use of
the GPL erased any hope that we could make serious use of the code.

> (9) Have you seen Rajarshi's smi23d? (If not, have a google) It
> appears to use a similar method to yours to create a 3D conformation,
> although it has run into some patent problems due to stochastic
> proximity embedding. Do you use the same algorithm?

I haven't seen smi23d. We do *not* use SPE or anything related to it.
Our approach for embedding the coordinates from the distance matrix is
to just use the standard diagonalization procedure that is used by
things like DGeom. SPE (and related algorithms) is really overkill for
small molecules.
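
For reference, the textbook version of that diagonalization step can be
sketched as follows. This is an illustration in plain Python, not the
RDKit code: it assumes an exact, complete Euclidean distance matrix,
uses simple power iteration in place of a real eigensolver, and the
function names (metric_matrix, top_eigenpair, embed) are made up.

```python
import math
import random


def metric_matrix(dist):
    """Turn a full distance matrix into the centered metric (Gram) matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: G_ij = -1/2 (d_ij^2 - mean_i - mean_j + mean_all)
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, iterations=1000, seed=0):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    n = len(mat)
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        new = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    mv = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
    value = sum(vec[i] * mv[i] for i in range(n))
    return value, vec


def embed(dist, dim=3):
    """Coordinates whose pairwise distances reproduce dist."""
    mat = metric_matrix(dist)
    n = len(mat)
    coords = [[0.0] * dim for _ in range(n)]
    for axis in range(dim):
        value, vec = top_eigenpair(mat)
        scale = math.sqrt(max(value, 0.0))  # clamp small negative noise
        for i in range(n):
            coords[i][axis] = scale * vec[i]
        # Deflate: remove the component just found, then repeat.
        mat = [[mat[i][j] - value * vec[i] * vec[j] for j in range(n)]
               for i in range(n)]
    return coords
```

The coordinates come back centered on the origin and in an arbitrary
orientation; only the interatomic distances are meaningful.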

> (10) 2D depiction. Mega cool!! :-) Can I use it, can I, can I? But how
> does it work? Any chance of you writing a paper so I don't need to
> read the code?

Nope, no chance at all: there's nothing really innovative in it.
Santosh can pipe up with more information if he has time, but the
algorithm is pretty much what I'd call the "ChemDraw" algorithm (as
described in the Rev. Comp. Chem. article on depiction) with various
small mods that we put in to address particular edge cases. The
depictor isn't as good as ChemDraw or MOE or OpenEye, but it's not too
bad.

> (11) 2D representation of 3D structure. Unbelievable. You might want
> to check out the recent (ASAP alert in JCIM) Alex Clark paper on MOE's
> depiction.

Yeah, that was inspired by Alex's work and the great pictures that MOE
makes. That's great stuff.

> (12) Interested in easily converting a ROMol to an OBMol and vice
> versa? I am. It'd be trivial to do this at the Python level. We could
> coordinate a bit to make the methods somewhat symmetrical. It would
> make it easy to unittest shared algorithms against each other, e.g.
> LogP calculation, SMILES, or whatever.

It would be an interesting exercise. I'm not convinced that it would
be trivial to get it right.  There's a lot of devil in the details of
things like aromaticity handling and general sanitization problems
(the RDKit is *very* picky about molecules being "clean").

> (13) Atoms don't have coordinates, is this right? You need to get
> their Idx, and look them up in a Conformer?

Correct. Neither atoms nor molecules know about coordinates; that
information is carried in the Conformer. This allows a molecule to
carry around multiple 2D and/or 3D Conformers.
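
A rough picture of that arrangement, as a self-contained sketch. These
are toy classes written for this mail, not the RDKit API, and all of
the names (Atom, Conformer, Molecule, add_conformer, atom_position)
are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Atom:
    idx: int      # position of the atom within its molecule
    symbol: str   # element symbol; note that there is no coordinate field


@dataclass
class Conformer:
    positions: List[Point]   # positions[i] belongs to the atom with idx i

    def atom_position(self, idx: int) -> Point:
        return self.positions[idx]


@dataclass
class Molecule:
    atoms: List[Atom]
    conformers: List[Conformer] = field(default_factory=list)

    def add_conformer(self, positions: List[Point]) -> int:
        """Attach one more set of coordinates and return its id."""
        if len(positions) != len(self.atoms):
            raise ValueError("need exactly one position per atom")
        self.conformers.append(Conformer(list(positions)))
        return len(self.conformers) - 1


# One molecule carrying two independent sets of coordinates.
mol = Molecule([Atom(0, "C"), Atom(1, "O")])
depiction_id = mol.add_conformer([(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)])
model_id = mol.add_conformer([(0.0, 0.0, 0.0), (0.0, 0.0, 1.43)])

oxygen = mol.atoms[1]
print(mol.conformers[model_id].atom_position(oxygen.idx))
```

The lookup goes from the atom to its index to the conformer, which is
the pattern described in the question.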

-greg
