Thanks for the explanation and the helpful-as-ever code snippets, Greg.

A molecule without atoms sounds a bit weird to me, but it seems to be
perfectly legal (per the CTfile definition) to have a no-atom molecule:
"Each record can hold one molecule (which may be blank)." That is from this
very inspiring article, "Why not to use SDF":
http://molmatinf.com/whynotmolsdf.html

So a case for it exists.

On the practical side, in terms of the API some things to think about:

- Would you define the checks to exclude, or the ones to include, in
SanitizeMol?
- Shouldn't the default behaviour be to run all checks (or not)?
- How would SanitizeMol be called from other methods such as SDMolSupplier
etc. (the methods which allow for optional sanitization)?  Are you going to
allow for flags to be passed there too?  Isn't this going to make the API
unwieldy?
- Will the flags be in the form of ints which you OR together, e.g.
CHECK_EMPTY_MOL | CHECK_KEKULIZE (like Java/C++ style options)?

PS the salt removal code is working fine as is.  I'd rather remove both
[Li].[Br] and check for the empty molecule as Nik suggested rather than
keep one of those "little" buggers.

Have a nice evening!


-
Jean-Paul Ebejer
Early Stage Researcher


On 30 January 2012 19:20, Greg Landrum <greg.land...@gmail.com> wrote:

> On Mon, Jan 30, 2012 at 1:26 PM, JP <jeanpaul.ebe...@inhibox.com> wrote:
> >
> > But then I will have to add the "if not clean_mol.GetNumAtoms():"
> > before/after replacing/editing molecule parts, after reading molecules,
> > before writing them etc. i.e. I'd need this statement in a lot of places.
> > This is why I asked if it should be considered a valid molecule -
> > because if this moves into SanitizeMol I wouldn't need any of that, e.g.
> > I can assume that the molecule I have in hand is valid, and if I still
> > wanted these molecules (for some not so clear reason) I could just
> > switch sanitization off on the methods that allow it.
> >
> > Just asking, there is probably some good design decision for this which
> > I am missing... (hence the question)
>
> It's not an easy one. I believe there's not really a strong argument
> for either behavior. As you've seen, the current behavior of the RDKit
> is to treat molecules without atoms as completely legal entities. You
> can test for this case the way Nik pointed out.
>
> I'm playing with the idea of making the SanitizeMol routine
> configurable, so you could pass in a set of flags to control which
> operations are carried out. If this happens, a "check for zero atoms"
> flag that defaults to false could be added. I just created a feature
> request for this:
>
> https://sourceforge.net/tracker/?func=detail&aid=3481729&group_id=160139&atid=814653
>
> In the meantime, if you'd like to change the definition of
> sanitization, the easiest way to do so would be to write your own
> function, perhaps something like this (not tested):
>
> from rdkit import Chem
>
> def mySanitize(mol):
>     # reject atom-less molecules before the standard sanitization runs
>     if not mol.GetNumAtoms():
>         raise ValueError('molecule has no atoms')
>     Chem.SanitizeMol(mol)
>
>
> Note: for the particular case of salt stripping, you can ensure that
> the salt stripper doesn't remove all atoms using the
> dontRemoveEverything optional argument. Take a look at the help for
> SaltRemover.StripMol:
>
> http://rdkit.org/docs/api/rdkit.Chem.SaltRemover.SaltRemover-class.html#StripMol
>
> -greg
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
