On Wed, May 23, 2012 at 2:24 PM, JP <jeanpaul.ebe...@inhibox.com> wrote:
> This molecule with no atoms being valid is a questionable design decision
> (not your fault of course, you are just implementing a spec).
>
> I think that the smiles writer should not write an empty molecule (you could
> change the method signature to take yet another param "empties=True/False"
> but I do not think this is correct either).  And IMHO the parser should not
> read one either.

The SMILES parser doesn't read one.

> What happens with the following very organic smiles file?
>
> CCCCC JP1
> CCC JP2
>
> CCCC JP4
>
> The generator is going to give me an empty (third) molecule?  So I have to
> always dirty my code with m.GetNumAtoms() > 0 in that loop.  What if the
> empty molecule is at the end of the file (ouch)?
> Also what if you have an identifier for the empty molecule.  So replace the
> empty third line with " JP3" ?

The SMILES parser currently does not accept empty strings. It
generates an error message and returns "None". I don't have any plans
to change this.
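
For the file-reading case asked about above, one way to keep the atom-count check out of the main loop is to filter the records before they reach the parser. The following is only a minimal sketch in plain Python, not part of RDKit: `iter_smiles_records` is a hypothetical helper, and it assumes one "SMILES name" record per line, with a blank line or a line starting with whitespace (such as " JP3") meaning that the SMILES field is empty.

```python
def iter_smiles_records(lines):
    """Yield (smiles, name) pairs, skipping lines with no SMILES field.

    Hypothetical helper, not an RDKit function. Assumes each record is
    "SMILES<whitespace>name"; a blank line or a line that starts with
    whitespace (e.g. " JP3") is treated as having an empty SMILES.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        # Blank line, or the SMILES column is empty: skip the record.
        if not line.strip() or line[0].isspace():
            continue
        parts = line.split(None, 1)
        smiles = parts[0]
        name = parts[1].strip() if len(parts) > 1 else ""
        yield smiles, name


# The example file from the question, plus the " JP3" variant.
example = ["CCCCC JP1\n", "CCC JP2\n", "\n", " JP3\n", "CCCC JP4\n"]
records = list(iter_smiles_records(example))
print(records)
```

Each SMILES string that comes out of the filter can then be handed to Chem.MolFromSmiles(); the result should still be checked for None, since that is what the parser returns when it cannot read a string.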

>
> What is the ForwardSDMolSupplier/SDWriter going to do with this empty mol?
>  Does it just write name and properties?

The name, the empty mol block, and the properties:
In [1]: from rdkit import Chem
In [2]: m = Chem.Mol()
In [4]: import StringIO
In [5]: sio = StringIO.StringIO()
In [6]: m.SetProp('prop1','foo')
In [7]: m.SetProp('prop2','bar')
In [8]: w = Chem.SDWriter(sio)
In [9]: w.write(m)
In [10]: w.close()
In [11]: print sio.getvalue()

     RDKit

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <prop1>  (1)
foo

>  <prop2>  (1)
bar

$$$$


The SD readers can also work with this:
In [22]: rdr = Chem.SDMolSupplier()
In [23]: rdr.SetData(sio.getvalue())
In [24]: m2 = rdr.next()
In [25]: m2.GetNumAtoms()
Out[25]: 0

In [26]: m2.GetProp('prop1')
Out[26]: 'foo'
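
It may help to look at the record text itself to see why an empty structure is unproblematic in this format: the atom and bond counts are the first two three-character fields of the counts line, and the data items after "M  END" do not depend on the structure. The sketch below is plain Python and does not use RDKit; `summarize_sd_record` is a hypothetical helper that assumes a single V2000 record laid out like the output shown above.

```python
def summarize_sd_record(text):
    """Return (num_atoms, num_bonds, props) for one V2000 SD record.

    Hypothetical helper for illustration only. It assumes the usual
    three header lines followed by the counts line, and data items
    whose values are terminated by a blank line.
    """
    lines = text.splitlines()
    counts = lines[3]                 # 4th line of the mol block
    num_atoms = int(counts[0:3])      # columns 1-3: atom count
    num_bonds = int(counts[3:6])      # columns 4-6: bond count

    props = {}
    i = lines.index("M  END") + 1
    while i < len(lines) and lines[i] != "$$$$":
        if lines[i].startswith(">"):
            # Data header looks like ">  <prop1>  (1)"; keep the <...> part.
            name = lines[i].split("<", 1)[1].split(">", 1)[0]
            i += 1
            value = []
            while i < len(lines) and lines[i].strip():
                value.append(lines[i])
                i += 1
            props[name] = "\n".join(value)
        else:
            i += 1
    return num_atoms, num_bonds, props


# The same record that the SDWriter produced above.
record = "\n".join([
    "",
    "     RDKit",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    ">  <prop1>  (1)",
    "foo",
    "",
    ">  <prop2>  (1)",
    "bar",
    "",
    "$$$$",
    "",
])
summary = summarize_sd_record(record)
print(summary)
```

For the record above this gives zero atoms, zero bonds, and both properties, which matches what the SDMolSupplier reports.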


> I don't want to be controversial or anything, but I disagree with almost
> everyone else about this, in that we should use common sense and not stick
> to the spec in this case.
> Does someone have a use-case for an empty molecule ?  At least I can
> understand what people are using this for

Having an empty molecule can be useful in cases where you want to
include information about the molecule (i.e. using the property fields
in an SDF) but either do not know what the actual structure is, do not
want to disclose what the structure is, or cannot represent the
structure in a reasonable manner in an SDF (e.g. solids,
organometallics, etc.).

> Also having the writer do one thing, and the parser do another means that
> RDKit cannot read the files/molecules it generates.  I think this is a big
> inconsistency, and not one deserving of this excellent bit of software.

I also don't like the inconsistency, but I don't see how it's really
avoidable: empty molecules need to be supported, so there are limited
choices available.

-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss