Egon, I think you're a bit hasty with your analysis here.

The code example provided by Vincent clearly shows he's not iterating. He's
only reading a single benzene molecule (though it's contained in an SDF
file).

The problem seems to be that the MDLV2000Reader DOESN'T parse the molecule
properties beyond the "M  END" line if asked for an IMolecule, but DOES
parse them if asked for an IChemFile.
I'm guessing the molecule-reading stops after encountering "M  END", which
makes sense as this used to indicate the end of the molecule. Now with SDF
properties occurring after this terminator-token, things go awry.

The IChemFile-version reader of course has to continue beyond "M  END",
because there may be more molecules. In this process it "accidentally"
encounters the properties, which it then (correctly) associates with the
previous molecule.
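
To make that association concrete, here is a minimal sketch in plain Java of
how a multi-record read can attach trailing properties to the preceding
molecule simply by cutting the text at "$$$$". This is NOT the CDK
implementation; the class name SdfRecordSplitter and its split() method are
made up for illustration, and I'm assuming the whole file fits in a String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CDK: splits SD-file text into records.
class SdfRecordSplitter {

    // Returns one entry per record. Each entry holds the molblock plus any
    // data items that follow "M  END", up to (but excluding) the "$$$$" line.
    static List<String> split(String sdf) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : sdf.split("\\r?\\n")) {
            if (line.startsWith("$$$$")) {
                // Record delimiter: everything collected so far, including
                // the property lines, belongs to the molecule just read.
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        // A single-molecule file may lack the final "$$$$"; keep what is left.
        if (!current.toString().trim().isEmpty()) {
            records.add(current.toString());
        }
        return records;
    }
}
```

Because the cut happens at "$$$$" and not at "M  END", the property lines
end up in the same record as the molecule they follow.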

I'm not sure which behaviour is most correct. On the one hand, SDF-files are
primarily intended for multiple molecules, so using a
single-molecule-reading method can be expected to trip. On the other hand,
lots of people are using SDF as "MOL++", for single
molecules-with-added-properties.
In this case, I think it would be best to have the IMolecule-version reader
check if there are lines beyond "M  END", and if so, process them until it
encounters "$$$$", or just stop at "M  END" if there are no more (non-empty)
lines.
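
A rough sketch of what I mean, in plain Java and only under stated
assumptions: it is not a patch against MDLV2000Reader, the class name
SdfTailProperties and its read() method are hypothetical, and it skips the
molblock instead of parsing it. It relies only on the SD data-item layout: a
header line starting with ">" that carries the field name in angle brackets,
one or more value lines, and a blank line to close the item.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not CDK code: collects SD data items after "M  END".
class SdfTailProperties {

    static Map<String, String> read(String molOrSdf) {
        Map<String, String> props = new LinkedHashMap<>();
        String[] lines = molOrSdf.split("\\r?\\n");
        int i = 0;
        // Skip the molblock; a real reader would parse atoms and bonds here.
        while (i < lines.length && !lines[i].startsWith("M  END")) {
            i++;
        }
        i++; // step past "M  END" itself
        String name = null;
        StringBuilder value = new StringBuilder();
        for (; i < lines.length; i++) {
            String line = lines[i];
            if (line.startsWith("$$$$")) {
                break; // end of this record: stop, do not read a next molecule
            }
            if (line.startsWith(">")) {
                // Header line: the field name sits between '<' and '>'.
                int open = line.indexOf('<');
                int close = line.indexOf('>', open + 1);
                name = (open >= 0 && close > open)
                        ? line.substring(open + 1, close) : null;
                value.setLength(0);
            } else if (line.trim().isEmpty()) {
                // Blank line closes the current data item, if any.
                if (name != null) {
                    props.put(name, value.toString());
                }
                name = null;
            } else if (name != null) {
                if (value.length() > 0) {
                    value.append('\n');
                }
                value.append(line);
            }
        }
        // Flush an item that was not closed by a blank line.
        if (name != null) {
            props.put(name, value.toString());
        }
        return props;
    }
}
```

For a plain molfile that ends at "M  END" the loop never runs and the map
stays empty, so the "MOL" and "MOL++" cases go through the same code path.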


BTW, Vincent, your code example is a bit inconsistent; you use "IMolecule"
in your second fragment (which is the correct, generic solution), but you
use "ChemFile" in the first fragment; that should be "IChemFile" to keep
things generic. The extra "I" indicates it's an interface, a more generic
type than the concrete class.
For the reasoning behind this, please see "Cleaner CDK Code #5":
<http://chem-bla-ics.blogspot.com/2010/05/cleaner-cdk-code-5-developer-against.html>

Best regards,
Jules Kerssemakers,
PhD-student Bioinformatics
CMBI, Nijmegen

On 14 March 2011 12:31, Egon Willighagen <egon.willigha...@gmail.com> wrote:

> On Mon, Mar 14, 2011 at 12:03 PM, Vincent Le Guilloux
> <vincent.le-guill...@univ-orleans.fr> wrote:
> > When I use the reader to read a single IMolecule from a file (or a
> > String: same issue), the properties are missing when I call the
> > getProperties for this molecule.
> > Yet, if I ask the reader to read a ChemFile instead of reading a
> > molecule, the properties are read correctly.
>
> The MDLV2000Reader is not supposed to be used as an iterating reader...
> reading an IMolecule will use this reader as an MDL molfile reader,
> which doesn't have properties, and it therefore does not look for
> them.
>
> I think we should have IChemObjectReader's throw an EOF exception when
> the read() method is called for a second time...
>
> Egon
>
> --
> Dr E.L. Willighagen
> Postdoctoral Researcher
> Institutet för miljömedicin
> Karolinska Institutet
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
>
>
> ------------------------------------------------------------------------------
> Colocation vs. Managed Hosting
> A question and answer guide to determining the best fit
> for your organization - today and in the future.
> http://p.sf.net/sfu/internap-sfd2d
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
