On Mon, Mar 14, 2011 at 12:55 PM, Jules Kerssemakers
<j.kerssemak...@cmbi.ru.nl> wrote:
> The problem seems to be that the MDLv2000Reader DOESN'T parse the molecule
> properties beyond the "M  END" line if asked for an IMolecule, but DOES
> parse them if asked for an IChemFile.

Correct. That's how it is supposed to happen. The read() method is not
expected to be called multiple times.

> I'm guessing the molecule-reading stops after encountering "M  END", which
> makes sense as this used to indicate the end of the molecule. Now with SDF
> properties occurring after this terminator-token, things go awry.

Yes. Didn't I reply that already? :)

The reason for this is that reading an IMolecule reads an MDL molfile,
which doesn't have any properties as part of the format.

> The IChemFileReader of course has to continue beyond the M-END, because
> there may be more molecules, in this process it "accidentally" encounters
> the properties, which it then (correctly) associates with the previous
> molecule.

Properties are only part of the MDL SD file specification, and are
thus read only when reading an SD file.
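
To make the format distinction concrete, here is a minimal sketch in
plain Java. It is not CDK code and not how MDLV2000Reader is
implemented; the class and method names (SdRecordSketch, molfileBlock,
properties) are made up for illustration. It only assumes the usual SD
layout: a molfile ending in "M  END", then "> <NAME>" data items each
closed by a blank line, then "$$$$".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of one SD file record; this is not CDK code.
final class SdRecordSketch {

    /** Lines up to and including "M  END": all that a molfile reader consumes. */
    static List<String> molfileBlock(String record) {
        List<String> block = new ArrayList<>();
        for (String line : record.split("\r?\n", -1)) {
            block.add(line);
            if (line.startsWith("M  END")) {
                break;
            }
        }
        return block;
    }

    /** Data items between "M  END" and "$$$$": only an SD file reader sees these. */
    static Map<String, String> properties(String record) {
        Map<String, String> props = new LinkedHashMap<>();
        boolean pastMolfile = false;
        String name = null;
        StringBuilder value = new StringBuilder();
        for (String line : record.split("\r?\n", -1)) {
            if (!pastMolfile) {
                // Skip everything that belongs to the molfile itself.
                pastMolfile = line.startsWith("M  END");
                continue;
            }
            boolean header = line.startsWith(">");
            boolean terminator = line.startsWith("$$$$");
            // A blank line, a new header, or "$$$$" closes the open data item.
            if (name != null && (header || terminator || line.trim().isEmpty())) {
                props.put(name, value.toString());
                name = null;
                value.setLength(0);
            }
            if (terminator) {
                break;
            }
            if (header) {
                int open = line.indexOf('<');
                int close = line.indexOf('>', open + 1);
                if (open >= 0 && close > open) {
                    name = line.substring(open + 1, close);
                }
            } else if (name != null) {
                if (value.length() > 0) {
                    value.append('\n');
                }
                value.append(line);
            }
        }
        if (name != null) {
            // Record ended without "$$$$"; keep the last open data item.
            props.put(name, value.toString());
        }
        return props;
    }

    public static void main(String[] args) {
        String record = String.join("\n",
                "example", "  sketch", "",
                "  0  0  0  0  0  0  0  0  0  0999 V2000",
                "M  END",
                "> <NAME>",
                "empty molecule",
                "",
                "$$$$");
        System.out.println("molfile lines: " + molfileBlock(record).size());
        System.out.println("properties:    " + properties(record));
    }
}
```

The point of the split into two methods is that the first one never
looks past "M  END", which mirrors why asking for a single molecule
returns no properties.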

Actually, we already split the writing, and have already discussed
splitting the MDL molfile reading from the MDL SD file reading into
two classes too. (Maybe this is already the case in master?)

> I'm not sure which behaviour is most correct. On one hand SDF-files are
> primarily intended for multiple molecules, so using a
> single-molecule-reading method can be expected to trip. On the other hand,
> lots of people are using SDF as "MOL++", for single
> molecules-with-added-properties.
> In this case, I think it would be best to have the IMolecule-version reader
> check if there are lines beyond "M  END", and if so, process until it
> encounters "$$$$", or just stop at "M  END" if there's no more (non-empty)
> lines.

I think there is no need to change the code. The current library can
read those properties easily, if the library is used correctly (which
is clearly not intuitive here :).

Where the library can be improved is in the error reporting, and in a
cleaner 1-to-1 file-format-to-reader design.

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user
