On Mon, Mar 14, 2011 at 12:55 PM, Jules Kerssemakers <j.kerssemak...@cmbi.ru.nl> wrote:
> The problem seems to be that the MDLv2000Reader DOESN'T parse the molecule
> properties beyond the "M END" line if asked for an IMolecule, but DOES
> parse them if asked for a IChemFile.
Correct. That's how it is supposed to happen. The read() method is not expected to be called multiple times.

> I'm guessing the molecule-reading stops after encountering "M END", which
> makes sense as this used to indicate the end of the molecule. Now with SDF
> properties occurring after this terminator-token, things go awry.

Yes. Didn't I reply that already? :) The reason for this is that reading an IMolecule reads an MDL molfile, and the molfile format itself has no properties.

> The IChemFileReader of course has to continue beyond the M-END, because
> there may be more molecules, in this process it "accidentally" encounters
> the properties, which it then (correctly) associates with the previous
> molecule.

Properties are only part of the MDL SD file specification, and are thus only read when reading an SD file. Actually, we have already split the writing, and have already discussed splitting the MDL molfile reading and the MDL SD file reading into two classes as well. (Maybe this is already the case in master?)

> I'm not sure which behaviour is most correct.. On one hand SDF-files are
> primarily intended for multiple molecules, so using a
> single-molecule-reading method can be expected to trip. On the other hand,
> lots of people are using SDF as "MOL++", for single
> molecules-with-added-properties.
> In this case, I think it would be best to have the IMolecule-version reader
> check if there are lines beyond "M END", and if so, process until it
> encounters "$$$$", or just stop at "M END" if there's no more (non-empty)
> lines.

I think there is no need to change the code. The current library can read those properties easily if it is used correctly (which is clearly not intuitive here :). Where the library can be improved is in its error reporting and in a cleaner 1-to-1 file-format-to-reader design.

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user