Re: [Rdkit-discuss] Evaluating ETKDG method with the Platinum Dataset
To give some context, we're not trying to sample a diverse set of conformers and find one close to the experimental structure. In this case, we're generating initial geometries to assess Naruki's fragment-based builder in Open Babel.

But you raise an excellent point: by picking only one random conformer (as we're doing), we're absolutely going to get a higher RMSD than by sampling 50 conformers per compound and picking the best.

We'll try a quick test to be safe, but thanks for the suggestion.

-Geoff

On Mon, Feb 25, 2019 at 3:55 PM Greg Landrum wrote:
>
> Hi Naruki,
>
> You're only generating a single conformer per molecule; I wouldn't expect
> that to do particularly well. It's generally better to call
> EmbedMultipleConfs().
>
> As an aside: I've looked at the platinum set too, it might be worth checking
> out this RDKit blog post:
> http://rdkit.blogspot.com/2017/05/looking-at-platinum-dataset.html
>
> -greg
>
>
> On Mon, Feb 25, 2019 at 11:53 AM Naruki Yoshikawa wrote:
>>
>> Dear all,
>>
>> I'm evaluating ETKDG method implemented in RDKit using the Platinum
>> Dataset introduced in a benchmark paper
>> https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00505/
>> SMILES generated from the dataset is served as input and a 3D
>> conformer is generated.
>> We evaluate RMSD between generated structure and experimental structure.
>>
>> Although the author of the benchmark paper reported the mean RMSD to
>> be below 1.0 angstrom, my evaluation code reports around 1.5 angstrom.
>> I can't figure out why such a big difference occurs.
>>
>> My evaluation code is here:
>> https://gist.github.com/n-yoshikawa/0ba04a1b0c718c4cc8d83702f3759afa
>> There is a link to data in this gist.
>>
>> Thanks,
>> Naruki

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Evaluating ETKDG method with the Platinum Dataset
Hi Naruki,

You're only generating a single conformer per molecule; I wouldn't expect that to do particularly well. It's generally better to call EmbedMultipleConfs().

As an aside: I've looked at the Platinum set too; it might be worth checking out this RDKit blog post:
http://rdkit.blogspot.com/2017/05/looking-at-platinum-dataset.html

-greg

On Mon, Feb 25, 2019 at 11:53 AM Naruki Yoshikawa <naruki.yoshik...@gmail.com> wrote:
> Dear all,
>
> I'm evaluating ETKDG method implemented in RDKit using the Platinum
> Dataset introduced in a benchmark paper
> https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00505/
> SMILES generated from the dataset is served as input and a 3D
> conformer is generated.
> We evaluate RMSD between generated structure and experimental structure.
>
> Although the author of the benchmark paper reported the mean RMSD to
> be below 1.0 angstrom, my evaluation code reports around 1.5 angstrom.
> I can't figure out why such a big difference occurs.
>
> My evaluation code is here:
> https://gist.github.com/n-yoshikawa/0ba04a1b0c718c4cc8d83702f3759afa
> There is a link to data in this gist.
>
> Thanks,
> Naruki
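For readers following the thread, here is a minimal sketch of the comparison being discussed: embed several conformers with EmbedMultipleConfs() and compare the RMSD of the first one against the best of the set. This is not the code from Naruki's gist. The function name `conformer_rmsds` and the parameters `num_confs` and `seed` are made up for illustration, and the sketch assumes the reference molecule already carries an experimental 3D conformer. Stereochemistry handling is left out for brevity.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign


def conformer_rmsds(ref_mol, num_confs=50, seed=42):
    """Return heavy-atom RMSDs of num_confs embedded conformers to ref_mol.

    Hypothetical helper: ref_mol is assumed to hold one 3D conformer
    (for example, an experimental structure read from an SD file).
    """
    # Compare heavy atoms only, so strip hydrogens from the reference.
    ref = Chem.RemoveHs(ref_mol)

    # Rebuild the molecule from SMILES so no 3D information leaks in,
    # then add hydrogens before embedding.
    probe = Chem.AddHs(Chem.MolFromSmiles(Chem.MolToSmiles(ref)))
    conf_ids = list(
        AllChem.EmbedMultipleConfs(probe, numConfs=num_confs, randomSeed=seed)
    )
    probe = Chem.RemoveHs(probe)

    # GetBestRMS aligns each probe conformer onto the reference and
    # takes symmetry-equivalent atom mappings into account.
    return [rdMolAlign.GetBestRMS(probe, ref, prbId=cid) for cid in conf_ids]
```

With the list in hand, `rmsds[0]` plays the role of the single-conformer result and `min(rmsds)` the best-of-N result. The minimum can never be larger than the first entry, which is why a single-conformer evaluation tends to report a higher mean RMSD.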
[Rdkit-discuss] Evaluating ETKDG method with the Platinum Dataset
Dear all,

I'm evaluating the ETKDG method implemented in RDKit using the Platinum Dataset introduced in a benchmark paper:
https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00505/

SMILES strings generated from the dataset serve as input, and a 3D conformer is generated for each. We then evaluate the RMSD between the generated structure and the experimental structure.

Although the authors of the benchmark paper reported a mean RMSD below 1.0 angstrom, my evaluation code reports around 1.5 angstrom. I can't figure out why there is such a big difference.

My evaluation code is here:
https://gist.github.com/n-yoshikawa/0ba04a1b0c718c4cc8d83702f3759afa
There is a link to the data in this gist.

Thanks,
Naruki
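To make the metric in the question concrete, here is a small sketch of an RMSD calculation and the dataset-level mean. It is an illustration only, not the code from the gist. It assumes the two coordinate sets list the same atoms in the same order and have already been superimposed; a real evaluation would also need an alignment step and symmetry handling. The names `rmsd` and `mean_rmsd` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets.

    Each argument is a sequence of (x, y, z) tuples in angstrom, with
    atoms in matching order (an assumption of this sketch).
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def mean_rmsd(pairs):
    """Average RMSD over (generated, experimental) coordinate pairs."""
    values = [rmsd(generated, experimental) for generated, experimental in pairs]
    return sum(values) / len(values)
```

The figure compared against the paper would then be `mean_rmsd` over all molecules in the dataset, so differences in how each per-molecule RMSD is obtained carry straight through to the mean.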