Re: [Rdkit-discuss] Evaluating ETKDG method with the Platinum Dataset

2019-02-25 Thread Geoffrey Hutchison
To give some context, it's not that we're trying to sample a diverse
set of conformers (and find something close to the experimental). In
this case, we're generating initial geometries - to assess Naruki's
fragment-based builder in Open Babel.

But you raise an excellent point - by picking only one random
conformer (as we're doing), we're absolutely going to have a higher
RMSD than sampling 50 conformers per compound and picking the best.
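
A toy illustration of that effect, as a rough sketch only: the RMSD values
below are drawn from a made-up log-normal distribution (an assumption, not
Platinum results), and simulate() is a hypothetical helper name. It compares
the mean RMSD of the first conformer against the mean of the best of 50.

```python
import random
import statistics


def simulate(num_molecules=1000, num_confs=50, seed=0):
    """Compare 'first conformer' vs 'best of num_confs' mean RMSD.

    The per-conformer RMSDs are synthetic (log-normal, in angstrom);
    they only illustrate the sampling effect, not real ETKDG behaviour.
    """
    rng = random.Random(seed)
    single, best = [], []
    for _ in range(num_molecules):
        rmsds = [rng.lognormvariate(0.4, 0.4) for _ in range(num_confs)]
        single.append(rmsds[0])   # what a one-conformer benchmark sees
        best.append(min(rmsds))   # what a best-of-N benchmark sees
    return statistics.mean(single), statistics.mean(best)


mean_single, mean_best = simulate()
print(f"single conformer: {mean_single:.2f}  best of 50: {mean_best:.2f}")
```

Because the minimum over a set can never exceed its first element, the
best-of-N mean is always at or below the single-conformer mean; how large
the gap is on real data depends on the actual RMSD distribution.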

We'll try a quick test to be safe, but thanks for the suggestion.

-Geoff

On Mon, Feb 25, 2019 at 3:55 PM Greg Landrum wrote:
>
> Hi Naruki,
>
> You're only generating a single conformer per molecule; I wouldn't expect 
> that to do particularly well. It's generally better to call 
> EmbedMultipleConfs().
>
> As an aside: I've looked at the Platinum set too; it might be worth checking
> out this RDKit blog post:
> http://rdkit.blogspot.com/2017/05/looking-at-platinum-dataset.html
>
> -greg
>
>
> On Mon, Feb 25, 2019 at 11:53 AM Naruki Yoshikawa wrote:
>>
>> Dear all,
>>
>> I'm evaluating the ETKDG method implemented in RDKit using the Platinum
>> Dataset introduced in a benchmark paper:
>> https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00505/
>> SMILES generated from the dataset are used as input, and a 3D
>> conformer is generated for each.
>> We evaluate the RMSD between the generated structure and the
>> experimental structure.
>>
>> Although the author of the benchmark paper reported the mean RMSD to
>> be below 1.0 angstrom, my evaluation code reports around 1.5 angstrom.
>> I can't figure out why such a big difference occurs.
>>
>> My evaluation code is here:
>> https://gist.github.com/n-yoshikawa/0ba04a1b0c718c4cc8d83702f3759afa
>> There is a link to data in this gist.
>>
>> Thanks,
>> Naruki
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Evaluating ETKDG method with the Platinum Dataset

2019-02-25 Thread Greg Landrum
Hi Naruki,

You're only generating a single conformer per molecule; I wouldn't expect
that to do particularly well. It's generally better to call
EmbedMultipleConfs().
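
A minimal sketch of what that could look like, assuming a reference molecule
that already carries 3D coordinates. The helper name best_rmsd_to_reference,
the example SMILES, and the embedded stand-in "reference" are all
illustrative assumptions; in a real evaluation the reference would be the
experimental structure read from the dataset.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdMolAlign


def best_rmsd_to_reference(smiles, ref_mol, num_confs=50, seed=42):
    """Embed num_confs ETKDG conformers from SMILES and return
    (lowest heavy-atom RMSD to ref_mol, list of all RMSDs)."""
    probe = Chem.AddHs(Chem.MolFromSmiles(smiles))
    params = AllChem.ETKDG()
    params.randomSeed = seed  # fixed seed so runs are repeatable
    conf_ids = AllChem.EmbedMultipleConfs(probe, num_confs, params)
    # Compare heavy atoms only; RemoveHs keeps the conformers.
    probe_no_h = Chem.RemoveHs(probe)
    ref_no_h = Chem.RemoveHs(ref_mol)
    rmsds = [
        rdMolAlign.GetBestRMS(probe_no_h, ref_no_h, prbId=cid)
        for cid in conf_ids
    ]
    return min(rmsds), rmsds


# Stand-in reference: a single embedded conformer, NOT experimental data.
smiles = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"
reference = Chem.AddHs(Chem.MolFromSmiles(smiles))
AllChem.EmbedMolecule(reference, randomSeed=7)

best, all_rmsds = best_rmsd_to_reference(smiles, reference, num_confs=10)
print(f"first conformer: {all_rmsds[0]:.2f}  best of {len(all_rmsds)}: {best:.2f}")
```

GetBestRMS aligns each probe conformer onto the reference and accounts for
symmetry-equivalent atom mappings, so the value reported is the RMSD after
optimal superposition.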

As an aside: I've looked at the Platinum set too; it might be worth
checking out this RDKit blog post:
http://rdkit.blogspot.com/2017/05/looking-at-platinum-dataset.html

-greg


On Mon, Feb 25, 2019 at 11:53 AM Naruki Yoshikawa <
naruki.yoshik...@gmail.com> wrote:

> Dear all,
>
> I'm evaluating the ETKDG method implemented in RDKit using the Platinum
> Dataset introduced in a benchmark paper:
> https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00505/
> SMILES generated from the dataset are used as input, and a 3D
> conformer is generated for each.
> We evaluate the RMSD between the generated structure and the
> experimental structure.
>
> Although the author of the benchmark paper reported the mean RMSD to
> be below 1.0 angstrom, my evaluation code reports around 1.5 angstrom.
> I can't figure out why such a big difference occurs.
>
> My evaluation code is here:
> https://gist.github.com/n-yoshikawa/0ba04a1b0c718c4cc8d83702f3759afa
> There is a link to data in this gist.
>
> Thanks,
> Naruki


[Rdkit-discuss] Evaluating ETKDG method with the Platinum Dataset

2019-02-25 Thread Naruki Yoshikawa
Dear all,

I'm evaluating the ETKDG method implemented in RDKit using the Platinum
Dataset introduced in a benchmark paper:
https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00505/
SMILES generated from the dataset are used as input, and a 3D
conformer is generated for each.
We evaluate the RMSD between the generated structure and the
experimental structure.

Although the author of the benchmark paper reported the mean RMSD to
be below 1.0 angstrom, my evaluation code reports around 1.5 angstrom.
I can't figure out why such a big difference occurs.

My evaluation code is here:
https://gist.github.com/n-yoshikawa/0ba04a1b0c718c4cc8d83702f3759afa
There is a link to data in this gist.

Thanks,
Naruki

