Hi Andrew,
What about building QSAR models to predict activity for a particular ChEMBL
assay? This would allow you to discuss the strengths and limitations of QSAR
models.
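
As a rough starting point, here is a minimal sketch of pulling the training
data for one assay out of the ChEMBL SQLite file. The table and column names
(activities, assays, compound_structures, pchembl_value) are what I believe
the ChEMBL 24 schema uses, so please check them against your copy. The toy
in-memory database, its rows, and the assay ID 'CHEMBL_TOY_ASSAY' are made up
purely so the snippet runs on its own.

```python
import sqlite3

# Assumed ChEMBL 24 table/column names; verify against the real schema.
ASSAY_QUERY = """
SELECT cs.canonical_smiles, act.pchembl_value
FROM activities AS act
JOIN assays AS a ON a.assay_id = act.assay_id
JOIN compound_structures AS cs ON cs.molregno = act.molregno
WHERE a.chembl_id = ?
  AND act.pchembl_value IS NOT NULL
ORDER BY cs.molregno
"""


def fetch_assay_data(conn, assay_chembl_id):
    """Return (smiles, pchembl_value) pairs recorded for one assay."""
    return conn.execute(ASSAY_QUERY, (assay_chembl_id,)).fetchall()


def make_toy_db():
    """Tiny in-memory stand-in for the ChEMBL file (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE assays (assay_id INTEGER PRIMARY KEY, chembl_id TEXT);
        CREATE TABLE compound_structures (
            molregno INTEGER PRIMARY KEY, canonical_smiles TEXT);
        CREATE TABLE activities (
            activity_id INTEGER PRIMARY KEY, assay_id INTEGER,
            molregno INTEGER, pchembl_value REAL);
        INSERT INTO assays VALUES (1, 'CHEMBL_TOY_ASSAY'), (2, 'CHEMBL_OTHER');
        INSERT INTO compound_structures VALUES
            (10, 'CCO'), (11, 'c1ccccc1'), (12, 'CC(=O)O');
        INSERT INTO activities VALUES
            (100, 1, 10, 5.2), (101, 1, 11, 6.8),
            (102, 1, 12, NULL), (103, 2, 12, 7.1);
    """)
    return conn


# With the real data, use sqlite3.connect() on the path to your ChEMBL file.
rows = fetch_assay_data(make_toy_db(), "CHEMBL_TOY_ASSAY")
print(rows)
```

The (SMILES, pchembl_value) pairs could then be turned into RDKit descriptors
and passed to scikit-learn. Rows without a pchembl_value are dropped here,
which is itself a modelling choice worth discussing with students.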
Best,
JW
___________________
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080
On Wed, Aug 29, 2018 at 7:24 AM <[email protected]>
wrote:
> Send Rdkit-discuss mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> or, via email, send a message with subject or body 'help' to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Rdkit-discuss digest..."
>
>
> Today's Topics:
>
> 1. want advice for good teaching data set (Andrew Dalke)
> 2. Re: Capturing 3D Conformational Flexibility in a Single
> Descriptor (Richard Cooper)
> 3. Re: want advice for good teaching data set (TJ O'Donnell)
> 4. Re: Capturing 3D Conformational Flexibility in a Single
> Descriptor (Ali Eftekhari)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 29 Aug 2018 14:51:57 +0200
> From: Andrew Dalke <[email protected]>
> To: RDKit Discuss <[email protected]>
> Subject: [Rdkit-discuss] want advice for good teaching data set
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=us-ascii
>
> Hi all,
>
> I am starting to put together materials for the Python/RDKit training
> course I'm giving just before the RDKit UGM next month.
>
> I would like to structure part of it around the SQLite release of the
> ChEMBL data set. More specifically, I plan to include examples of machine
> learning with scikit-learn, using RDKit descriptors and values from ChEMBL
> 24 (and making sure to use the new schema).
>
> Two problems. First, I'm not a computational chemist and I don't know what
> would constitute a good example to use. "Good" in this case means one whose
> outlines are well-known to likely students. Second, I don't have much
> experience with the ChEMBL data.
>
> My thought is to make a logP model. The easiest would be to base it on
> atom types. For this option, can anyone suggest where I can find logP data
> from ChEMBL?
>
> Another possibility is to use a pre-existing model, like the notebook
> George Papadatos did for Ligand-based Target Prediction at
> http://nbviewer.jupyter.org/gist/madgpap/10457778 .
>
> Perhaps someone here could point me to other existing resources along
> similar lines?
>
> Best regards,
>
> Andrew
> [email protected]
>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 29 Aug 2018 14:32:28 +0100
> From: Richard Cooper <[email protected]>
> To: Ali Eftekhari <[email protected]>
> Cc: RDKit Discuss <[email protected]>
> Subject: Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility
> in a Single Descriptor
> Message-ID:
> <
> cajwsdrteawmtnqrhzfnfojj54orgtsgj+-_6rwly26o98as...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Just to follow up with the details - here is the line in the script to
> change:
>
> conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3)
>
> to
>
> conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3, randomSeed=737)
>
> (where 737 is an integer constant of your choice, but not -1).
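
A minimal sketch of the effect of the fixed seed, assuming RDKit is
installed. The helper name embed_conformers, the explicit hydrogens, and the
conformer count of 20 are my own illustrative choices, not taken from the
published script; the SMILES is the one from Ali's original message. This
only shows that the embedding becomes repeatable and does not compute nConf20
itself.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# SMILES for ZINC000290539224, as given in Ali's message.
SMILES = "CC(C)N2CC(NCc1cnc(C(C)O)s1)CC2=O"


def embed_conformers(smiles, num_confs=20, seed=737):
    """Embed conformers with a fixed random seed so repeated runs agree."""
    # Adding explicit hydrogens is an assumption here; the original
    # script may prepare the molecule differently.
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMultipleConfs(
        mol, numConfs=num_confs, pruneRmsThresh=0.5, randomSeed=seed
    )
    return mol


first = embed_conformers(SMILES)
second = embed_conformers(SMILES)

# Same seed, same input: the number of conformers kept after pruning
# should match between the two runs.
print(first.GetNumConformers(), second.GetNumConformers())
```

A different seed can still give a different conformer set, so a fixed seed
makes the result reproducible but does not remove the underlying variability.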
>
> Richard
>
>
> On Tue, Aug 28, 2018 at 12:55 PM Richard Cooper <
> [email protected]> wrote:
> >
> > Hi Ali,
> >
> > Sorry I missed your email.
> >
> > The behaviour you describe is correct, due to a random seed in the
> > conformer generation step. The descriptor value usually doesn't vary by
> > too much.
> >
> > I think you can give the conformer generation a constant random seed if
> > you need a reproducible number for nConf20.
> >
> > Regards, Richard
> >
> >
> > On Tue, 28 Aug 2018, 00:25 Ali Eftekhari, <[email protected]>
> > wrote:
> >>
> >> Hello all,
> >>
> >> I am trying to calculate 3D descriptors following this publication:
> >> "Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility
> >> in a Single Descriptor", Jerome G. P. Wicker and Richard I. Cooper, J.
> >> Chem. Inf. Model. 2016, 56, 2347-2352.
> >>
> >> I am essentially using the same script as in the supporting
> >> information, and I have attached it here as well. In Table 2 of the
> >> above publication, the value of the descriptor (nConf20) for the
> >> ZINC000290539224 molecule is listed as 10. However, when I run exactly
> >> the same code, I get a different value on each run.
> >>
> >> I have already contacted the authors but got no response. I am
> >> wondering whether the code in the supporting information is not right
> >> or the value listed in the table is wrong.
> >>
> >> The SMILES string for this particular molecule is:
> >> 'CC(C)N2CC(NCc1cnc(C(C)O)s1)CC2=O'
> >>
> >> Thanks in advance for your help!
> >>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 29 Aug 2018 06:51:45 -0700
> From: "TJ O'Donnell" <[email protected]>
> To: Andrew Dalke <[email protected]>
> Cc: RDKit Discuss <[email protected]>
> Subject: Re: [Rdkit-discuss] want advice for good teaching data set
> Message-ID:
> <
> cadqa_h8xm5difzy7zf_zapphgulb5+uhuavsahuc0vvewdm...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Andrew
> ChEMBL 24 has compound properties in the table compound_properties. I
> think alogp is computed using (Crippen) atom types, and acd_logp uses
> ACD Labs methods.
> TJ
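
A minimal sketch of reading those two columns with Python's sqlite3 module.
It assumes compound_properties and compound_structures share the molregno
key and that the SMILES column is called canonical_smiles, which is my
understanding of the ChEMBL 24 schema and should be checked against the real
file. The in-memory database and its values are invented so the snippet runs
on its own.

```python
import sqlite3

# Assumed ChEMBL 24 table/column names; verify against the real schema.
LOGP_QUERY = """
SELECT cs.canonical_smiles, cp.alogp, cp.acd_logp
FROM compound_properties AS cp
JOIN compound_structures AS cs ON cs.molregno = cp.molregno
WHERE cp.alogp IS NOT NULL
ORDER BY cp.molregno
LIMIT ?
"""


def fetch_logp(conn, limit=1000):
    """Return up to `limit` (smiles, alogp, acd_logp) rows."""
    return conn.execute(LOGP_QUERY, (limit,)).fetchall()


def make_toy_db():
    """Tiny in-memory stand-in for the ChEMBL file (made-up values)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE compound_structures (
            molregno INTEGER PRIMARY KEY, canonical_smiles TEXT);
        CREATE TABLE compound_properties (
            molregno INTEGER PRIMARY KEY, alogp REAL, acd_logp REAL);
        INSERT INTO compound_structures VALUES
            (1, 'CCO'), (2, 'c1ccccc1'), (3, 'CCCCCCCC');
        INSERT INTO compound_properties VALUES
            (1, -0.1, -0.2), (2, 1.9, 2.1), (3, NULL, 4.3);
    """)
    return conn


for smiles, alogp, acd_logp in fetch_logp(make_toy_db(), limit=10):
    print(smiles, alogp, acd_logp)
```

Both columns hold calculated values, so a model trained on them learns to
reproduce another model's output, not measured logP.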
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 29 Aug 2018 07:24:05 -0700
> From: Ali Eftekhari <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility
> in a Single Descriptor
> Message-ID:
> <CAKWSw4=
> [email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Dr. Cooper,
>
> Thanks for your response and the suggestions. I added randomSeed=737 and
> now get a value of 14 for the nConf20 descriptor of the ZINC000290539224
> molecule (although it differs from the value of 10 in your paper, it no
> longer changes between runs). My concern now is the general usage of the
> nConf20 descriptor. For instance, is there a limitation on which molecules
> nConf20 can be estimated for? Since the conformers are generated
> randomly, how reliable is this descriptor as a replacement for
> Rotatable Bond Count (RBC) in all machine learning models?
>
> In my application, the calculated RBC values for 350 molecules range
> from 0 to 7 (80% between 0 and 4, and 20% between 5 and 7). The calculated
> nConf20 values range from 0 to 40, but 95% are between 0 and 3. Since
> nConf20 for the majority of molecules is between 0 and 3, I am concerned
> about using nConf20 as the main descriptor. Could you please comment on that?
>
> Thanks,
> Ali
>
>
> ------------------------------
>
> End of Rdkit-discuss Digest, Vol 130, Issue 35
> **********************************************
>