[Rdkit-discuss] Google Research Job Opening - Drug Discovery Scientist

2021-06-18 Thread JW Feng
Hi RDKit community,

We are looking for a drug discovery scientist with computational and
machine learning expertise [job link
<https://careers.google.com/jobs/results/91113763815465670-research-scientist-drug-discovery/>].
At Google, you will have access to massive amounts of data and compute
resources to train ML models for multi-parameter optimization. Model
performance will be rigorously evaluated through prospective testing of
molecules in a wide range of drug discovery assays. You will also have
access to the latest ML innovations from the broader Google Research team.

Responsibilities:
Lead hit-to-lead and lead optimization projects with hands-on analysis and
modeling
Develop novel algorithms and models for molecular property predictions
Collaborate with industry partners in supporting drug discovery programs
Participate in cutting edge research in machine learning

Minimum qualifications:
PhD degree in Computer Science, Chemistry, Biology, a related field, or
equivalent practical experience
2 years of experience supporting pharmaceutical industry medicinal
chemistry programs
Experience with cheminformatics, including experience with at least one
common toolkit such as RDKit or OEChem
Software development experience in one or more general purpose programming
languages

Preferred qualifications:
Developed machine learning models for hit-to-lead or lead optimization

Come join me and an excellent team of drug hunters, software engineers, and
ML scientists to accelerate drug discovery.  Please visit the job post
<https://careers.google.com/jobs/results/91113763815465670-research-scientist-drug-discovery/>
and apply.

Best,

JW Feng (LinkedIn <https://www.linkedin.com/in/jwfeng/>)
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Availability of new command line Psi4 scripts powered by RDKit

2021-04-21 Thread JW Feng
Hi Manish,

This is great. I used Psi4 extensively for torsional scans while at Denali.
Do you plan to include scripts for torsional scans?

Best,

JW

On Wed, Apr 21, 2021 at 9:11 AM Manish Sud  wrote:

> Hi All,
>
>
>
> I'd like to share with you the availability of the following new command
> line Python scripts based on Psi4:
>
>
>
> o Psi4CalculateEnergy.py
>
> o Psi4CalculatePartialCharges.py
>
> o Psi4CalculateProperties.py
>
> o Psi4GenerateConformers.py
>
> o Psi4PerformMinimization.py
>
> o Psi4VisualizeDualDescriptors.py
>
> o Psi4VisualizeElectrostaticPotential.py
>
> o Psi4VisualizeFrontierOrbitals.py
>
>
>
> These scripts rely on the availability of Psi4 and RDKit in your
> environment. The RDKit is used for a variety of tasks including reading and
> writing molecules, generating initial 3D coordinates and conformers of
> molecules, and removing similar conformers. In addition, multiprocessing
> functionality is available across all the scripts.
>
>
>
> Some of you might find these scripts useful. Please visit
> www.MayaChemTools.org for further details.
>
>
>
> Your feedback is welcome.
>
>
>
> Thanks,
>
> Manish


Re: [Rdkit-discuss] The RDKit and GSoC 2020

2020-03-19 Thread JW Feng
The iwatobipen blog was where I found instructions for installing RDKit on
Colab.  It works, but I found waiting for miniconda to install too
annoying. A one-line apt-get command to install RDKit is easier and faster
(~10 seconds), but it only works with Python 2.  Running the following
commands in a Python 3 environment results in the error below. Getting
apt-get to install RDKit correctly for Python 3 would be a good solution.

!apt-get install python-rdkit librdkit1 rdkit-data
from rdkit import Chem
...

---

ModuleNotFoundError                       Traceback (most recent call last)
in ()
      1 get_ipython().system('apt-get install python-rdkit librdkit1 rdkit-data')
----> 2 from rdkit import Chem

ModuleNotFoundError: No module named 'rdkit'

---
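
A quick way to confirm which interpreter the notebook is running, and whether that interpreter can see the package at all, is sketched below. It only uses the standard library; the helper name `module_available` is made up for this note and is not part of RDKit or Colab.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the running interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Print the interpreter version first: a package installed only for
    # Python 2 will not be found by a Python 3 kernel.
    print("Python", sys.version_info.major, sys.version_info.minor)
    print("rdkit importable:", module_available("rdkit"))
```

If the second line prints False right after the apt-get step, the package most likely went to a different interpreter than the one the notebook uses.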



Best,

JW



On Mon, Mar 16, 2020 at 4:48 AM Taka Seri  wrote:

> Dear Steve, Greg and All,
>
> Recently I moved from Colab to Binder to set up a cloud environment with Python.
> However, I'll try to make my code more compact and share it.
> Thanks for following my blog post. ;) https://iwatobipen.wordpress.com/
>
> Best regards,
>
> Taka (Twitter account / iwatobipen)
>
> On Mon, Mar 16, 2020 at 16:03, Greg Landrum wrote:
>
>> Thanks Steve,
>>
>> That's really helpful. Given that we're unlikely to end up with a decent
>> pip-installable RDKit, I guess the snippet approach would be the best way
>> to go. I will try to make some time for this (or convince iwatobipen to do
>> it) in the reasonably near future.
>>
>> Best,
>> -greg
>>
>> On Sun, Mar 15, 2020 at 5:58 PM Steven Kearnes 
>> wrote:
>>
>>> re: rdkit+colab
>>>
>>> In talking with folks outside of Google about rdkit+colab, I haven't
>>> been able to establish that it's worth the trouble of making rdkit a
>>> default dependency. It seems that a rather compact incantation
>>> <https://iwatobipen.wordpress.com/2018/11/01/run-rdkit-and-deep-learning-on-google-colab-rdkit/>
>>> does the job fairly well. This could be compressed even further, or even
>>> turned into a colab snippet <https://stackoverflow.com/a/53875826> for
>>> easier use.
>>>
>>> Also, since colab doesn't play well with conda (as far as pre-installed
>>> deps are concerned), we would at least need a pip-installable rdkit to
>>> consider making this work.
>>>
>>> Thanks,
>>> Steve
>>>
>>> On Mon, Mar 9, 2020 at 4:43 PM JW Feng  wrote:
>>>
>>>> Are you sure depictions in GSheet wouldn't be a good GSoC project?  I
>>>> will ask around to find volunteers to connect with you on GSheets and
>>>> Colab.
>>>>
>>>> On Fri, Mar 6, 2020 at 8:14 PM Greg Landrum 
>>>> wrote:
>>>>
>>>>> Hi JW,
>>>>>
>>>>> I don't think it's a great GSoC project for a couple of reasons, but
>>>>> I'd love to have RDKit integration in Google Sheets and am willing to do
>>>>> some work to make that happen. I can poke around a bit to see about how we
>>>>> could use the new RDKit-JS wrappers, but having access to someone with
>>>>> experience writing Sheets add-ins would help. If you know someone
>>>>> internally meeting that description, please put them in touch with me.
>>>>>
>>>>> I think making the code easily available in Colab can only be done by
>>>>> someone inside google. I'm happy to help however I can with that if you
>>>>> (or anyone else) can identify the right person.
>>>>>
>>>>> Best,
>>>>> -greg
>>>>>
>>>>> On Sat, Mar 7, 2020 at 2:22 AM JW Feng  wrote:
>>>>>
>>>>>> Project suggestion:
>>>>>>
>>>>>> Project 1:
>>>>>> Implement 2D structure depiction in Google Spreadsheets.  My
>>>>>> colleagues at Google think this is very doable.  Being able to depict
>>>>>> structures in Google Spreadsheets will dramatically increase
>>>>>> collaboration between scientists.  Imagine being able to provide
>>>>>> comments for a structure, design idea, or virtual screening hit in a
>>>>>> live Google Spreadsheet.  While there are commercial (Vortex,
>>>>>> Spotfire, MarvinView,
>>>>

Re: [Rdkit-discuss] The RDKit and GSoC 2020

2020-03-09 Thread JW Feng
Are you sure depictions in GSheet wouldn't be a good GSoC project?  I will
ask around to find volunteers to connect with you on GSheets and Colab.

On Fri, Mar 6, 2020 at 8:14 PM Greg Landrum  wrote:

> Hi JW,
>
> I don't think it's a great GSoC project for a couple of reasons, but I'd
> love to have RDKit integration in Google Sheets and am willing to do some
> work to make that happen. I can poke around a bit to see about how we could
> use the new RDKit-JS wrappers, but having access to someone with experience
> writing Sheets add-ins would help. If you know someone internally meeting
> that description, please put them in touch with me.
>
> I think making the code easily available in Colab can only be done by
> someone inside google. I'm happy to help however I can with that if you (or
> anyone else) can identify the right person.
>
> Best,
> -greg
>
> On Sat, Mar 7, 2020 at 2:22 AM JW Feng  wrote:
>
>> Project suggestion:
>>
>> Project 1:
>> Implement 2D structure depiction in Google Spreadsheets.  My colleagues
>> at Google think this is very doable.  Being able to depict structures in
>> Google Spreadsheets will dramatically increase collaboration between
>> scientists.  Imagine being able to provide comments for a structure, design
>> idea, or virtual screening hit in a live Google Spreadsheet.  While there
>> are commercial (Vortex, Spotfire, MarvinView, Stardrop ...) and open source
>> (Datawarrior) packages that can read CSV files containing smiles and depict
>> structures, none comes close to GSheets for collaboration and ease of use.
>>
>>    - Cells in columns named SMILES, or have SMILES as a substring in the
>>    header, will be depicted in 2D using RDKit
>>    - Cells with depicted structures move with other columns when
>>    sorting, filtering, etc.
>>    - Optional: depictions update when SMILES string is edited
>>    - Bonus: calculate properties using formulas.  Ex: Descriptors.MolWt(A1)
>>    calculates MW of SMILES in A1
>>
>> Project 2:
>>
>>    - Make it easy to use RDKit in Google Colab
>>    <https://colab.sandbox.google.com/notebooks/intro.ipynb#recent=true>
>>    - No need to install RDKit, from rdkit import Chem just works out of
>>    the box
>>
>> Best,
>>
>> JW
>> On Sun, Feb 23, 2020 at 11:48 PM Greg Landrum 
>> wrote:
>>
>>> Dear all,
>>>
>>> I'm happy to share that the RDKit will once again be part of Google
>>> Summer of Code in 2020. This is a program where Google funds students to
>>> work on open-source projects for a couple of months over the summer. We've
>>> participated in each of the last three years and had some cool stuff come
>>> out of it.
>>>
>>> We're looking for a few more project ideas (along with possible
>>> mentors!) as well as students.
>>> Applications start in the middle of March. There's more info about
>>> timelines here:
>>> https://developers.google.com/open-source/gsoc/timeline
>>>
>>> The current set of project ideas is here and we could use a few more:
>>> http://wiki.openchemistry.org/GSoC_Ideas_2020#RDKit_Project_Ideas
>>> I'm going to try and come up with something, but if you have something
>>> to add, please let me know.
>>>
>>> Best,
>>> -greg


Re: [Rdkit-discuss] The RDKit and GSoC 2020

2020-03-06 Thread JW Feng
Project suggestion:

Project 1:
Implement 2D structure depiction in Google Spreadsheets.  My colleagues at
Google think this is very doable.  Being able to depict structures in
Google Spreadsheets will dramatically increase collaboration between
scientists.  Imagine being able to provide comments for a structure, design
idea, or virtual screening hit in a live Google Spreadsheet.  While there
are commercial (Vortex, Spotfire, MarvinView, Stardrop ...) and open source
(Datawarrior) packages that can read CSV files containing smiles and depict
structures, none comes close to GSheets for collaboration and ease of use.

   - Cells in columns named SMILES, or have SMILES as a substring in the
   header, will be depicted in 2D using RDKit
   - Cells with depicted structures move with other columns when sorting,
   filtering, etc.
   - Optional: depictions update when SMILES string is edited
   - Bonus: calculate properties using formulas.  Ex: Descriptors.MolWt(A1)
   calculates MW of SMILES in A1
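
To make the first and last bullets concrete, here is a rough Python sketch of the logic a Sheets add-in would need: pick out the SMILES columns from a header row, then compute a property for a cell. The function names are hypothetical, the case-insensitive header match is my assumption, and a real add-in would run in the Sheets environment rather than in Python.

```python
from typing import List, Optional


def smiles_columns(header: List[str]) -> List[int]:
    """Indices of columns whose header contains 'SMILES' (case-insensitive)."""
    return [i for i, name in enumerate(header) if "smiles" in name.lower()]


def mol_wt(smiles: str) -> Optional[float]:
    """Molecular weight for a SMILES string, or None if it cannot be parsed."""
    # Imported here so the header logic above also works without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    return Descriptors.MolWt(mol) if mol is not None else None


if __name__ == "__main__":
    header = ["ID", "SMILES", "Parent_smiles", "Comment"]
    print(smiles_columns(header))
    print(mol_wt("CCO"))
```

Returning None for an unparsable SMILES keeps a bad cell from breaking the rest of the column.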

Project 2:

   - Make it easy to use RDKit in Google Colab
   
   - No need to install RDKit, from rdkit import Chem just works out of the
   box

Best,

JW
On Sun, Feb 23, 2020 at 11:48 PM Greg Landrum 
wrote:

> Dear all,
>
> I'm happy to share that the RDKit will once again be part of Google Summer
> of Code in 2020. This is a program where Google funds students to work on
> open-source projects for a couple of months over the summer. We've
> participated in each of the last three years and had some cool stuff come
> out of it.
>
> We're looking for a few more project ideas (along with possible mentors!)
> as well as students.
> Applications start in the middle of March. There's more info about
> timelines here:
> https://developers.google.com/open-source/gsoc/timeline
>
> The current set of project ideas is here and we could use a few more:
> http://wiki.openchemistry.org/GSoC_Ideas_2020#RDKit_Project_Ideas
> I'm going to try and come up with something, but if you have something to
> add, please let me know.
>
> Best,
> -greg


[Rdkit-discuss] want advice for good teaching data set

2018-08-29 Thread JW Feng via Rdkit-discuss
Hi Andrew,

What about building QSAR models to predict activity for a particular ChEMBL
assay?  This would allow you to discuss the strengths and limitations of QSAR
models.

Best,

JW
___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080


On Wed, Aug 29, 2018 at 7:24 AM 
wrote:

> Send Rdkit-discuss mailing list submissions to
> rdkit-discuss@lists.sourceforge.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> or, via email, send a message with subject or body 'help' to
> rdkit-discuss-requ...@lists.sourceforge.net
>
> You can reach the person managing the list at
> rdkit-discuss-ow...@lists.sourceforge.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Rdkit-discuss digest..."
>
>
> Today's Topics:
>
>1. want advice for good teaching data set (Andrew Dalke)
>2. Re: Capturing 3D Conformational Flexibility in a Single
>   Descriptor (Richard Cooper)
>3. Re: want advice for good teaching data set (TJ O'Donnell)
>4. Re: Capturing 3D Conformational Flexibility in a Single
>   Descriptor (Ali Eftekhari)
>
>
> --
>
> Message: 1
> Date: Wed, 29 Aug 2018 14:51:57 +0200
> From: Andrew Dalke 
> To: RDKit Discuss 
> Subject: [Rdkit-discuss] want advice for good teaching data set
> Message-ID: <8625305a-6b76-4721-bdbf-297f23edc...@dalkescientific.com>
> Content-Type: text/plain; charset=us-ascii
>
> Hi all,
>
>   I am starting to put together materials for the Python/RDKit training
> course I'm giving just before the RDKit UGM next month.
>
> I would like to structure part of it around the SQLite release of the
> ChEMBL data set. More specifically, I plan to include examples of machine
> learning with scikit-learn, using RDKit descriptors and values from ChEMBL
> 24 (and making sure to use the new schema).
>
> Two problems. First, I'm not a computational chemist and I don't know what
> would constitute a good example to use. "Good" in this case means one whose
> outlines are well-known to likely students. Second, I don't have much
> experience with the ChEMBL data.
>
> My thought is to make a logP model. The easiest would be to base it on
> atom types. For this option, can anyone suggest where I can find logP data
> from ChEMBL?
>
> Another possibility is to use a pre-existing model, like the notebook
> George Papadatos did for Ligand-based Target Prediction at
> http://nbviewer.jupyter.org/gist/madgpap/10457778 .
>
> Perhaps someone here could point me to other existing resources along
> similar lines?
>
> Best regards,
>
> Andrew
> da...@dalkescientific.com
>
>
>
>
>
> --
>
> Message: 2
> Date: Wed, 29 Aug 2018 14:32:28 +0100
> From: Richard Cooper 
> To: Ali Eftekhari 
> Cc: RDKit Discuss 
> Subject: Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility
> in a Single Descriptor
> Message-ID:
> <
> cajwsdrteawmtnqrhzfnfojj54orgtsgj+-_6rwly26o98as...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Just to follow up with the details - here is the line in the script to
> change:
>
>    conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3)
>
> to
>
>    conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3, randomSeed=737)
>
> (where 737 is an integer constant of your choice, but not -1).
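
The change described above could be wrapped in a small helper along these lines. This is only a sketch: the function name `embed_conformers`, the default of 20 conformers, and the SMILES input are assumptions for illustration, and `numThreads` is left at its default. The explicit check for -1 is there because that value asks RDKit to choose a random seed.

```python
def embed_conformers(smiles: str, num_confs: int = 20, seed: int = 737):
    """Embed conformers with a fixed seed so repeated runs give the same result."""
    if seed == -1:
        # -1 tells RDKit to pick a random seed, which defeats reproducibility.
        raise ValueError("use a fixed integer seed other than -1")

    # Imported lazily so the argument check above does not require RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are recommended before embedding
    conf_ids = AllChem.EmbedMultipleConfs(
        mol, numConfs=num_confs, pruneRmsThresh=0.5, randomSeed=seed
    )
    return mol, list(conf_ids)
```

Calling it twice with the same SMILES and seed should then give the same conformers, and therefore the same descriptor value.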
>
> Richard
>
>
> On Tue, Aug 28, 2018 at 12:55 PM Richard Cooper <
> richardiancooper+rdkitdisc...@gmail.com> wrote:
> >
> > Hi Ali,
> >
> > Sorry I missed your email.
> >
> > The behaviour you describe is correct, due to a random seed in the
> > conformer generation step. The descriptor value usually doesn't vary by too
> > much.
> >
> > I think you can give the conformer generation a constant random seed if
> > you need a reproducible number for nConf20.
> >
> > Regards, Richard
> >
> >
> > On Tue, 28 Aug 2018, 00:25 Ali Eftekhari, 
> wrote:
> >>
> >> Hello all,
> >>
> >> I am trying to calculate 3D Descriptors following this publication:
> >> "Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility
> in a Single Descriptor", Jerome G. P. Wicker and Richard I. Cooper.  J.
> Chem. Inf. Model. 2016, 5

Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 124, Issue 10

2018-02-07 Thread JW Feng via Rdkit-discuss
How about setting up a donation fund on rdkit.org to pay for summer
students to document code?  For companies that benefited from using RDKit,
it is a worthy cause to pay it forward.

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Wed, Feb 7, 2018 at 12:24 PM, <
rdkit-discuss-requ...@lists.sourceforge.net> wrote:

> Today's Topics:
>
>1. Re: RDKit and Google Summer of Code 2018 (Greg Landrum)
>
>
> --
>
> Message: 1
> Date: Wed, 7 Feb 2018 21:23:46 +0100
> From: Greg Landrum <greg.land...@gmail.com>
> To: Cameron Pye <cameron@gmail.com>
> Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> Subject: Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018
> Message-ID:
> <CAD4fdRT4e+gp0Hsi6KojPQfmirwx6JO3P5QVrbhD4XrXJG=
> o...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> A quick one on this as part of me digging out from the pile of email I
> should have replied to.
>
> Cameron's suggestion is a really good one, but unfortunately GSoC is really
> about coding projects, so it doesn't work here.
>
> But we should still talk about ways to improve the docs.
>
> I agree that this is a really important task but it's also a bit
> overwhelming and difficult to know where to start. This is too bad since
> it's something you don't need to be a coder to approach; more or less any
> RDKit user could contribute. Believe it or not, just having people point
> out pieces of code that could be (better) documented is already useful -
> I'm sure I'm not the only developer who has forgotten which bits of code
> they've left un(der)documented. I often have 10-15 minute slots of time
> that I could use for writing docs, but it really helps to know which pieces
> should be done first.
>
> I would love to hear suggestions for ways that we can make it easier for
> people to submit improved documentation or pointers to pieces of code that
> could use better documentation and then to let people know that these
> options exist. It needs to be something other than "send email to the list"
> though.
>
> It's currently pretty easy to submit bug reports/feature requests using the
> github interface. These could either provide suggested docs/doc changes or
> point to functions/methods/classes that could be better documented. The
> github guys just added the ability to specify different types of issue
> templates, I could look into doing one of these for documentation requests.
>
> -greg
>
>
>
> On Wed, Jan 24, 2018 at 7:38 PM, Cameron Pye <cameron@gmail.com>
> wrote:
>
> > I know this isn't a particularly sexy job for a budding cheminformatician
> > but...
> >
> > Work on the Python documentation!!!
> >
> > I love rdKit and occasionally think I'm pretty savvy but I can't tell you
> > how often I'm scrolling through the documentation (or source) and either:
> >
> > a) discover something that exists but doesn't have any documentation
> > beyond the function signature
> > or
> > b) discover some functionality that exists (and I've wanted) but
> > didn't know it was there!
> >
> > I think this mailing list and Greg do a superb job of keeping the
> > community informed and creating and maintaining the codebase but I think
> > having some more "Pythonic" API documentation would be great.
> >
> > One shining example is the scikit-learn documentation
> > <http://scikit-learn.org/stable/documentation.html> that has a quick
> > start, tutorials etc.  and then in the well categorized and explanatory
> > API ref has links for examples in the User Guide (akin to the "Getting
> > Started with the RDKit in Python" doc).
> >
> > Just my 2 cents!
> >
> > Thanks for all the hard work as always,
> > Cam
> >
> >
> > On Mon, Jan 15, 2018 at 12:52 PM <rdkit-discuss-request@lists.
> &

Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 123, Issue 26

2018-01-16 Thread JW Feng via Rdkit-discuss
Another +1 for MolVS.

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Tue, Jan 16, 2018 at 10:04 AM, <
rdkit-discuss-requ...@lists.sourceforge.net> wrote:

> Today's Topics:
>
>1. Re: RDKit and Google Summer of Code 2018 (Brian Cole)
>2. Re: RDKit and Google Summer of Code 2018 (JP)
>3. Re: RDKit and Google Summer of Code 2018 (George Papadatos)
>
>
> --
>
> Message: 1
> Date: Tue, 16 Jan 2018 10:00:00 -0500
> From: Brian Cole <col...@gmail.com>
> To: Francois BERENGER <beren...@bioreg.kyushu-u.ac.jp>
> Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> Subject: Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018
> Message-ID:
> <CAB0HroNDm9SiEC_kfuUMMmH83O+=EzNqZowzCpn6KFjsHQ6HMw@mail.
> gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> +1 to the MolVS project as well.
>
> Perhaps an easy bite-size project is to incorporate the open source mae
> parser code into core RDKit: https://github.com/schrodinger/maeparser
>
>
> On Mon, Jan 15, 2018 at 9:08 PM, Francois BERENGER <
> beren...@bioreg.kyushu-u.ac.jp> wrote:
>
> > On 01/16/2018 05:51 AM, Tim Dudgeon wrote:
> > > Incorporating and "industrialising" Matt's MolVS tautomer and
> > > standardizer code?
> > > http://molvs.readthedocs.io/en/latest/index.html
> >
> > If we can vote, I would vote for this one.
> >
> > > On 15/01/18 07:09, Greg Landrum wrote:
> > >> Dear all,
> > >>
> > >> We've been invited again to participate in the OpenChemistry
> > >> application for Google Summer of Code.
> > >>
> > >> In order to participate we need ideas for projects and mentors to go
> > >> along with them.
> > >>
> > >> The current list of RDKit ideas is being maintained here:
> > >> http://wiki.openchemistry.org/GSoC_Ideas_2018#RDKit_Project_Ideas
> > >>
> > >> (Note: at the point that I'm pressing "send", that's still a copy of
> > >> last year's project ideas).
> > >>
> > >> If you're willing to be a mentor (please ask me about the ~5
> > >> hours/week required here) or have ideas, please reply to this thread.
> > >>
> > >> Best,
> > >> -greg

Re: [Rdkit-discuss] Seg fault importing rdkit.Chem on Mac 10.13.2 and Python 3.6.3

2018-01-04 Thread JW Feng via Rdkit-discuss
Thanks, my colleague Katrina Lexa found that Python 3.6.1 worked.  The conda
version is 4.4.6.

conda create --name test-rdkit --channel https://conda.anaconda.org/rdkit rdkit python=3.6.1

Best,

JW

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Wed, Jan 3, 2018 at 11:55 PM, Greg Landrum <greg.land...@gmail.com>
wrote:

> I'm going to guess that it's this problem:
> https://github.com/rdkit/rdkit/issues/1617
> and that the solution is to downgrade conda to v4.3.25 (conda install
> conda=4.3.25).
>
> This problem has proven much more frustrating to fix for the mac (linux
> and windows are now fine) than expected, but Brian and I continue to try.
>
> -greg
>
>
> On Tue, Jan 2, 2018 at 9:46 PM, JW Feng via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
>> Hi,
>>
>> I want to check to see if others encountered this problem before filing a
>> new issue on github.  I got a seg fault trying to import rdkit.Chem.  I am
>> using Python 3.6.3 on Mac OS 10.13.2 (High Sierra).  Below is a screenshot
>> showing how I reproduced the seg fault error.  RDKit was installed using
>> this conda command "conda install --channel
>> https://conda.anaconda.org/rdkit rdkit"
>>
>>
>> [image: Inline image 1]
>>
>> Python 2.7 works just fine.
>>
>> Thanks,
>>
>> JW
>>
>>
>> ___
>> JW Feng, Ph.D.
>> Denali Therapeutics Inc.
>> 151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
>> 270-0628
>>
>


[Rdkit-discuss] Seg fault importing rdkit.Chem on Mac 10.13.2 and Python 3.6.3

2018-01-02 Thread JW Feng via Rdkit-discuss
Hi,

I want to check to see if others encountered this problem before filing a
new issue on github.  I got a seg fault trying to import rdkit.Chem.  I am
using Python 3.6.3 on Mac OS 10.13.2 (High Sierra).  Below is a screenshot
showing how I reproduced the seg fault error.  RDKit was installed using
this conda command "conda install --channel
https://conda.anaconda.org/rdkit rdkit"


[image: Inline image 1]

Python 2.7 works just fine.

Thanks,

JW


_______
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628


Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 121, Issue 15

2017-11-08 Thread JW Feng via Rdkit-discuss
The Daylight website is a very good resource for SMILES, SMARTS, and
SMIRKS.

http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
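
On the question in the digest quoted below about getting one position per group: GetSubstructMatches returns one tuple per match, with an index for every atom in the pattern, so a three-atom pattern gives three indices. A sketch of one way to keep only the atom that the first SMARTS atom lands on follows. The helper names are made up for this note, and passing uniquify=False is my choice so that symmetric matches such as (0, 1) and (1, 0) are both kept.

```python
from typing import Iterable, List, Tuple


def first_atom_indices(matches: Iterable[Tuple[int, ...]]) -> List[int]:
    """Sorted, de-duplicated index of the first pattern atom in each match."""
    return sorted({m[0] for m in matches})


def anchor_atoms(smiles: str, smarts: str) -> List[int]:
    """Atoms in `smiles` that are matched by the first atom of `smarts`."""
    # Imported lazily so first_atom_indices can be used without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    patt = Chem.MolFromSmarts(smarts)
    if mol is None or patt is None:
        raise ValueError("could not parse the SMILES or the SMARTS")
    # uniquify=False keeps every atom ordering of a match, not one per atom set.
    return first_atom_indices(mol.GetSubstructMatches(patt, uniquify=False))
```

For example, anchor_atoms('C#C', '[CH1;A;X2;!R]#[$(*)]') would be expected to report both carbons, since each one can play the role of the first atom.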

JW

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Wed, Nov 8, 2017 at 2:52 PM, <rdkit-discuss-requ...@lists.sourceforge.net> wrote:

> Today's Topics:
>
>1. SMARTS for =C=, #CH, #C- (Chenyang Shi)
>2. Re: SMARTS for =C=, #CH, #C- (Andrew Dalke)
>3. Re: SMARTS for =C=, #CH, #C- (Chenyang Shi)
>4. SMARTS for Joback and Reid method (Chenyang Shi)
>
>
> --
>
> Message: 1
> Date: Wed, 8 Nov 2017 14:00:36 -0600
> From: Chenyang Shi <cs3...@columbia.edu>
> To: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> Subject: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
> Message-ID:
> <CAAj+Mte+mqgznFqFfeLVgL06ZJDbk0pX-uTLpGBk_n_jiWKqgg@mail.gmail.
> com>
> Content-Type: text/plain; charset="utf-8"
>
> Dear RDKitters,
>
> I have a question regarding SMARTS codes for three simple functional
> groups, these are =C=, #CH and #C-. I am new to SMARTS/SMILES. I indeed
> tried to guess their codes. Here are my guesses:
>
> =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
>
> #CH : [CH1;A;X2;!R]#[$(*)]
>
> #C- :  [CH0;A;X2;!R]#[$(*)]
>
> I checked these SMARTS at
> http://smartsview.zbh.uni-hamburg.de/smartsview/calculate?method=get; they
> all seem to make sense.
>
> For example, the webpage prints out the following messages:
>
> =C=: it says "aliphatic C with 0 further total connections, with 0 further
> hydrogen, not in a ring".
>
> #CH: "aliphatic C with 0 further total connections, with 1 further
> hydrogen, not in a ring".
>
> #C-: "aliphatic C with 1 further total connections, with 0 further
> hydrogen, not in a ring".
>
> However, when I search subgroups using these SMARTS, I had problems.
>
> For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C=C=O')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
> ((1, 0, 2),)
>
> it prints out atomic positions 1, 0, 2--three positions. But I would expect
> only one position for the Carbon in the middle.
>
> Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('C#C')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
> ((0, 1),)
> I would expect two separate positions such as (0,), (1,), indicating there
> are two triple-bonded carbons (each with a hydrogen).
>
>
> Then if I search "CC#CC" using " [CH0;A;X2;!R]#[$(*)]",
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles('CC#CC')
> >>> m.GetSubstructMatches(Chem.MolFromSmarts(" [CH0;A;X2;!R]#[$(*)]"))
> ((1, 2),)
> Again, I would expect two separate positions such as (1,), (2,), indicating
> two carbon triple bonds.
>
> I think the problem might be my SMARTS for these three groups are not
> SPECIFIC. I would appreciate everyone's help on this.
>
> Cheers,
> Chenyang
> -- next part --
> An HTML attachment was scrubbed...
>
> --
>
> Message: 2
> Date: Wed, 8 Nov 2017 21:27:29 +0100
> From: Andrew Dalke <da...@dalkescientific.com>
> Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> Subject: Re: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
> Message-ID: <8478f1ae-4916-4feb-8e67-e6cf4e52f...@dalkescientific.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Nov 8, 2017, at 21:00, Chenyang Shi <cs3...@columbia.edu> wrote:
> > =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
>
> The recursive SMARTS notation, which is the term inside of the [$(...)],
> finds a match for the entire pattern and returns the first atom in th
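
One possible way to get a single match per carbon is to move the neighbour requirements into a recursive SMARTS on the carbon itself. Atoms written outside `$(...)` become part of the match tuple, whereas everything inside `$(...)` only constrains the one atom that carries it. The three pattern strings and the `find_groups` helper below are illustrative assumptions, not taken from the thread; this is a minimal sketch, not a vetted Joback group definition.

```python
from rdkit import Chem

# Illustrative patterns (assumptions, not from the thread): the bonded
# neighbours are tested inside $(...), so only the carbon itself is matched.
PATTERNS = {
    "=C=": "[CH0;A;X2;!R;$(*(=*)=*)]",
    "#CH": "[CH1;A;X2;!R;$(*#*)]",
    "#C-": "[CH0;A;X2;!R;$(*#*)]",
}


def find_groups(smiles, group):
    """Return one single-atom match tuple per carbon of the requested group."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(PATTERNS[group])
    return mol.GetSubstructMatches(query)


allene_like = find_groups("C=C=O", "=C=")
terminal_alkyne = find_groups("C#C", "#CH")
internal_alkyne = find_groups("CC#CC", "#C-")
print(allene_like, terminal_alkyne, internal_alkyne)
```

Counting groups then reduces to `len(find_groups(...))`, since each tuple holds exactly one atom index.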

Re: [Rdkit-discuss] tautomers in rdkit

2017-04-17 Thread JW Feng
Hi Maria,

From looking at Roger's slides at
https://github.com/rdkit/UGM_2016/blob/master/Presentations/Sayle_RDKitTautomers.pdf:
is he arguing that InChI is insufficient for generating a canonical
string across different tautomers?  What if you perform a set of
standardization transformations prior to generating the InChI?  You may
want to look at how Genentech normalizes molecules for compound
registration. The code is based on OEChem and is open sourced on GitHub at
https://github.com/chemalot/chemalot.  This package is actively being
developed and I am a contributor.  Specifically, you'll want to look at the
extensive standardization transformations in
https://github.com/chemalot/chemalot/blob/master/src/com/genentech/struchk/oeStruchk/Struchk.xml

The last step in Struchk.xml creates a canonical tautomer using
OpenEye's QuacPac toolkit.  Could one replace this step by converting a
standardized molecule to InChI and then back?  Another approach is using
Dave Cosgrove's TautEnum package
(https://github.com/OpenEye-Contrib/TautEnum).  Both QuacPac and TautEnum
enumerate tautomers.  I believe that Roger is intimately familiar with
QuacPac.

Best,

JW

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Tue, Apr 11, 2017 at 6:52 AM, <rdkit-discuss-request@lists.sourceforge.net> wrote:

> Send Rdkit-discuss mailing list submissions to
> rdkit-discuss@lists.sourceforge.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> or, via email, send a message with subject or body 'help' to
> rdkit-discuss-requ...@lists.sourceforge.net
>
> You can reach the person managing the list at
> rdkit-discuss-ow...@lists.sourceforge.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Rdkit-discuss digest..."
>
>
> Today's Topics:
>
>    1. tautomers in rdkit (MARIA BRANDL)
>    2. Re: tautomers in rdkit (Peter S. Shenkin)
>    3. official Tripos MOL2 file format PDF document (Francois BERENGER)
>
>
> --
>
> Message: 1
> Date: Tue, 11 Apr 2017 06:43:39 + (UTC)
> From: MARIA BRANDL <m.bra...@btinternet.com>
> Subject: [Rdkit-discuss] tautomers in rdkit
> To: "rdkit-discuss@lists.sourceforge.net"
> <rdkit-discuss@lists.sourceforge.net>
> Message-ID: <1522420730.263132.1491893019...@mail.yahoo.com>
> Content-Type: text/plain; charset="utf-8"
>
> Dear all,
>
> Is there going to be an attempt at coding Roger Sayle's "Alternative
> Approach" to tautomers, described in "RDKit: Six Not-So-Easy Pieces" [RDKit UGM
> 2016], into RDKit?
>
>
> I have managed to get reasonable tautomers out of Resonance.cpp using:
>
> suppl = rdchem.ResonanceMolSupplier(m,
>     rdchem.ResonanceFlags.ALLOW_CHARGE_SEPARATION |
>     rdchem.ResonanceFlags.ALLOW_INCOMPLETE_OCTETS |
>     rdchem.ResonanceFlags.UNCONSTRAINED_CATIONS |
>     rdchem.ResonanceFlags.UNCONSTRAINED_ANIONS)
>
> with some post-filtering for e.g. carbocations, but feel that it may be
> more efficient to put user-defined constraints on each atom during the
> backtracking loops, as Roger suggests.
> Looking forward to hearing your thoughts on this.
> Best regards,
> Maria Brandl
> -- next part --
> An HTML attachment was scrubbed...
>
> --
>
> Message: 2
> Date: Tue, 11 Apr 2017 03:47:47 -0400
> From: "Peter S. Shenkin" <shen...@gmail.com>
> Subject: Re: [Rdkit-discuss] tautomers in rdkit
> To: MARIA BRANDL <m.bra...@btinternet.com>
> Cc: "rdkit-discuss@lists.sourceforge.net"
> <rdkit-discuss@lists.sourceforge.net>
> Message-ID:
> <CAAsqebH6gVRpm2rhhzv0-koWVr6P0WU+QK0EO2=x4ctvhgx...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Just from the slides, it's not clear that Roger had a solution; the slides
> seem to just suggest an approach. Am I missing something here?
>
> That is, he defined the invariants that all tautomers of a compound have to
> share and expressed it as a SMARTS + constraints; but I didn't see that he
> provided a methodology to derive a canonical matching SMILES from a SMARTS
> + constraints. True, if two structures match the SMARTS + constraints, they
> are likely tautomers. (I can't think of why they wouldn't be, but ma

Re: [Rdkit-discuss] Bug in AllChem.EmbedMultipleConfs pruning?

2016-12-22 Thread JW Feng
Hi Greg and Sereina,

Thanks for confirming the bug.  I also vote for changing the code to use
only heavy atoms.  Is symmetry taken into consideration when calculating
RMS during the pruning step?

Best,

JW

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628

On Thu, Dec 22, 2016 at 7:02 AM, Sereina <sereina.rini...@gmail.com> wrote:

> Hi Greg,
>
> I would also vote for changing the code such that only heavy atoms are
> used in the RMS calculation.
>
> Best,
> Sereina
>
>
> On 22 Dec 2016, at 13:36, Greg Landrum <greg.land...@gmail.com> wrote:
>
> Hi JW,
>
> On Wed, Dec 21, 2016 at 11:57 PM, JW Feng <f...@dnli.com> wrote:
>
>>
>> I am using AllChem.EmbedMultipleConfs to generate conformers.  I noticed
>> that conformers in the result set are very similar to each other.  I wrote
>> a test script to calculate RMS for the conformers and may have found a
>> bug.  Looks like AllChem.EmbedMultipleConfs is calculating RMS using all
>> atoms, including Hs, when pruning.  The documentation says pruning is based on
>> heavy-atom RMS.
>>
>
> You're absolutely correct. The code uses all atoms, but the documentation
> says it only uses heavy atoms.
> So there's either a bug in the documentation or in the code. Here's the
> github entry: https://github.com/rdkit/rdkit/issues/1227
>
> I believe the right thing to do is change the code, which will lead to
> different results from the embedding, but I will hold off on making the fix
> to see if any discussion materializes either here or on github.
>
>
>
>> Attached is my test script and an input file that illustrates the
>> problem.  In this script, 50 conformers are generated and pruneRmsThresh is
>> 0.5.  Pairwise RMS between conformers is >0.5 when H atoms are included.
>> Pairwise RMS is <0.5 for many conformers when only heavy atoms are
>> included.
>>
>
> Thanks for the detailed report and script to reproduce the problem!
>
> -greg
>
> 


[Rdkit-discuss] Bug in AllChem.EmbedMultipleConfs pruning?

2016-12-21 Thread JW Feng
Hi,

I am using AllChem.EmbedMultipleConfs to generate conformers.  I noticed
that conformers in the result set are very similar to each other.  I wrote
a test script to calculate RMS for the conformers and may have found a
bug.  It looks like AllChem.EmbedMultipleConfs calculates RMS using all
atoms, including Hs, when pruning.  The documentation says pruning is based on
heavy-atom RMS.

Attached are my test script and an input file that illustrates the problem.
In this script, 50 conformers are generated and pruneRmsThresh is 0.5.
Pairwise RMS between conformers is >0.5 when H atoms are included.
Pairwise RMS is <0.5 for many conformers when only heavy atoms are
included.
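
To illustrate why the atom selection matters, here is a minimal RDKit-free sketch with made-up coordinates: two toy "conformers" that differ only in their hydrogen positions. The `rmsd` helper, the element list, and the coordinates are all hypothetical, and the conformers are assumed to be already aligned (no superposition is done here).

```python
import math


def rmsd(coords_a, coords_b, atom_indices):
    """RMSD over the chosen atoms; assumes the conformers are already aligned."""
    total = 0.0
    for i in atom_indices:
        total += sum((a - b) ** 2 for a, b in zip(coords_a[i], coords_b[i]))
    return math.sqrt(total / len(atom_indices))


# Toy data (made up): two heavy atoms followed by two hydrogens.
elements = ["C", "O", "H", "H"]
conf_a = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-0.5, 0.9, 0.0), (-0.5, -0.9, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-0.5, 0.0, 0.9), (-0.5, 0.0, -0.9)]

all_atoms = list(range(len(elements)))
heavy_atoms = [i for i, element in enumerate(elements) if element != "H"]

all_rms = rmsd(conf_a, conf_b, all_atoms)
heavy_rms = rmsd(conf_a, conf_b, heavy_atoms)
print("all-atom RMSD: %.2f, heavy-atom RMSD: %.2f" % (all_rms, heavy_rms))
```

With a threshold of 0.5, this pair would survive pruning on all-atom RMSD yet count as a duplicate on heavy-atom RMSD, which is the pattern reported above.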

Thanks,

JW
_______
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628


21843_confs_output.sdf
Description: Binary data


21843_input.sdf
Description: Binary data
#!/usr/bin/env python
from __future__ import print_function
from __future__ import division
import sys
import argparse
from rdkit import Chem
from rdkit.Chem import AllChem


# Generate conformers, then report conformer pairs whose RMSD falls below the pruning threshold
def main(argv=None):
    parser = argparse.ArgumentParser()

    # optional requirements, "required=True" makes it NOT optional
    parser.add_argument("-in", dest="infile", required=True, help="input file")
    parser.add_argument("-out", dest="outfile", required=True, help="output file")
    parser.add_argument("-rmsd", dest="prune_rmsd", type=float, default=0.5,
                        help="RMSD criteria for generating unique conformers."
                             " default=0.5")
    parser.add_argument("-confs", dest="confs", type=int, default=50, help="number of confs to generate, default=50")
    args = None
    try:
        args = parser.parse_args(argv)
    except SystemExit:
        # argparse exits on bad input; print the help text and return an error code instead
        parser.print_help()
        sys.stderr.write("Input parameters were incorrect, please check help messages\n")
        return 2

    suppl = Chem.ForwardSDMolSupplier(args.infile)
    sd_writer = Chem.SDWriter(args.outfile)

    for mol in suppl:
        if mol is None:
            print("skipping mol", file=sys.stderr)
            continue

        mol = Chem.AddHs(mol)
        #EmbedMultipleConfs((Mol) mol[, (int) numConfs = 10[, (int) maxAttempts = 0[, (int) randomSeed = -1[, (bool)
        #  clearConfs = True[, (bool) useRandomCoords = False[, (float) boxSizeMult = 2.0[, (bool) randNegEig = True[, (int)
        #  numZeroFail = 1[, (float) pruneRmsThresh = -1.0[, (dict) coordMap = {}[, (float) forceTol = 0.001[, (bool)
        #  ignoreSmoothingFailures = False[, (bool) enforceChirality = True[, (int) numThreads = 1[, (bool)
        #  useExpTorsionAnglePrefs = False[, (bool) useBasicKnowledge = False[, (bool) printExpTorsionAngles = False
        # ]) -> _vecti:
        conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=args.confs, maxAttempts=500, pruneRmsThresh=args.prune_rmsd,
                                              randomSeed=1, numThreads=0, enforceChirality=True,
                                              useExpTorsionAnglePrefs=True, useBasicKnowledge=True)
        # calculate RMSD between conformers, all should be greater than args.prune_rmsd

        sys.stderr.write("RMSD calculated over all atoms, including H\n")
        for id1 in conf_ids:
            for id2 in conf_ids:
                if id1 != id2:
                    #rms = AllChem.GetBestRMS(mol, mol, id1, id2, maps)
                    rms = AllChem.GetConformerRMS(mol, id1, id2)
                    if rms < args.prune_rmsd:
                        # report the first offending partner of id1, then move on
                        sys.stderr.write("RMSD between conf %d and %d: %.2f\n" % (id1, id2, rms))
                        break

        sys.stderr.write("RMSD calculated over heavy atoms\n")
        # strip hydrogens so that the RMSD below is computed over heavy atoms only
        mol = AllChem.RemoveHs(mol)
        for id1 in conf_ids:
            for id2 in conf_ids:
                if id1 != id2:
                    #rms = AllChem.GetBestRMS(mol, mol, id1, id2, maps)
                    rms = AllChem.GetConformerRMS(mol, id1, id2)
                    if rms < args.prune_rmsd:
                        sys.stderr.write("Heavy atom RMSD between conf %d and %d: %.2f\n" % (id1, id2, rms))
                        break

        # write every conformer of the (now hydrogen-free) molecule
        for conf_id in conf_ids:
            sd_writer.write(mol, confId=conf_id)
        sys.stderr.write("Generated %d conformers\n" % len(conf_ids))

    sd_writer.close()


if __name__ == "__main__":
    # Let main()'s return value specify the exit status.
    sys.exit(main())

[Rdkit-discuss] 2012 UGM slide link broken?

2016-09-20 Thread JW Feng
Hi,

I tried to download JP Ebejer's deck on conformer generation using RDKit, but
this link appears to be broken:
http://rdkit.org/UGM/2012/Ebejer_20110926_RDKit_1stUGM.pdf

Would you mind moving these slides to GitHub?

Thanks,

JW
___
JW Feng, Ph.D.
Denali Therapeutics Inc.
151 Oyster Point Blvd, 2nd Floor, South San Francisco, CA 94080 | (650)
270-0628


Re: [Rdkit-discuss] SD tag reordering follow up

2015-10-22 Thread JW Feng
Hi Greg,

Thanks for the examples. I will give it a try.

JW

___
JW Feng, Ph.D.
Denali Therapeutics Inc.
201 Gateway Blvd. South San Francisco, CA 94080 | (650) 270-0628

On Thu, Oct 22, 2015 at 2:06 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi JW,
>
> On Thu, Oct 22, 2015 at 12:47 AM, JW Feng <f...@dnli.com> wrote:
>
>>
>> I read a post (link below) by Matthew about SD tag reordering, with a reply
>> from Greg, and I have a follow-up question. I would like to preserve the
>> ordering of SD tags as they appear in the input SD file. I tried getting
>> the list of SD tags with mol.GetPropNames() and setting the order with
>> sd_writer.SetProps(), but that didn't work. It turns out mol.GetPropNames()
>> returns a list in alphabetical order instead of order of appearance.
>>
>
> I would say instead that they appear in an unspecified,
> implementation-dependent order. This may be alphabetic, but it's certainly
> not guaranteed to be so.
>
>
>> Is there a way to preserve SD tag orders?
>>
>
> There is currently no way to do this automatically. I have always thought
> about those properties as being unordered, so the RDKit doesn't maintain
> any record of what order properties are added to a molecule.
>
> As long as you have the original SDMolSupplier, you can pretty easily get
> the ordered list of property names from that:
>
> In [22]: suppl = Chem.SDMolSupplier('tmp.sdf')
>
> In [23]: m = suppl[0]
>
> In [25]: list(m.GetPropNames())   # <- here's the non-ordered list
> Out[25]:
> ['PUBCHEM_ATOM_DEF_STEREO_COUNT',
>  'PUBCHEM_ATOM_UDEF_STEREO_COUNT',
>  'PUBCHEM_BONDANNOTATIONS',
>  'PUBCHEM_BOND_DEF_STEREO_COUNT',
>  'PUBCHEM_BOND_UDEF_STEREO_COUNT',
>  'PUBCHEM_CACTVS_COMPLEXITY',
>  'PUBCHEM_CACTVS_HBOND_ACCEPTOR',
>  'PUBCHEM_CACTVS_HBOND_DONOR',
>  'PUBCHEM_CACTVS_ROTATABLE_BOND',
>  'PUBCHEM_CACTVS_SUBSKEYS',
>  'PUBCHEM_CACTVS_TAUTO_COUNT',
>  'PUBCHEM_CACTVS_TPSA',
>  'PUBCHEM_COMPONENT_COUNT',
>  'PUBCHEM_COMPOUND_CANONICALIZED',
>  'PUBCHEM_COMPOUND_CID',
>  'PUBCHEM_COORDINATE_TYPE',
>  'PUBCHEM_EXACT_MASS',
>  'PUBCHEM_HEAVY_ATOM_COUNT',
>  'PUBCHEM_ISOTOPIC_ATOM_COUNT',
>  'PUBCHEM_IUPAC_CAS_NAME',
>  'PUBCHEM_IUPAC_INCHI',
>  'PUBCHEM_IUPAC_INCHIKEY',
>  'PUBCHEM_IUPAC_NAME',
>  'PUBCHEM_IUPAC_OPENEYE_NAME',
>  'PUBCHEM_IUPAC_SYSTEMATIC_NAME',
>  'PUBCHEM_IUPAC_TRADITIONAL_NAME',
>  'PUBCHEM_MOLECULAR_FORMULA',
>  'PUBCHEM_MOLECULAR_WEIGHT',
>  'PUBCHEM_MONOISOTOPIC_WEIGHT',
>  'PUBCHEM_OPENEYE_CAN_SMILES',
>  'PUBCHEM_OPENEYE_ISO_SMILES',
>  'PUBCHEM_TOTAL_CHARGE',
>  'PUBCHEM_XLOGP3_AA']
>
> In [26]: txt = suppl.GetItemText(0)
>
> In [27]: pns = re.findall(r'> *<(\w+)>',txt)   # <- this gives you the list in order
>
> In [28]: pns
> Out[28]:
> ['PUBCHEM_COMPOUND_CID',
>  'PUBCHEM_COMPOUND_CANONICALIZED',
>  'PUBCHEM_CACTVS_COMPLEXITY',
>  'PUBCHEM_CACTVS_HBOND_ACCEPTOR',
>  'PUBCHEM_CACTVS_HBOND_DONOR',
>  'PUBCHEM_CACTVS_ROTATABLE_BOND',
>  'PUBCHEM_CACTVS_SUBSKEYS',
>  'PUBCHEM_IUPAC_OPENEYE_NAME',
>  'PUBCHEM_IUPAC_CAS_NAME',
>  'PUBCHEM_IUPAC_NAME',
>  'PUBCHEM_IUPAC_SYSTEMATIC_NAME',
>  'PUBCHEM_IUPAC_TRADITIONAL_NAME',
>  'PUBCHEM_IUPAC_INCHI',
>  'PUBCHEM_IUPAC_INCHIKEY',
>  'PUBCHEM_XLOGP3_AA',
>  'PUBCHEM_EXACT_MASS',
>  'PUBCHEM_MOLECULAR_FORMULA',
>  'PUBCHEM_MOLECULAR_WEIGHT',
>  'PUBCHEM_OPENEYE_CAN_SMILES',
>  'PUBCHEM_OPENEYE_ISO_SMILES',
>  'PUBCHEM_CACTVS_TPSA',
>  'PUBCHEM_MONOISOTOPIC_WEIGHT',
>  'PUBCHEM_TOTAL_CHARGE',
>  'PUBCHEM_HEAVY_ATOM_COUNT',
>  'PUBCHEM_ATOM_DEF_STEREO_COUNT',
>  'PUBCHEM_ATOM_UDEF_STEREO_COUNT',
>  'PUBCHEM_BOND_DEF_STEREO_COUNT',
>  'PUBCHEM_BOND_UDEF_STEREO_COUNT',
>  'PUBCHEM_ISOTOPIC_ATOM_COUNT',
>  'PUBCHEM_COMPONENT_COUNT',
>  'PUBCHEM_CACTVS_TAUTO_COUNT',
>  'PUBCHEM_COORDINATE_TYPE',
>  'PUBCHEM_BONDANNOTATIONS']
>
> If you pass that list of property names to the SDWriter's SetProps()
> method, it will write things out in the input order.
>
> I hope this helps,
> -greg
>
>


[Rdkit-discuss] SD tag reordering follow up

2015-10-21 Thread JW Feng
Hi,

I read a post (link below) by Matthew about SD tag reordering, with a reply
from Greg, and I have a follow-up question. I would like to preserve the
ordering of SD tags as they appear in the input SD file. I tried getting
the list of SD tags with mol.GetPropNames() and setting the order with
sd_writer.SetProps(), but that didn't work. It turns out mol.GetPropNames()
returns a list in alphabetical order instead of order of appearance. Is
there a way to preserve SD tag order?

https://sourceforge.net/p/rdkit/mailman/message/32036716/

Thanks,

JW
___
JW Feng, Ph.D.
Denali Therapeutics Inc.
201 Gateway Blvd. South San Francisco, CA 94080 | (650) 270-0628