Re: [Rdkit-discuss] The "maxAttempts" option in "EmbedMultipleConfs"

2019-12-17 Thread Michal Krompiec
It depends what you need it for, but if you want a more realistic
conformational analysis instead, CREST is the tool of choice.
https://xtb-docs.readthedocs.io/en/latest/crest.html
Best,
Michal


On Tue, Dec 17, 2019 at 16:26 topgunhaides . wrote:

> Hi guys,
>
> Can anyone tell me more about the "maxAttempts" option in
> "EmbedMultipleConfs"?
>
> In the documentation, it says " maxAttempts: the maximum number of
> attempts to try embedding".
> Does it mean the "maximum number of attempts" to generate each conformer
> or to generate the total number of conformers specified by "numConfs"? Or
> something else?
> I need to generate a huge number of conformers for each molecule, so I
> want to know what the proper "maxAttempts" value is to reach a balance
> between accuracy and cost.
> Thank you!
>
> Best,
> Leon
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] The "maxAttempts" option in "EmbedMultipleConfs"

2019-12-17 Thread Tim Dudgeon
AFAIK the only correlation is the molecule(s) you are "conformering" (is 
that a verb?).


And no value of numConfs can be considered enough, let alone a value 
of maxAttempts.
It depends on the problem you want to solve, the molecules you 
are looking at, and the amount of CPU you have to crack that particular nut.


I tend to use 1.5x - 2.0x, but that is for a set of (fairly small) molecules.
If your molecules are more flexible then maybe you need to go to 10x.
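
To make the rule of thumb above concrete, here is a minimal sketch. It assumes the multiplier is applied to numConfs, which the message does not spell out, and `attempts_budget` is a hypothetical helper for illustration, not an RDKit function.

```python
import math


def attempts_budget(num_confs, factor=2.0):
    """Return a maxAttempts value as a multiple of numConfs.

    Assumption: the 1.5x - 2.0x (or 10x) factor scales numConfs.
    """
    if num_confs < 1:
        raise ValueError("num_confs must be at least 1")
    if factor < 1.0:
        raise ValueError("factor below 1 could never reach num_confs")
    # Round up so the budget is never smaller than the requested multiple.
    return math.ceil(num_confs * factor)


# Small, fairly rigid molecules versus more flexible ones.
for factor in (1.5, 2.0, 10.0):
    print(f"factor {factor}: maxAttempts = {attempts_budget(100, factor)}")
```

The resulting number would then be passed as maxAttempts alongside numConfs; the right factor still depends on the molecules and the CPU time available.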

Tim

On 17/12/2019 19:57, topgunhaides . wrote:

Hi Tim,

Many thanks for your help! One further question:
Is there any correlation between the maxAttempts and numConfs?
For instance, do I need to set a higher maxAttempts value if I request 
a higher numConfs value?


Or to put it another way, what maxAttempts value can be considered as 
"enough"?

Best,
Leon





On Tue, Dec 17, 2019 at 12:22 PM Tim Dudgeon wrote:


This is in regard to the pruneRmsThresh option which removes very
similar conformers.
If, let's say, numConfs is set to 10 and maxAttempts is set to 20,
then it will use UP TO 20 attempts to generate 10 conformers.
If too many conformers get rejected due to pruneRmsThresh then you
will end up with fewer than 10 conformers.

Or to put it another way, maxAttempts stops you trying forever
to generate conformers that are all the same and get rejected!

Tim

On 17/12/2019 16:24, topgunhaides . wrote:

Hi guys,

Can anyone tell me more about the "maxAttempts" option in
"EmbedMultipleConfs"?

In the documentation, it says " maxAttempts: the maximum number
of attempts to try embedding".
Does it mean the "maximum number of attempts" to generate each
conformer or to generate the total number of conformers
specified by "numConfs"? Or something else?
I need to generate a huge number of conformers for each molecule,
so I want to know what the proper "maxAttempts" value is to reach a
balance between accuracy and cost.
Thank you!

Best,
Leon






Re: [Rdkit-discuss] reduced graphs fingerprints in postgresql cartridge

2019-12-17 Thread Peter Schmidtke
Thanks Maciek,

Will check that out. I just wasn’t sure whether this applies correctly to the 
reduced graph fingerprints, which seemed a bit different; see
https://iwatobipen.wordpress.com/2016/01/16/ergfingerprint-in-rdkit

Anyway, we’ll try that out and write something up if it works ;)

Cheers

Peter

On 17 Dec 2019, at 20:18, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:

While writing a more detailed answer for you, I stumbled upon a very useful 
blog post by Greg 
https://rdkit.blogspot.com/2017/04/using-custom-fingerprint-in-postgresql.html 
which explains in detail how custom fingerprints can be handled.

Both Tanimoto and Dice are supported for any sfp/bfp.

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Tue, 17 Dec 2019 at 18:38, Maciek Wójcikowski <mac...@wojcikowski.pl> wrote:
Hi Peter,

You can index any binary fingerprint (both sparse and explicit). Also, you can 
create any custom fp in python and pass it over to postgresql. That said, I 
have not managed to transfer a sparse one from python to postgres, only the 
explicit.

Best,
Maciek

On Tue, 17 Dec 2019, 13:00, Peter Schmidtke <peter.schmid...@discngine.com> wrote:
Hi all,

Is it possible to index the reduced graph fingerprints in the pgsql cartridge 
as well? From my understanding the fingerprint provided by RDKit isn’t exactly 
in the same format as the standard Morgan fingerprints.
Would this work at all? If yes, with which similarity functions in pgsql? 
Has anybody ever tried this and has a bit of documentation?

Thanks in advance

Peter


Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-17 Thread Lukas Pravda
Hi Illimar,

I don’t really know what your use case is, so this may be completely useless. 
However, just to add my two cents, we've created a package that builds on top 
of RDKit and parses PDB ligand definitions from CIF files. You can find the 
package here: https://gitlab.ebi.ac.uk/pdbe/ccdutils and the documentation 
here: https://pdbe.gitdocs.ebi.ac.uk/ccdutils/ 

Let me know if this is helpful or you need further help.

Best,
Lukas

On 16/12/2019, 20:03, "Paolo Tosco" wrote:

Hi Illimar,

The RDKit PDB reader only recognizes standard amino acids and, once the 
PR I opened on Saturday (https://github.com/rdkit/rdkit/pull/2850) is 
merged, nucleic acid bases.

Anything else will not have the correct hybridization/bond orders 
perceived, as those are not encoded in the PDB format and the PDB reader 
does not have any functionality to do that.

The 1ARJ case is peculiar, as it has an ARG residue which would be 
recognized if it were in the ATOM records, but not in the HETATM 
section, for which no attempt to perceive the correct hybridization/bond 
orders is made.
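
To illustrate the ATOM/HETATM distinction described above, here is a minimal sketch that lists which residue names appear under each record type. It relies only on the fixed-width PDB layout (record name in columns 1-6, residue name in columns 18-20); `make_record` and `residues_by_record` are hypothetical helpers, and the three records are made up for the example rather than copied from 1ARJ.

```python
from collections import defaultdict


def make_record(record, serial, atom, res_name, chain, res_seq):
    """Lay out the leading fixed-width PDB columns (coordinates omitted)."""
    return f"{record:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}"


def residues_by_record(pdb_text):
    """Map 'ATOM' / 'HETATM' to the set of residue names seen under it."""
    seen = defaultdict(set)
    for line in pdb_text.splitlines():
        record = line[:6].strip()           # columns 1-6: record name
        if record in ("ATOM", "HETATM"):
            seen[record].add(line[17:20].strip())  # columns 18-20: residue name
    return dict(seen)


# Made-up records: two RNA residues as ATOM, one ARG ligand as HETATM.
example = "\n".join([
    make_record("ATOM", 1, "P", "G", "A", 17),
    make_record("ATOM", 2, "P", "C", "A", 18),
    make_record("HETATM", 3, "N", "ARG", "A", 47),
])

print(residues_by_record(example))
```

A residue name that only shows up under HETATM, like ARG here, is the situation described above, where no bond-order perception is attempted.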

My suggestion, if you are using standard PDB files, is to download the 
SDF file:


https://www.rcsb.org/pdb/download/downloadLigandFiles.do?ligandIdList=A2F=3GOT=all=false=false

and construct your RDKit molecule from that.

You should be able to automate that without too much effort, either by 
constructing URLs using the template above or by using the PDB REST API.
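
As a rough sketch of the URL-construction idea: the ligand-download template above lost its parameter names in the archive, so this example does not try to rebuild it. Instead it uses the entry-download pattern that appears elsewhere in this thread (https://files.rcsb.org/download/1arj.pdb.gz). `pdb_download_url` is a hypothetical helper, and the four-character ID check is an assumption about classic PDB IDs.

```python
def pdb_download_url(pdb_id):
    """Build a gzipped-PDB download URL from a PDB entry ID.

    Assumption: the URL pattern is inferred from the single example URL
    quoted in this thread, and IDs are four alphanumeric characters.
    """
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"unexpected PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb.gz"


# Build URLs for the two entries mentioned in this thread.
for entry in ("1ARJ", "3GOT"):
    print(pdb_download_url(entry))
```

The same approach would apply to the ligand SDF template once its query parameters are known.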

Cheers,
p.

On 16/12/2019 18:24, Illimar Hugo Rekand wrote:
> Thanks, Paolo, for a good and clear example.
>
>
> I adapted your code into my workflow to calculate some Lipinski properties
> of RNA PDB structures, and ran into some issues. I'm not sure whether I
> should start a new thread or add this to the one I already created?
>
>
> I used the following code:
>
>
> from rdkit import Chem
> from rdkit.Chem import rdmolops, Lipinski
> from urllib.request import urlopen
> import gzip
> import pprint
>
> pp = pprint.PrettyPrinter(indent=4)
>
> # Map each descriptor name to the Lipinski function that computes it.
> Lipinski_dic = {
>     'FractionCSP3': Lipinski.FractionCSP3,
>     'HeavyAtomCount': Lipinski.HeavyAtomCount,
>     'NHOHCount': Lipinski.NHOHCount,
>     'NOCount': Lipinski.NOCount,
>     'NumAliphaticCarbocycles': Lipinski.NumAliphaticCarbocycles,
>     'NumAliphaticHeterocycles': Lipinski.NumAliphaticHeterocycles,
>     'NumAliphaticRings': Lipinski.NumAliphaticRings,
>     'NumAromaticCarbocycles': Lipinski.NumAromaticCarbocycles,
>     'NumAromaticHeterocycles': Lipinski.NumAromaticHeterocycles,
>     'NumAromaticRings': Lipinski.NumAromaticRings,
>     'NumHAcceptors': Lipinski.NumHAcceptors,
>     'NumHDonors': Lipinski.NumHDonors,
>     'NumHeteroatoms': Lipinski.NumHeteroatoms,
>     'NumRotatableBonds': Lipinski.NumRotatableBonds,
>     'NumSaturatedCarbocycles': Lipinski.NumSaturatedCarbocycles,
>     'NumSaturatedHeterocycles': Lipinski.NumSaturatedHeterocycles,
>     'NumSaturatedRings': Lipinski.NumSaturatedRings,
>     'RingCount': Lipinski.RingCount,
> }
>
> url = "https://files.rcsb.org/download/1arj.pdb.gz"
> # gzip.decompress returns bytes; decode to text before parsing the PDB block.
> pdb_data = gzip.decompress(urlopen(url).read()).decode("utf-8")
> mol = Chem.RWMol(Chem.MolFromPDBBlock(pdb_data))
> # Collect bonds joining a HETATM atom to a non-HETATM atom (the XOR is true
> # when exactly one end of the bond is a hetero atom).
> bonds_to_cleave = {
>     (b.GetBeginAtomIdx(), b.GetEndAtomIdx())
>     for b in mol.GetBonds()
>     if b.GetBeginAtom().GetPDBResidueInfo().GetIsHeteroAtom()
>     ^ b.GetEndAtom().GetPDBResidueInfo().GetIsHeteroAtom()
> }
> for begin_idx, end_idx in bonds_to_cleave:
>     mol.RemoveBond(begin_idx, end_idx)
> # Keep only the fragments that consist of HETATM atoms.
> hetatm_frags = [
>     f
>     for f in rdmolops.GetMolFrags(mol, asMols=True, sanitizeFrags=True)
>     if f.GetNumAtoms()
>     and f.GetAtomWithIdx(0).GetPDBResidueInfo().GetIsHeteroAtom()
> ]
> for hetatm in hetatm_frags:
>     res_name = hetatm.GetAtomWithIdx(0).GetPDBResidueInfo().GetResidueName()
>     calculated_props = {}
>     for prop in Lipinski_dic:
>         function = Lipinski_dic[prop]
>         x = function(hetatm)
>         calculated_props[prop] = x
>     pp.pprint(calculated_props)
>
>
> and as you can see the properties of the ligand don't match up with what is
> expected (the number of SP3 atoms doesn't match up). When parsing the
> structure 3got, it fails to recognize the aromatic rings of the ligand A2F.
> I'm assuming this is caused by RDKit not assigning bond orders correctly when
> reading in RNA and DNA PDB files (something which I have reported earlier on
> this mailing list)?
>
>
> Running hetatm.UpdatePropertyCache(strict=True) does not remedy this problem.

Re: [Rdkit-discuss] The "maxAttempts" option in "EmbedMultipleConfs"

2019-12-17 Thread topgunhaides .
Hi Tim,

Many thanks for your help! One further question:
Is there any correlation between the maxAttempts and numConfs?
For instance, do I need to set a higher maxAttempts value if I request a
higher numConfs value?

Or to put it another way, what maxAttempts value can be considered as
"enough"?

Best,
Leon





On Tue, Dec 17, 2019 at 12:22 PM Tim Dudgeon wrote:

> This is in regard to the pruneRmsThresh option which removes very similar
> conformers.
> If, let's say, numConfs is set to 10 and maxAttempts is set to 20, then it
> will use UP TO 20 attempts to generate 10 conformers.
> If too many conformers get rejected due to pruneRmsThresh then you will
> end up with fewer than 10 conformers.
>
> Or to put it another way, maxAttempts stops you trying forever to
> generate conformers that are all the same and get rejected!
>
> Tim
> On 17/12/2019 16:24, topgunhaides . wrote:
>
> Hi guys,
>
> Can anyone tell me more about the "maxAttempts" option in
> "EmbedMultipleConfs"?
>
> In the documentation, it says " maxAttempts: the maximum number of
> attempts to try embedding".
> Does it mean the "maximum number of attempts" to generate each conformer
> or to generate the total number of conformers specified by "numConfs"? Or
> something else?
> I need to generate a huge number of conformers for each molecule, so I
> want to know what the proper "maxAttempts" value is to reach a balance
> between accuracy and cost.
> Thank you!
>
> Best,
> Leon
>
>
>
>


Re: [Rdkit-discuss] reduced graphs fingerprints in postgresql cartridge

2019-12-17 Thread Maciek Wójcikowski
While writing a more detailed answer for you, I stumbled upon a very useful
blog post by Greg
https://rdkit.blogspot.com/2017/04/using-custom-fingerprint-in-postgresql.html
which explains in detail how custom fingerprints can be handled.

Both Tanimoto and Dice are supported for any sfp/bfp.
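
For reference, here is a minimal pure-Python sketch of the two similarity measures named above, computed on fingerprints represented as sets of on-bit indices. It only illustrates the standard definitions and is not the cartridge's implementation; the function names, the example bit sets, and the choice to return 0.0 for two empty fingerprints are assumptions made for this example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: |A & B| / |A | B| on sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Convention chosen here: two empty fingerprints score 0.0.
    return len(a & b) / union if union else 0.0


def dice(fp_a, fp_b):
    """Dice similarity: 2 * |A & B| / (|A| + |B|) on sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Two made-up fingerprints sharing bits 1 and 5.
fp1 = {1, 5, 9, 12}
fp2 = {1, 5, 7}

print("Tanimoto:", tanimoto(fp1, fp2))
print("Dice:", dice(fp1, fp2))
```

For the same pair of fingerprints Dice is never smaller than Tanimoto, so similarity thresholds are not interchangeable between the two.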

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Tue, 17 Dec 2019 at 18:38, Maciek Wójcikowski wrote:

> Hi Peter,
>
> You can index any binary fingerprint (both sparse and explicit). Also, you
> can create any custom fp in python and pass it over to postgresql. That
> said, I have not managed to transfer a sparse one from python to postgres,
> only the explicit.
>
> Best,
> Maciek
>
> On Tue, 17 Dec 2019, 13:00, Peter Schmidtke <peter.schmid...@discngine.com>
> wrote:
>
>> Hi all,
>>
>> Is it possible to index the reduced graph fingerprints in the pgsql
>> cartridge as well? From my understanding the fingerprint provided by RDKit
>> isn’t exactly in the same format as the standard Morgan fingerprints.
>> Would this work at all? If yes, with which similarity functions in pgsql?
>> Has anybody ever tried this and has a bit of documentation?
>>
>> Thanks in advance
>>
>> Peter


Re: [Rdkit-discuss] reduced graphs fingerprints in postgresql cartridge

2019-12-17 Thread Maciek Wójcikowski
Hi Peter,

You can index any binary fingerprint (both sparse and explicit). Also, you
can create any custom fp in python and pass it over to postgresql. That
said, I have not managed to transfer a sparse one from python to postgres,
only the explicit.

Best,
Maciek

On Tue, 17 Dec 2019, 13:00, Peter Schmidtke <peter.schmid...@discngine.com>
wrote:

> Hi all,
>
> Is it possible to index the reduced graph fingerprints in the pgsql
> cartridge as well? From my understanding the fingerprint provided by RDKit
> isn’t exactly in the same format as the standard Morgan fingerprints.
> Would this work at all? If yes, with which similarity functions in pgsql?
> Has anybody ever tried this and has a bit of documentation?
>
> Thanks in advance
>
> Peter


Re: [Rdkit-discuss] The "maxAttempts" option in "EmbedMultipleConfs"

2019-12-17 Thread Tim Dudgeon
This is in regard to the pruneRmsThresh option which removes very 
similar conformers.
If, let's say, numConfs is set to 10 and maxAttempts is set to 20, then it 
will use UP TO 20 attempts to generate 10 conformers.
If too many conformers get rejected due to pruneRmsThresh then you will 
end up with fewer than 10 conformers.


Or to put it another way, maxAttempts stops you trying forever to 
generate conformers that are all the same and get rejected!
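
The behaviour described above can be sketched with a toy model in plain Python. This is not RDKit code: it draws random coordinate vectors in place of real embedding, and it models the attempt budget the way this message describes it (one overall cap). Whether RDKit counts attempts overall or per conformer is not checked here, so treat that part as an assumption. `generate_with_pruning` and `rms_distance` are hypothetical names.

```python
import math
import random


def rms_distance(a, b):
    """Root-mean-square difference between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def generate_with_pruning(num_confs, max_attempts, prune_thresh,
                          n_coords=6, seed=42):
    """Toy stand-in for conformer generation with RMS pruning.

    Each attempt draws a random point in the unit cube; it is kept only if
    it is at least prune_thresh away (RMS) from every conformer kept so far.
    """
    rng = random.Random(seed)
    kept, attempts = [], 0
    while len(kept) < num_confs and attempts < max_attempts:
        attempts += 1
        candidate = [rng.uniform(0.0, 1.0) for _ in range(n_coords)]
        if all(rms_distance(candidate, k) >= prune_thresh for k in kept):
            kept.append(candidate)
    return kept, attempts


# No pruning: every attempt is accepted, so 10 attempts give 10 conformers.
kept, attempts = generate_with_pruning(10, 20, prune_thresh=0.0)
print(len(kept), "kept after", attempts, "attempts")

# Threshold larger than any possible RMS in the unit cube: only the first
# candidate survives and the budget of 20 attempts is used up.
kept, attempts = generate_with_pruning(10, 20, prune_thresh=10.0)
print(len(kept), "kept after", attempts, "attempts")
```

The second call shows the situation from the message: the cap on attempts ends the run with fewer conformers than requested instead of looping forever.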


Tim

On 17/12/2019 16:24, topgunhaides . wrote:

Hi guys,

Can anyone tell me more about the "maxAttempts" option in 
"EmbedMultipleConfs"?


In the documentation, it says " maxAttempts: the maximum number of 
attempts to try embedding".
Does it mean the "maximum number of attempts" to generate each 
conformer or to generate the total number of conformers specified by 
"numConfs"? Or something else?
I need to generate a huge number of conformers for each molecule, so I 
want to know what the proper "maxAttempts" value is to reach a balance 
between accuracy and cost.

Thank you!

Best,
Leon






[Rdkit-discuss] The "maxAttempts" option in "EmbedMultipleConfs"

2019-12-17 Thread topgunhaides .
Hi guys,

Can anyone tell me more about the "maxAttempts" option in
"EmbedMultipleConfs"?

In the documentation, it says " maxAttempts: the maximum number of attempts
to try embedding".
Does it mean the "maximum number of attempts" to generate each conformer or
to generate the total number of conformers specified by "numConfs"? Or
something else?
I need to generate a huge number of conformers for each molecule, so I want
to know what the proper "maxAttempts" value is to reach a balance between
accuracy and cost.
Thank you!

Best,
Leon


[Rdkit-discuss] reduced graphs fingerprints in postgresql cartridge

2019-12-17 Thread Peter Schmidtke
Hi all,

Is it possible to index the reduced graph fingerprints in the pgsql cartridge 
as well? From my understanding the fingerprint provided by RDKit isn’t exactly 
in the same format as the standard Morgan fingerprints. 
Would this work at all? If yes, with which similarity functions in pgsql? 
Has anybody ever tried this and has a bit of documentation?

Thanks in advance

Peter