Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 110, Issue 18

2016-12-04 Thread Esben Jannik Bjerrum
Hi Carl,
Curt is right, there's no structural information in FASTA files. I'm not sure 
exactly what you want to do or hope to achieve. RDKit can build a molecule from 
a FASTA sequence, which you can then write out as a molfile 
(http://rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#MolFromFASTA), 
but if you want a 3D protein structure from sequence, you'll need to do 
some homology modelling, for example with the Sali Lab's MODELLER 
(https://salilab.org/modeller/), or, if no homologous protein 
structure is available, use some ab initio protein structure prediction 
software (I've seen Rosetta be successful once).

Esben Jannik Bjerrum
cand.pharm, Ph.D
/Sent from my Ubuntu Touch Phone

Phone +45 2823 8009
http://dk.linkedin.com/in/esbenbjerrum
http://www.wildcardconsulting.dk
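
As a rough illustration of the point above, the sketch below calls `Chem.MolFromFASTA` on a short sequence and reports what RDKit builds from it. The three-residue peptide and the helper name `summarize_fasta` are made up for this example, and the import is guarded so the snippet still runs where RDKit is not installed. The expectation, not checked against every RDKit release, is that the result has atoms and bonds but no conformer, i.e. topology without 3D coordinates.

```python
try:
    from rdkit import Chem  # third-party package, not in the standard library
except ImportError:
    Chem = None


def summarize_fasta(fasta_text):
    """Build an RDKit molecule from FASTA text and report what it contains.

    Returns None when RDKit is not installed.
    """
    if Chem is None:
        return None
    # Default flavor: the sequence is read as a protein.
    mol = Chem.MolFromFASTA(fasta_text)
    if mol is None:
        raise ValueError("RDKit could not parse the FASTA text")
    return {
        "atoms": mol.GetNumAtoms(),
        "bonds": mol.GetNumBonds(),
        # A conformer is what would hold x, y, z coordinates.
        "conformers": mol.GetNumConformers(),
    }


# Hypothetical three-residue peptide, used only for illustration.
print(summarize_fasta(">demo\nGAG\n"))
```

For nucleic acids, the RDKit documentation describes a `flavor` argument to `MolFromFASTA`; the linked page lists the values a given version supports. Either way, coordinates would still have to come from a modelling step.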
 

Re: [Rdkit-discuss] comparing two or more tables of molecules

2016-12-04 Thread Matthew Swain
Sorry Steve, there was a bug in MolVS that you encountered. Should now be fixed.

"pip install -U molvs" to get the update (v0.0.7).

Matt

> On 1 Dec 2016, at 15:52, Stephen O'hagan  wrote:
> 
> Thanks for the interesting links.
>  
> MolVS looks good, but failed on ‘NC(CC(=O)O)C(=O)[O-].O.O.[Na+]’ which isn’t 
> that extraordinary…
>  
> Couldn’t get Standardise to work at all, even on the example given; API not 
> intuitive or docs wrong or out of date.
>  
> I will have a look at the info in the UniChem paper, though not inclined to 
> use a web service for what I want to do.
>  
> Cheers,
> Steve.
>  
> From: George Papadatos [mailto:gpapada...@gmail.com] 
> Sent: 01 December 2016 14:26
> To: Greg Landrum 
> Cc: Stephen O'hagan ; 
> rdkit-discuss@lists.sourceforge.net; Francis Atkinson 
> Subject: Re: [Rdkit-discuss] comparing two or more tables of molecules
>  
> Hi Stephen,
>  
> Further to Greg's excellent reply, see this paper on how InChI strings and 
> keys can be used in practice to map together tautomer (ones covered by InChI 
> at least), isotope, stereo and parent-salt variants. 
> http://rd.springer.com/article/10.1186/s13321-014-0043-5 
> 
>  
> Francis (cc'ed) has a nice notebook somewhere illustrating these nice InChI 
> splits to find these variants.  
>  
> For educational purposes, there have been other approaches like the NCI's 
> identifiers - discussion here: 
> http://acscinf.org/docs/meetings/237nm/presentations/237nm17.pdf 
> 
>  
> For pure structure standardization using RDKit see here: 
> https://github.com/flatkinson/standardiser 
> and 
> https://github.com/mcs07/MolVS 
>  
>  
> Cheers, 
>  
> George
>  
>  
>  
>  
> On 29 November 2016 at 17:02, Greg Landrum wrote:
> Wow, this is a great question and quite a fun thread.
>  
> It's hard to really make much of a contribution here without writing a 
> book/review article (something that I'm really not willing to do!), but I 
> have a few thoughts. Most of this is repeating/rephrasing things others have 
> already said.
>  
> I'm going to propose some things as facts. I think that these won't be 
> controversial:
> fact 1: if the structures are coming from different sources, they need to be 
> standardized/normalized before you compare them. This is true regardless of 
> how you want to compare them. The details of the standardization process are 
> not incredibly important, but it does need to take care of the things you 
> care about when comparing molecules. For example, if you don't care about 
> differences between salts, it should strip salts. If you don't care about 
> differences between tautomers, it should normalize tautomers.
> fact 2: The InChI algorithm includes a standardization step that normalizes 
> some tautomers, but does not remove salts.
> fact 3: The InChI representation contains a number of layers defining the 
> structure in increasing detail (this isn't strictly true, because some of the 
> choices about how layers are ordered are arbitrary, but it's close).
> fact 4: canonicalization, the way I define it, produces a canonical atom 
> numbering for a given structure, but it does *not* standardize.
> fact 5: the RDKit has essentially no well-documented standardization code.
>  
> fact X: we don't have any standard, broadly accepted approach for 
> standardization, canonicalization or representation that is fool-proof or 
> that works for even all of organic chemistry, never mind organometallics. 
> InChI, useful as it is for some things, completely fails to handle things 
> like atropisomers (they are working on this kind of thing, but it's not out 
> yet).
>  
> Given all of this, if I wanted to have flexible duplicate checking *right* 
> now, I think I would use the AvalonTools struchk functionality that the RDKit 
> provides (the new pure-RDKit version still needs a bit more testing) to 
> handle basic standardization and salt stripping and then produce a table that 
> includes the InChI in a couple of different forms. I'd want to be able to 
> recognize molecules that differ only by stereochemistry, molecules that 
> differ only by location of tautomeric Hs, and molecules that differ only by 
> the location of isotopic labels. You can do this with various clever splits 
> of the InChI (how to do it is left as an exercise for the reader and/or a 
> future RDKit blog post). 
>  
> I think there's something fun to be done here with SMILES variants, borrowing 
> heavily from some of the things that Roger has written about:
> https://nextmovesoftware.com/blog/2013/04/25/finding-all-types-of-every-mer/ 
> 
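
One possible reading of the "clever splits" mentioned above, offered as a minimal sketch rather than the approach Greg or Francis had in mind: cut a standard InChI string at its layer boundaries and compare molecules on progressively coarser keys. The function name `inchi_variants` and the key names are invented for this example. The strings are the standard InChIs for L- and D-alanine, plus a copy with an isotopic layer appended by hand for illustration. The code assumes a standard InChI that has a formula layer and no fixed-H (`/f`) layer.

```python
# Layer tags that carry stereochemistry in a standard InChI.
STEREO_TAGS = {"b", "t", "m", "s"}


def inchi_variants(inchi):
    """Return progressively coarser comparison keys for one InChI string."""
    # Everything from the isotopic layer ("/i") onward describes isotopes.
    no_isotopes = inchi.split("/i", 1)[0]
    layers = no_isotopes.split("/")
    # layers[0] is the "InChI=1S" prefix, layers[1] the formula;
    # the remaining layers are identified by their first character.
    head, tagged = layers[:2], layers[2:]
    no_stereo = "/".join(head + [l for l in tagged if l[:1] not in STEREO_TAGS])
    skeleton = "/".join(head + [l for l in tagged if l[:1] == "c"])
    return {
        "full": inchi,
        "no_isotopes": no_isotopes,  # isotopic labels ignored
        "no_stereo": no_stereo,      # isotopes and stereo ignored
        "skeleton": skeleton,        # formula and connectivity only
    }


l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
labelled = l_ala + "/i1+1"  # isotopic layer appended for illustration

for name, inchi in [("L-Ala", l_ala), ("D-Ala", d_ala), ("labelled", labelled)]:
    print(name, inchi_variants(inchi)["no_stereo"])
```

Grouping a table on `no_stereo` would put all three rows together, while `full` keeps them apart. The `skeleton` key ignores hydrogens and charge altogether, so it is cruder than real tautomer handling and can also merge different protonation states.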

Re: [Rdkit-discuss] File Conversion?

2016-12-04 Thread Curt Fischer
This is not really possible.  Fasta files contain only sequence
information, not 3D structural information.

Curt
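
To make this concrete, here is a small standard-library sketch that reads FASTA text into (header, sequence) pairs. The record shown is invented for the example. All a FASTA record holds is a label and a string of letters; a PDB file, by contrast, needs x, y, z coordinates for each atom, and nothing in the input supplies them.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins; store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are simply concatenated.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record, for illustration only.
example = """>demo_gene made-up example record
ATGGCCATTG
TAATGGGCCG
"""

for header, sequence in read_fasta(example):
    print(header, len(sequence), sequence)
```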

On Sun, Dec 4, 2016 at 7:00 AM, Carl MacGentey wrote:

> Dear RDKit Discussion Group-
>
>
>
> Is it possible to convert fasta files (DNA nucleotide sequences) into PDB
> files? I am wanting to view strands of DNA and full length genes in three
> dimensions.
>
>
>
> Sent from Mail for Windows 10
>
>
>
> 
>
>
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] File Conversion?

2016-12-04 Thread Carl MacGentey
Dear RDKit Discussion Group-

Is it possible to convert fasta files (DNA nucleotide sequences) into PDB 
files? I am wanting to view strands of DNA and full length genes in three 
dimensions.

Sent from Mail for Windows 10



Re: [Rdkit-discuss] Hankering after faster builds

2016-12-04 Thread Tim Dudgeon

Hi Greg,

On 03/12/2016 13:16, Greg Landrum wrote:
> Builds do take a while, but there is *no way* they should be taking 2 
> hours unless they are running on extremely overloaded hardware. The 
> travis builds, which include running all the tests, typically take 
> less than 40 minutes.

Yes, on my normal machine a build takes about 45 mins. But on Docker Hub 
it seems you only get pretty limited hardware (1 core, for instance) and 
can't do anything about this.

> If, for some reason, you do still need to deal with this, I would 
> guess that it would help to start by building an image without the 
> java (or python) wrappers and then build an image on top of that which 
> adds just the java (or python) wrappers.

Yes, I can try that. In fact I'm sort of doing this for the Java 
wrappers, but not quite in that way.

> Still, I would hope that you don't need to run too many builds on 
> docker. It seems like one of those "once per release" activities. 
> Certainly not something you'd want to do after every commit.

Yes, that's the case. Typically I build when there is a new release 
(e.g. every 6 months) and from time to time check that the build on 
master is still OK.

Tim

> -greg





> On Fri, Dec 2, 2016 at 6:29 PM, Tim Dudgeon wrote:
>
>> Of course builds from source are never fast enough, and the RDKit one is
>> pretty big.
>> So far I've lived with this and made cups of coffee.
>> But since I've been working with the Release_2016_09_2 release my Docker
>> image builds on Docker Hub [1] are timing out as they sometimes exceed
>> the 2 hour limit. If I try at a quiet time I can sometimes get them to
>> complete, but I suppose the situation is only going to get worse.
>>
>> I've tried breaking the build into 2 steps, the first to prepare the
>> base image [2] and the second to build RDKit [3], and that's helped a
>> bit, but I don't think there's more mileage here as nearly all the time
>> is spent in the 'make' command.
>>
>> Anticipating things getting worse, does anyone have suggestions for
>> speeding this up? I hanker after a --fast mode, but I suppose if there
>> was one it would already be the default.
>>
>> Any ideas?
>>
>> In case anyone wonders, I'm building from source as I need to also create
>> a Java enabled version and one for the Postgres cartridge and want to
>> use the same approach to generating them all.
>>
>> Tim
>>
>> [1] https://hub.docker.com/r/informaticsmatters/rdkit/
>> [2] https://github.com/InformaticsMatters/rdkit_debian_base
>> [3] https://github.com/InformaticsMatters/rdkit






