Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 110, Issue 18

Esben Jannik Bjerrum Sun, 04 Dec 2016 12:05:31 -0800

Hi Carl,
Curt is right, there's no structural information in fasta files. I'm not sure 
exactly what you want to do or hope to achieve. RDKit can read the sequence 
into a molecule, which you could then write out as a molfile 
(http://rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#MolFromFASTA), 
but if you want a 3D protein structure from sequence, you'll need to do 
some homology modelling, for example with the Sali Lab's MODELLER 
(https://salilab.org/modeller/), or, if no homologous protein structure is 
available, use some ab initio protein structure prediction software (I've 
seen Rosetta be successful once).

Esben Jannik Bjerrum
cand.pharm, Ph.D
/Sent from my Ubuntu Touch Phone


Phone +45 2823 8009
http://dk.linkedin.com/in/esbenbjerrum
http://www.wildcardconsulting.dk
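
To illustrate the point about sequence versus structure, here is a minimal
sketch, assuming a reasonably recent RDKit build. The peptide sequence and
the ">example" header are made up for illustration. MolFromFASTA returns a
molecular graph (atoms and bonds); it carries no 3D coordinates, so it is
not a substitute for a modelled or experimental structure.

```python
from rdkit import Chem

# A short, made-up peptide sequence used purely for illustration.
fasta = ">example\nACDEFGHIK\n"

# Default flavor: protein built from L amino acids.
mol = Chem.MolFromFASTA(fasta)

# The result has atoms and bonds, but no conformer (no coordinates).
n_atoms = mol.GetNumAtoms()
n_conformers = mol.GetNumConformers()
print(n_atoms, "atoms,", n_conformers, "conformers")
```

Any coordinates added afterwards (for example by conformer embedding) would
be generated geometry, not a prediction of the folded structure.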
 

    On Sunday, December 4, 2016 7:26 PM, "[email protected]" <[email protected]> wrote:
 

 ----- Forwarded Message -----


Today's Topics:

  1. File Conversion? (Carl MacGentey)
  2. Re: File Conversion? (Curt Fischer)
  3. Re: comparing two or more tables of molecules (Matthew Swain)

Dear RDKit Discussion Group-

Is it possible to convert fasta files (DNA nucleotide sequences) into PDB 
files? I am wanting to view strands of DNA and full length genes in three 
dimensions.

Sent from Mail for Windows 10

This is not really possible. Fasta files contain only sequence information, 
not 3D structural information.

Curt

On Sun, Dec 4, 2016 at 7:00 AM, Carl MacGentey <[email protected]> wrote:

Dear RDKit Discussion Group-

Is it possible to convert fasta files (DNA nucleotide sequences) into PDB 
files? I am wanting to view strands of DNA and full length genes in three 
dimensions.

Sent from Mail for Windows 10

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



Sorry Steve, there was a bug in MolVS that you encountered. Should now be fixed.
"pip install -U molvs" to get the update (v0.0.7).
Matt
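
As an illustration of the kind of salt stripping discussed in this thread,
here is a minimal sketch applied to the SMILES Steve reported. It does not
use the molvs package itself; it assumes an RDKit version that ships the
rdMolStandardize module (a MolVS-derived standardizer added to RDKit after
this thread was written), so treat it as one possible approach rather than
the fix Matt refers to.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# The SMILES reported in this thread: a sodium salt with two waters.
smiles = "NC(CC(=O)O)C(=O)[O-].O.O.[Na+]"
mol = Chem.MolFromSmiles(smiles)

# Keep only the largest fragment (drops the waters and the sodium).
fragment_parent = rdMolStandardize.FragmentParent(mol)

# Largest fragment again, but also neutralised where possible.
charge_parent = rdMolStandardize.ChargeParent(mol)

print(Chem.MolToSmiles(fragment_parent))
print(Chem.MolToSmiles(charge_parent))
```

Which of the two "parents" is the right one depends on whether charge state
matters for the comparison being made.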

On 1 Dec 2016, at 15:52, Stephen O'hagan <[email protected]> wrote:
Thanks for the interesting links.

MolVS looks good, but failed on ‘NC(CC(=O)O)C(=O)[O-].O.O.[Na+]’ which isn’t 
that extraordinary…

Couldn’t get Standardise to work at all, even on the example given; API not 
intuitive or docs wrong or out of date.

I will have a look at the info in the UniChem paper, though not inclined to 
use a web service for what I want to do.

Cheers,
Steve.

From: George Papadatos [mailto:[email protected]] 
Sent: 01 December 2016 14:26
To: Greg Landrum <[email protected]>
Cc: Stephen O'hagan <[email protected]>; 
[email protected]; Francis Atkinson <[email protected]>
Subject: Re: [Rdkit-discuss] comparing two or more tables of molecules

Hi Stephen,

Further to Greg's excellent reply, see this paper on how InChI strings and 
keys can be used in practice to map together tautomer (ones covered by InChI 
at least), isotope, stereo and parent-salt variants. 
http://rd.springer.com/article/10.1186/s13321-014-0043-5

Francis (cc'ed) has a nice notebook somewhere illustrating these nice InChI 
splits to find these variants.

For educational purposes, there have been other approaches like the NCI's 
identifiers - discussion here: 
http://acscinf.org/docs/meetings/237nm/presentations/237nm17.pdf

For pure structure standardization using RDKit see here: 
https://github.com/flatkinson/standardiser and https://github.com/mcs07/MolVS

Cheers,

George

On 29 November 2016 at 17:02, Greg Landrum <[email protected]> wrote:
Wow, this is a great question and quite a fun thread.

It's hard to really make much of a contribution here without writing a 
book/review article (something that I'm really not willing to do!), but I 
have a few thoughts. Most of this is repeating/rephrasing things others have 
already said.

I'm going to propose some things as facts. I think that these won't be 
controversial:

fact 1: if the structures are coming from different sources, they need to be 
standardized/normalized before you compare them. This is true regardless of 
how you want to compare them. The details of the standardization process are 
not incredibly important, but it does need to take care of the things you 
care about when comparing molecules. For example, if you don't care about 
differences between salts, it should strip salts. If you don't care about 
differences between tautomers, it should normalize tautomers.

fact 2: The InChI algorithm includes a standardization step that normalizes 
some tautomers, but does not remove salts.

fact 3: The InChI representation contains a number of layers defining the 
structure in increasing detail (this isn't strictly true, because some of the 
choices about how layers are ordered are arbitrary, but it's close).

fact 4: canonicalization, the way I define it, produces a canonical atom 
numbering for a given structure, but it does *not* standardize.

fact 5: the RDKit has essentially no well-documented standardization code.

fact X: we don't have any standard, broadly accepted approach for 
standardization, canonicalization or representation that is fool-proof or 
that works for even all of organic chemistry, never mind organometallics. 
InChI, useful as it is for some things, completely fails to handle things 
like atropisomers (they are working on this kind of thing, but it's not out 
yet).

Given all of this, if I wanted to have flexible duplicate checking *right* 
now, I think I would use the AvalonTools struchk functionality that the RDKit 
provides (the new pure-RDKit version still needs a bit more testing) to 
handle basic standardization and salt stripping and then produce a table that 
includes the InChI in a couple of different forms. I'd want to be able to 
recognize molecules that differ only by stereochemistry, molecules that 
differ only by location of tautomeric Hs, and molecules that differ only by 
the location of isotopic labels. You can do this with various clever splits 
of the InChI (how to do it is left as an exercise for the reader and/or a 
future RDKit blog post).

I think there's something fun to be done here with SMILES variants, borrowing 
heavily from some of the things that Roger has written about:
https://nextmovesoftware.com/blog/2013/04/25/finding-all-types-of-every-mer/
here's a more recent application of that from Noel: 
https://nextmovesoftware.com/blog/2016/06/22/fishing-for-matched-series-in-a-sea-of-structure-representations/
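
One possible reading of the "clever splits of the InChI" mentioned above, as
a minimal sketch and not Greg's own method: a standard InChI is a
"/"-separated list of layers, and the stereo layers start with b, t, m or s,
so dropping them gives a string that is the same for stereoisomers. The
helper name drop_layers is hypothetical, and the two input strings are the
standard InChIs of L- and D-alanine, used only as an illustration.

```python
# Layer prefixes that carry stereochemistry in a standard InChI.
STEREO_LAYERS = ("b", "t", "m", "s")


def drop_layers(inchi, prefixes):
    """Return the InChI without the layers that start with one of `prefixes`.

    Simplification: layers are matched by their first letter only, so stereo
    layers that follow an isotopic (/i) layer are dropped as well.
    """
    version, *layers = inchi.split("/")
    kept = [layer for layer in layers if not layer.startswith(prefixes)]
    return "/".join([version] + kept)


l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

same_full = l_alanine == d_alanine
same_without_stereo = (
    drop_layers(l_alanine, STEREO_LAYERS) == drop_layers(d_alanine, STEREO_LAYERS)
)
print(same_full, same_without_stereo)  # prints: False True
```

Storing both the full and the stripped string as separate table columns
makes it possible to tell exact duplicates from stereo-only variants.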

If I didn't really care about details and just wanted something that I could 
explain easily to others, I'd skip all the complication and just use InChIs 
(or InChI keys) to recognize duplicates. There would be times when that would 
be the wrong answer, but it would be a broadly accepted kind of wrong.[1]
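
The simple InChI-key approach could look something like the sketch below. It
assumes an RDKit build with InChI support; the table names and SMILES are
made-up example data. Each original input string is stored alongside its key
so that nothing is lost by the grouping.

```python
from collections import defaultdict

from rdkit import Chem

# Made-up example rows: (source table, input SMILES).
records = [
    ("table_a", "CCO"),
    ("table_b", "OCC"),
    ("table_b", "C(O)C"),
    ("table_a", "CCN"),
]

# Map each InChI key to the original rows that produced it.
groups = defaultdict(list)
for source, smiles in records:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        continue  # unparseable input; a real workflow would log this
    key = Chem.MolToInchiKey(mol)
    groups[key].append((source, smiles))

# Keys seen more than once are candidate duplicates.
duplicates = {key: rows for key, rows in groups.items() if len(rows) > 1}
for key, rows in duplicates.items():
    print(key, rows)
```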

Regardless of the approach, I would not, under most any circumstances, 
discard the original input structures that I had. It's really good to be able 
to figure out what the original data looked like later.

-greg

[1] I'm crying as I write this...

On Mon, Nov 28, 2016 at 5:25 PM, Stephen O'hagan <[email protected]> wrote:
Has anyone come up with a fool-proof way of matching structurally equivalent 
molecules?

Unique SMILES or InChI string comparisons don’t appear to work, presumably 
because there are different but equivalent structures, e.g. explicit vs 
non-explicit H’s, Kekule vs aromatic, isomeric vs non-isomeric forms, 
tautomers etc.

I also expect that comparing InChI strings might need something more than 
just a simple string comparison, such as masking off stereo information when 
you don’t care about stereo isomers.

I assume there are suitable tools within RDKit that can do this?

N.B. I need to collate tables from several sources that have a mix of 
SMILES / InChI / SDF molecular representations.

I usually use RDKit via Python and/or Knime.

Cheers,
Steve.
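
For the purely notational differences listed in the question (explicit vs
implicit H's, Kekule vs aromatic), parsing with RDKit and writing canonical
SMILES is usually enough, and stereo can be masked by switching off isomeric
SMILES. A minimal sketch follows; the helper name canonical is hypothetical,
and tautomers and salts are not handled by this step.

```python
from rdkit import Chem


def canonical(smiles, keep_stereo=True):
    """Parse a SMILES and return RDKit's canonical SMILES for it."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: " + smiles)
    return Chem.MolToSmiles(mol, isomericSmiles=keep_stereo)


# Kekule form, aromatic form, and a form with explicit hydrogens.
benzene_forms = ["C1=CC=CC=C1", "c1ccccc1", "[H]c1c([H])cccc1"]
benzene_canonical = {canonical(s) for s in benzene_forms}

# L- and D-alanine differ only in stereochemistry.
l_ala, d_ala = "C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O"
stereo_differs = canonical(l_ala) != canonical(d_ala)
same_when_masked = canonical(l_ala, False) == canonical(d_ala, False)

print(benzene_canonical, stereo_differs, same_when_masked)
```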