Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread ro...@nextmovesoftware.com

Hi James and Greg,

On Oct 25, 2013, at 4:03 AM, Greg Landrum wrote:
 1.   Do I remember correctly that there was a proposal (from Roger) to
 add some auto bond-type perception to the PDB parser for ligands (or is
 that just wishful thinking!)?

 Roger will have to confirm this, but I believe he said something along
 the lines of that way lies madness.

My first comment is that a computational chemistry toolkit's function to
assign bond orders, formal charges and protonation states from 3D
coordinates is (or should be) a sanitize-like step, independent of its PDB
file reader.  For one thing, this functionality is required for reading
XYZ format files, Schrodinger Maestro files, and quantum mechanics file
formats such as Gaussian and MOPAC.  For another, many PDB file reading
applications don't require bond orders, e.g. GRASP surfaces and many
docking functions/forcefield calculations, so handling bond order
perception independently of PDB reading has some merit.

All I'll say at this stage is that correctly perceiving bonds, formal
charges and protonation states (they're all interdependent) is probably
more complicated than most folks think.  Indeed, many of the
crystallographers at the RDKit meeting claimed it was impossible.  The
bondage algorithm used in OpenEye's OEChem is several thousand lines of
C++, and was still improving (on things like iron-sulfur clusters and
oxime vs. nitroso perception) up to the point I left Santa Fe in 2010.
The state of the art from a decade ago is described at:
http://www.daylight.com/meetings/mug01/Sayle/m4xbondage.html and was
used at the time to produce a searchable database of PDB ligands:
http://www.metaphorics.com/products/luna.html

 3.   Is there some explanation for what the ‘flavor’ option does for
 reading/writing PDB?

 I'm not sure about the reader. Roger, can you answer that?

 This is what's in the C++ for the PDBWriter:
 // PDBWriter support multiple flavors of PDB output
 // flavor  1 : Write MODEL/ENDMDL lines around each record
 // flavor  2 : Don't write any CONECT records
 // flavor  4 : Write CONECT records in both directions
 // flavor  8 : Don't use multiple CONECTs to encode bond order
 // flavor  16 : Write MASTER record
 // flavor  32 : Write TER record

 This is now in the docs for both the Python and C++ code.

The use of an integer file format flavor argument allows the caller to
customize the behavior of the readers and writers.  The semantics are that
a reasonable default is zero (for all bits), but that new features may be
added without changing the API/ABI.  Most of the bits above (for the
writer) control strict compliance with the PDB format specification.  For
example, a flavor of 12 will write bond orders the way the RCSB expects
them, both throwing away bond orders and increasing the size of the PDB
file.
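To make the bit arithmetic concrete, here is a small Python sketch that models the writer flavors listed above as an `enum.IntFlag`. The member names are hypothetical labels invented for this example (RDKit itself just takes a plain integer); only the numeric values come from the PDBWriter comments.

```python
from enum import IntFlag


class PDBWriterFlavor(IntFlag):
    """Hypothetical names for the PDBWriter flavor bits quoted above."""
    MODEL_ENDMDL = 1            # write MODEL/ENDMDL lines around each record
    NO_CONECT = 2               # don't write any CONECT records
    CONECT_BOTH_WAYS = 4        # write CONECT records in both directions
    NO_MULTI_CONECT_ORDER = 8   # don't use multiple CONECTs to encode bond order
    MASTER = 16                 # write MASTER record
    TER = 32                    # write TER record


def describe(flavor: int):
    """List the names of the bits set in an integer flavor value."""
    return [bit.name for bit in PDBWriterFlavor if flavor & bit]


# Zero is the "reasonable default": no bits set.
print(describe(0))      # []

# The RCSB-style flavor mentioned above is the OR of two bits.
rcsb_style = PDBWriterFlavor.CONECT_BOTH_WAYS | PDBWriterFlavor.NO_MULTI_CONECT_ORDER
print(int(rcsb_style))  # 12
print(describe(12))     # ['CONECT_BOTH_WAYS', 'NO_MULTI_CONECT_ORDER']
```

Adding a seventh option later would just mean a new member with value 64; callers already passing 0 or 12 are unaffected, which is the API/ABI property described above.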

For the reader, the flavor argument controls whether alternate locations
are read (for use by PDB power users), or whether a sensible subset of
atoms is used for the RDKit::ROMol.

 5.   It seems to me that GetResidueNumber() and GetSerialNumber() may
 have got mixed-up at some point(?).  At least, when I call
 GetSerialNumber() I see what appears to be the residue number; and when
 I call GetResidueNumber() I get “0”!

 This was another dumb bug from me. It's fixed.

Greg is being modest.  At the time of the RDKit meeting, the MonomerInfo
data structure had just a SerialNumber field, which was used for storing
residue numbers.  One of my suggestions back to Greg was that although
everything worked, this nomenclature might be confusing to folks using
the API, so it was suggested to rename the field for the Q3 beta.  The
better solution was to support fields for both ResidueNumber and
SerialNumber, but following that change I failed to send the patch to
make the reader/writer use the correct (changed) residueNumber field, and
record/honour the serial number field.
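The two fields that got conflated live in different columns of a PDB ATOM/HETATM record: the atom serial number in columns 7-11 and the residue sequence number in columns 23-26. A minimal stdlib sketch that pulls both out; the function name is hypothetical and this is not RDKit's reader (it also ignores the hybrid-36 serials some programs write above 99999).

```python
def serial_and_residue_number(record: str):
    """Return (atom serial number, residue sequence number) from an
    ATOM/HETATM line, using the fixed PDB column positions
    (1-based columns 7-11 for serial, 23-26 for resSeq)."""
    if not record.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    serial = int(record[6:11])
    residue_number = int(record[22:26])
    return serial, residue_number


# Build a well-formed 80-column record so the column positions are explicit.
line = ("HETATM" + "%5d" % 1234 + " " + " C1 " + " " + "LIG" + " " + "A"
        + "%4d" % 501 + " " + "   " + "%8.3f%8.3f%8.3f" % (1.0, 2.0, 3.0)
        + "%6.2f%6.2f" % (1.0, 20.0) + " " * 10 + "%2s" % "C" + "  ")
print(serial_and_residue_number(line))   # (1234, 501)
```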

My apologies.  I share some of the blame for this one.

 6.   I also seem to be seeing all of the bonds (for all residues) being
 written out in CONECT records – such that they all appear as single
 bonds in eg PyMOL – is this expected behaviour at the moment?

 Another one for Roger.

I believe this should work fine.  RDKit's PDB file writer by default
encodes the bond orders, which should be interpreted by PyMOL.  In the
words of the late great Warren:
http://www.phenix-online.org/pipermail/phenixbb/2008-April/012188.html

We need to check where the bond orders are getting lost.  If you read the
PDB file back with RDKit's PDB file reader and write out the SMILES, does
it have double bonds?


I hope this helps.

Many thanks again to Greg for all the code polishing described above.

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science Park, Cambridge CB4 0EY



Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread sereina riniker
Hi James,

Regarding the AssignBondOrdersFromTemplate() method:
As far as I understood, the PDB reader assigns bond orders to the amino
acids in a protein, but if a ligand is present it sets all of its bonds to
SINGLE, as auto bond-type perception is not trivial (see Roger's
comments). However, usually one knows which ligand was crystallized (i.e.
the SMILES is available), so the AssignBondOrdersFromTemplate() method can
be used to set the bond orders based on the known ligand structure. This is
the idea of the method. Now, to your real-world application: I'm sorry, but
I don't think I understand it completely. Do you want to set only the bond
orders of a specific substructure? Or would you like to give the function a
set of ligands and a set of templates, and have it figure out which template
belongs to which ligand and set the bond orders accordingly?
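The idea behind AssignBondOrdersFromTemplate() can be illustrated without RDKit: match the all-single-bond ligand read from the PDB file against the template graph by element and connectivity, then copy the template's bond orders across the matched atoms. The sketch below is a toy backtracking matcher over plain Python lists and dicts, assumed for illustration only; it is not how RDKit implements the method, ignores hydrogens and charges, and would be far too slow for large molecules.

```python
def assign_bond_orders(tmpl_elems, tmpl_bonds, tgt_elems, tgt_bonds):
    """Copy bond orders from a template onto a target with the same graph.

    *_elems: element symbols indexed by atom.
    *_bonds: dict frozenset({i, j}) -> bond order (target orders ignored).
    Returns the target's bonds with template orders, or None if no
    element- and connectivity-preserving match exists.
    """
    n = len(tmpl_elems)
    if n != len(tgt_elems) or len(tmpl_bonds) != len(tgt_bonds):
        return None

    def neighbours(bonds):
        adj = [set() for _ in range(n)]
        for bond in bonds:
            i, j = tuple(bond)
            adj[i].add(j)
            adj[j].add(i)
        return adj

    t_adj, g_adj = neighbours(tmpl_bonds), neighbours(tgt_bonds)
    mapping = {}  # template atom index -> target atom index

    def extend(i):
        if i == n:
            return True
        for j in range(n):
            if (j in mapping.values() or tgt_elems[j] != tmpl_elems[i]
                    or len(g_adj[j]) != len(t_adj[i])):
                continue
            # Already-mapped template neighbours must land on neighbours of j.
            if all(mapping[k] in g_adj[j] for k in t_adj[i] if k in mapping):
                mapping[i] = j
                if extend(i + 1):
                    return True
                del mapping[i]
        return False

    if not extend(0):
        return None
    return {frozenset(mapping[a] for a in bond): order
            for bond, order in tmpl_bonds.items()}


# Acetic acid heavy atoms: template with the C=O double bond...
tmpl_elems = ["C", "C", "O", "O"]
tmpl_bonds = {frozenset({0, 1}): 1, frozenset({1, 2}): 2, frozenset({1, 3}): 1}
# ...and the same ligand as read from a PDB file: other atom order, all single.
pdb_elems = ["O", "C", "O", "C"]
pdb_bonds = {frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({1, 3}): 1}

fixed = assign_bond_orders(tmpl_elems, tmpl_bonds, pdb_elems, pdb_bonds)
print(fixed[frozenset({0, 1})])   # 2
```

In this toy version, passing a whole multi-component complex as the target fails at the very first size check, which mirrors the difficulty James describes in the follow-up.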

Best,
Sereina



2013/10/24 Greg Landrum greg.land...@gmail.com

 James,

 On Thu, Oct 24, 2013 at 7:27 PM, James Davidson
 j.david...@vernalis.com wrote:

  Hi Greg (et al.),


 Thanks for the beta!  I have been going through some of the
 recently-added functionality, and had a couple of questions regarding the
 PDB reading / writing.


 Thanks for the bug reports!


 1.   Do I remember correctly that there was a proposal (from
 Roger) to add some auto bond-type perception to the PDB parser for ligands
 (or is that just wishful thinking!)?

 Roger will have to confirm this, but I believe he said something along the
 lines of that way lies madness.

 2.   If not, I notice that there is an
 AssignBondOrdersFromTemplate() method – but the example in the doc-string
 only shows (I think) the case where the input PDB is just a single small
 molecule – so the matching is pretty easy!  I think a more real-world case
 is when one wants to set the bond orders for multiple ligands (HETATM
 residues) based on substructure matches – which will then return an atom
 index selection that can be used as a starting point.  Is there any way to
 have the AssignBondOrdersFromTemplate() convenience function optionally
 accept a list of atom indexes to specify a substructure?

 Sereina? Is that doable?

 

 3.   Is there some explanation for what the ‘flavor’ option does
 for reading/writing PDB?

 I'm not sure about the reader. Roger, can you answer that?

 This is what's in the C++ for the PDBWriter:
 // PDBWriter support multiple flavors of PDB output
 // flavor  1 : Write MODEL/ENDMDL lines around each record
 // flavor  2 : Don't write any CONECT records
 // flavor  4 : Write CONECT records in both directions
 // flavor  8 : Don't use multiple CONECTs to encode bond order
 // flavor  16 : Write MASTER record
 // flavor  32 : Write TER record

 This is now in the docs for both the Python and C++ code.

 

 4.   Having read in a PDB file I see the correct atoms flagged
 as HETATM (from GetIsHeteroAtom()).  But when I call Chem.MolToPDBBlock()
 these atoms get written as ATOM records…  Also, a Chem.MolToPDBFile()
 method would be nice for completeness / symmetry : )

 The HETATM thing was the result of a dumb copy and paste error from me.
 It's fixed.

 Re: Chem.MolToPDBFile()
 that's missing because there's no corresponding Chem.MolToMolFile()
 This is an odd oversight, which I've now fixed.

 

 5.   It seems to me that GetResidueNumber() and
 GetSerialNumber() may have got mixed-up at some point(?).  At least, when I
 call GetSerialNumber() I see what appears to be the residue number; and
 when I call GetResidueNumber() I get “0”!

 This was another dumb bug from me. It's fixed.

 

 6.   I also seem to be seeing all of the bonds (for all
 residues) being written out in CONECT records – such that they all appear
 as single bonds in eg PyMOL – is this expected behaviour at the moment?

 Another one for Roger.

 -greg



 --
 October Webinars: Code for Performance
 Free Intel webinars can help you accelerate application performance.
 Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
 from
 the latest Intel processors and coprocessors. See abstracts and register 
 http://pubads.g.doubleclick.net/gampad/clk?id=60135991iu=/4140/ostg.clktrk
 ___
 Rdkit-discuss mailing list
 Rdkit-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread James Davidson
Hi Sereina,

Sereina wrote:
 Regarding the AssignBondOrdersFromTemplate() method:
 As far as I understood, the PDB reader assigns bond orders to the amino acids 
 in a protein, but if a ligand is present it puts all bonds of it to SINGLE 
 bonds as auto bond-type perception is not trivial (see Roger's comments).
 However, usually one knows which ligand was crystallized (i.e. the SMILES is 
 available), so the AssignBondOrdersFromTemplate() method can be used to set 
 the bond orders based on the known ligand structure.
 This is the idea of the method. Now, to your real-world application. I'm 
 sorry but I don't think I understand it completely. Do you want to set only 
 the bond orders of a specific substructure?
 Or would you like to give the function a set of ligands and a set of 
 templates and it figures out which template belongs to which ligand and sets 
 the bonds orders accordingly? 

This is very likely to be me being stupid - so please bear with me!
If I read in a complex (pdb), and already have my reference ligand (lig), then 
AllChem.AssignBondOrdersFromTemplate(lig, pdb) fails because the reference 
ligand has not been matched to the ligand in the pdb 'complex' (dot-separated 
list of molecules).
The doc-string states that the method works on two molecules - but I want to 
work on a reference molecule (lig) and a *substructure* of the macromolecule 
(pdb).  How should I be getting the bound ligand out as a molecule object to 
then use the AssignBondOrdersFromTemplate() method?  Am I missing some new 
PDB-related methods, or have I forgotten some fundamental RDKit methods for 
dealing with multi-component molecules?

I guess a sensible process would be:
1. Identify any HETATM residues
2. For each residue (or at least those that have bonds!) extract or copy the 
mol (unless it can be addressed 'in place'?)
3. Use AssignBondOrdersFromTemplate() - relying on lookup by, e.g., residue
name, etc.
4. Insert the molecule back into the complex (or update the info if it has been
modified 'in place')

Is this how the method is intended to be used with complexes (and if so, do you
have an example for steps 2 and 4)?
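Steps 1 and 2 of that process can be sketched at the text level, before RDKit is involved: group HETATM records by residue (name, chain, number), then look each residue name up in a table of template SMILES for step 3. The function names, the water filter, and the template table below are hypothetical; within RDKit the same per-atom residue information is carried by the MonomerInfo attached to each atom once the file has been read.

```python
from collections import defaultdict


def hetatm_residues(pdb_text, skip=("HOH",)):
    """Step 1: group HETATM serial numbers by (resName, chainID, resSeq).

    Waters are skipped by default since they carry no bonds to fix.
    """
    residues = defaultdict(list)
    for record in pdb_text.splitlines():
        if not record.startswith("HETATM"):
            continue
        res_name = record[17:20].strip()
        if res_name in skip:
            continue
        key = (res_name, record[21], int(record[22:26]))
        residues[key].append(int(record[6:11]))
    return dict(residues)


# Hypothetical template table for step 3: residue name -> reference SMILES.
TEMPLATES = {"ACT": "CC(=O)O"}

# Fixed-column HETATM layout, used only to build example input.
HETATM_FMT = ("HETATM%5d %-4s %3s %1s%4d" + " " * 4
              + "%8.3f%8.3f%8.3f%6.2f%6.2f" + " " * 10 + "%2s")


def hetatm(serial, name, res_name, chain, res_seq, element):
    return HETATM_FMT % (serial, name, res_name, chain, res_seq,
                         0.0, 0.0, 0.0, 1.0, 0.0, element)


pdb_text = "\n".join([
    hetatm(1, "C", "ACT", "A", 901, "C"),
    hetatm(2, "CH3", "ACT", "A", 901, "C"),
    hetatm(3, "O", "HOH", "A", 1001, "O"),
])
for (res_name, chain, res_seq), serials in hetatm_residues(pdb_text).items():
    print(res_name, chain, res_seq, serials, TEMPLATES.get(res_name))
# ACT A 901 [1, 2] CC(=O)O
```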

Thanks

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
100 Berkshire Place
Wharfedale Road
Winnersh, Berkshire
RG41 5RD, England
Tel: +44 (0)118 938 

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the Company address and 
registration details link at the bottom of the page..
__



Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread Paul Emsley
On 25/10/13 08:09, James Davidson wrote:
 Hi Roger,

 Thanks for the response

 The use of an integer file format flavor argument allows the caller to
 customize the behavior of the readers and writers.  The semantics is that a
 reasonable default is zero (for all bits), but that new features may be added
 without changing the API/ABI.
 Most of the bits above (for the writer) control strict compliance with the
 PDB format specification.  For example, a flavor of 12 will write bond
 orders the way the RCSB expects them, both throwing away bond orders and
 increasing the size of the PDB file.
 As a test, I am using 2VCI, and am retrieving the PDB data from the RCSB
 using the following:

 import requests
 url = ("http://www.rcsb.org/pdb/download/downloadFile.do"
        "?fileFormat=pdb&compression=NO&structureId=2VCI")
 response = requests.get(url)
 pdb_block = response.text  # decoded str, as Chem.MolFromPDBBlock expects
 response.close()


 pdb_block shows CONECT records only for the HETATM records.
 If I now read into RDKit, using the defaults, and write back out using the 
 defaults, I see CONECT records for every atom (ie protein as well).  And I 
 can't see any double-bonds rendered in PyMOL:

 from rdkit import Chem
 from rdkit.Chem import AllChem
 pdb = Chem.MolFromPDBBlock(pdb_block)
 pdb_block_out = Chem.MolToPDBBlock(pdb)

 First 10 CONECT records of output:
 CONECT    1    2
 CONECT    2    3    5
 CONECT    3    4    4   10
 CONECT    5    6
 CONECT    6    7
 CONECT    7    8    8    9
 CONECT   10   11
 CONECT   11   12   14
 CONECT   12   13   13   17
 CONECT   14   15   16


 If I use Chem.MolToPDBBlock(pdb, flavor=12) I do, indeed see the ligand 
 CONECT records in what looks like the original format (albeit now numbered 
 differently), and I still see CONECT records for the protein - but this PDB 
 *will* render double bonds in PyMOL.

 First 10 CONECT records of output:
 CONECT    3    4    4
 CONECT    7    8    8
 CONECT   12   13   13
 CONECT   19   20   20
 CONECT   23   24   24
 CONECT   28   29   29
 CONECT   35   36   36
 CONECT   38   39   39
 CONECT   40   42   42
 CONECT   41   43   43


If I may be so bold, I believe an important part of the puzzle is 
missing.  The residue-name/3-letter-code/comp-id in the PDB file is a 
pointer to an entry in the mmCIF-formatted chemical component dictionary 
that describes the compound, for all compounds for all entries released 
by the PDB.

http://deposit.pdb.org/cc_dict_tut.html

If this is an internal PDB file, there will very likely be a similar
mmCIF file used for crystallographic refinement.

Only when these options fail would I consider turning to bond-order 
perception and CONECT records.
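As a sketch of Paul's suggestion, bond orders for a ligand can be read from the `_chem_comp_bond` loop of its chemical component dictionary entry instead of being perceived; the `atom_id` values correspond to the atom names in the PDB file. The parser below is a minimal stdlib illustration that assumes one bond row per line and only the SING/DOUB/TRIP values for `value_order` (anything else maps to None); a full mmCIF parser would be more robust. The ACT snippet is abbreviated and illustrative, not a verbatim dictionary entry.

```python
import shlex

ORDER = {"SING": 1, "DOUB": 2, "TRIP": 3}


def chem_comp_bonds(cif_text):
    """Return [(atom_id_1, atom_id_2, order)] from a _chem_comp_bond loop."""
    lines = cif_text.splitlines()
    i = 0
    while i < len(lines):
        if (lines[i].strip() == "loop_" and i + 1 < len(lines)
                and lines[i + 1].startswith("_chem_comp_bond.")):
            i += 1
            columns = []
            while i < len(lines) and lines[i].startswith("_chem_comp_bond."):
                columns.append(lines[i].split(".", 1)[1].strip())
                i += 1
            bonds = []
            # Data rows run until a blank line, comment, new loop or new tag.
            while (i < len(lines) and lines[i].strip()
                   and not lines[i].startswith(("#", "_", "loop_"))):
                # shlex handles quoted names such as "C1'".
                row = dict(zip(columns, shlex.split(lines[i])))
                bonds.append((row["atom_id_1"], row["atom_id_2"],
                              ORDER.get(row["value_order"])))
                i += 1
            return bonds
        i += 1
    return []


CIF = """\
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
_chem_comp_bond.pdbx_ordinal
ACT C   O   DOUB 1
ACT C   OXT SING 2
ACT C   CH3 SING 3
#
"""
print(chem_comp_bonds(CIF))
# [('C', 'O', 2), ('C', 'OXT', 1), ('C', 'CH3', 1)]
```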

Paul.




Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread sereina riniker
Hi James,

Okay, now it's clear. I somehow (wrongly) thought the PDB reader would give
you the protein and the ligand as two molecules and then it wouldn't have
been a problem... I will discuss with Greg on how to best do this and get
back to you.

Best,
Sereina




Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread Andrew Dalke
On Oct 25, 2013, at 10:11 AM, Roger Sayle wrote:
 The use of an integer file format flavor argument allows the caller  
 to customize the behavior of the readers and writers.  The semantics
 is that a reasonable default is zero (for all bits), but that new
 features may be added without changing the API/ABI.

For some background, this is the API style used by OpenEye's
high-level readers and writers. There's more explanation at:

http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-python/molreadwrite.html#flavored-input-and-output

It solves a difficult problem, which is that there is no
such thing as the PDB format. (For that matter, there are
also variations of the MDL format, if only because the
output writer could use V3000 format for all cases, vs. V3000
only when V2000 can't support the structure.)

RDKit also supports different input and output flavors, though
it uses parameter attributes, like sanitize=False or
removeHs=False for reading an SD file.

OEChem's interface is more generic, in that the single 'flavor'
parameter exists for the high-level readers, which is easier
to pass around in a C++ toolkit.

(OTOH, this is less important for Python code. In chemfp, I
just pass around a Python dictionary of kwargs and apply
it like: SDMolSupplier(filename, **kwargs). )


However, these integer flags are tricky to use in practice.

For example, if you see flavor=49, what does it mean? Few
people will be able to look at that number and know it's:

  bit  1 = Write MODEL/ENDMDL lines around each record
  bit 16 = Write MASTER record
  bit 32 = Write TER record

For OEChem support, I ended up writing my own conversion
routines between the integer and a string notation. After
all, I would rather people do:

  rdkit2fps input.pdb --flavor MASTER|MODEL|TER

than have to do bitwise or-ing themselves for:

  rdkit2fps input.pdb --flavor 49
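A conversion routine of the kind Andrew describes might look like the following sketch. The short names (MODEL, MASTER, TER and the others) are hypothetical labels for the writer bits quoted earlier in the thread, not options that RDKit or rdkit2fps actually accept.

```python
# Hypothetical short names for the PDBWriter flavor bits.
FLAVOR_BITS = {
    "MODEL": 1,        # MODEL/ENDMDL around each record
    "NO_CONECT": 2,    # no CONECT records
    "BOTH_WAYS": 4,    # CONECT records in both directions
    "NO_MULTI": 8,     # no multiple CONECTs for bond order
    "MASTER": 16,      # MASTER record
    "TER": 32,         # TER record
}


def parse_flavor(text):
    """Turn 'MASTER|MODEL|TER' (or a plain integer string) into an int."""
    if text.strip().isdigit():
        return int(text)
    value = 0
    for name in text.split("|"):
        name = name.strip().upper()
        if name not in FLAVOR_BITS:
            raise ValueError("unknown flavor name: %r" % name)
        value |= FLAVOR_BITS[name]
    return value


def format_flavor(value):
    """Turn an int back into '|'-joined names, lowest bit first."""
    names = [name for name, bit in sorted(FLAVOR_BITS.items(), key=lambda kv: kv[1])
             if value & bit]
    leftover = value & ~sum(FLAVOR_BITS.values())
    if leftover:
        names.append(str(leftover))   # keep bits this table doesn't know about
    return "|".join(names)


print(parse_flavor("MASTER|MODEL|TER"))   # 49
print(format_flavor(49))                  # MODEL|MASTER|TER
```

On a Unix shell the `MASTER|MODEL|TER` argument would need quoting, since `|` is the pipe operator.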


Bitflags also don't mix well with non-binary states.
Consider an SD file writer which supports a three-state option:
 - only V2000 output (ignore or generate corrupt records otherwise?)
 - V3000 output if required, otherwise V2000
 - always V3000

It's of course possible to encode this using 2 bits, but it
loses some of its elegance.
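For instance, the three-state V2000/V3000 choice could be packed into a two-bit field with a mask, as in this sketch. The bit positions and names are made up for illustration and are not part of any RDKit or OEChem API; note the fourth, unused combination that every reader of the field now has to reject.

```python
# Hypothetical layout: bits with values 4 and 8 hold the CTAB version policy.
CTAB_SHIFT = 2
CTAB_MASK = 0b11 << CTAB_SHIFT

V2000_ONLY = 0 << CTAB_SHIFT       # the default when the field is zero
V3000_IF_NEEDED = 1 << CTAB_SHIFT
V3000_ALWAYS = 2 << CTAB_SHIFT
# 3 << CTAB_SHIFT is meaningless and must be rejected.

NAMES = {V2000_ONLY: "V2000 only", V3000_IF_NEEDED: "V3000 if needed",
         V3000_ALWAYS: "always V3000"}


def ctab_policy(flavor):
    """Extract the two-bit field; other bits in `flavor` are ignored."""
    field = flavor & CTAB_MASK
    if field not in NAMES:
        raise ValueError("invalid CTAB policy bits in flavor %d" % flavor)
    return NAMES[field]


print(ctab_policy(0))                 # V2000 only
print(ctab_policy(V3000_ALWAYS | 1))  # always V3000
```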

Think, though, of RDKit's SMILES file reader. It supports a
'delimiter' option, in order to support space, tab, comma,
and I presume other delimiters as well. It also supports
the ability to say that the SMILES come from something other
than the first column, and the names from other than the
second.

These are even harder to encode in a single flavor.

BTW, OEChem doesn't support a delimiter option. Their 'SMILES
file' comes from the Daylight practice of

  SMILES + whitespace + rest_of_line_as_title

vs. the RDKit practice of assuming the file is a set of
delimited columns, with a possible header.
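The difference between the two conventions shows up on a line whose title contains a space. A minimal sketch of each; the function and parameter names are hypothetical, though the delimited version mirrors the kind of delimiter and column options described for RDKit's SMILES reader.

```python
def parse_daylight(line):
    """Daylight style: SMILES, whitespace, then the rest of the line as title."""
    parts = line.strip().split(None, 1)
    return parts[0], (parts[1] if len(parts) > 1 else "")


def parse_delimited(line, delimiter=" ", smiles_column=0, name_column=1):
    """Column style: split on a delimiter and pick out two columns."""
    fields = line.rstrip("\r\n").split(delimiter)
    name = fields[name_column] if name_column < len(fields) else ""
    return fields[smiles_column], name


line = "c1ccccc1 benzene ring"
print(parse_daylight(line))     # ('c1ccccc1', 'benzene ring')
print(parse_delimited(line))    # ('c1ccccc1', 'benzene')
print(parse_delimited("ethanol,CCO", ",", smiles_column=1, name_column=0))
# ('CCO', 'ethanol')
```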


Above, Roger said that a reasonable default is zero (for all
bits), but that new features may be added without changing
the API/ABI.

Most file formats work nicely with binary flags, as OEChem's
practice well shows. Some do not, as RDKit's SMILES file
format suggests.

There are other possible APIs which can handle the requirement of
supporting new features without changing the API/ABI.

RDKit's current method, that of passing additional arguments
to the function or constructor, is not scalable. I may have
multiple layers before I get to the actual reader or writer,
and I don't want to update the intermediate APIs every time
something changes.

I think it's very interesting that OEChem's new InChI
support (added only recently, so Roger might not know about
it), takes an InChIOptions object.

http://www.eyesopen.com/docs/toolkits/current/html/OEChem_TK-python/OEChemClasses/OEInChIOptions.html

OEInChIOptions(unsigned int flavor = OEOFlavor::INCHI::Default)

with methods like:
  .GetChiral()
  .GetFixedHLayer()
 ...
  .SetChiral()
  .SetFixedHLayer()
 ...

I don't know why they switched to this style for this case.
I wonder if part of it was to insulate themselves from any
odd specifications InChI might add in the future.

I prefer this style - an instance which contains the different
parameters - though I haven't used it in earnest.

This style too has difficulties, especially in C++. Ideally
you want to support programs which support, say, version 2013
(without a given feature and associated method) and version
2014 (with it). You can't do that in a language like C++ which
requires all methods to be resolved in order for the program
to run.

The XMLReader API supports a 'getFeature(name)' and associated
'getProperty()'/'setProperty()', which might provide the right
generic API.
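An options object in the spirit of OEInChIOptions or the XMLReader feature API might look like this Python sketch. The class, feature names, and functions are all hypothetical; the point is that intermediate layers pass one object through unchanged, and that a caller can probe for a feature by name, which is one way a program written against an older version could cope with a newer one.

```python
class WriterOptions:
    """Hypothetical options object with XMLReader-style named features."""

    # Feature name -> default value; a new version can add entries here
    # without changing any function signature that passes the object along.
    _DEFAULTS = {
        "model_records": False,
        "conect_both_ways": False,
        "ter_records": False,
    }

    def __init__(self, **overrides):
        self._values = dict(self._DEFAULTS)
        for name, value in overrides.items():
            self.set_feature(name, value)

    def has_feature(self, name):
        return name in self._values

    def get_feature(self, name):
        return self._values[name]          # KeyError for unknown features

    def set_feature(self, name, value):
        if not self.has_feature(name):
            raise KeyError("unsupported feature: %s" % name)
        self._values[name] = value

    def enabled(self):
        return sorted(name for name, value in self._values.items() if value)


def write_pdb(mol, options):
    # Stand-in for a real writer: just report which features are switched on.
    return options.enabled()


def export_structure(mol, options=None):
    # An intermediate layer: it never needs to know which features exist.
    return write_pdb(mol, options or WriterOptions())


opts = WriterOptions(ter_records=True)
if opts.has_feature("master_record"):    # probe before using a newer feature
    opts.set_feature("master_record", True)
print(export_structure(None, opts))      # ['ter_records']
```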

That said, you should read my email as commentary, and not
as a statement for or against the current code. While I don't
like it that much, without doubt bit flags do work for this
task. And because of C++ overloading, there's also a migration
path to support an options class API like I promoted just now.


Andrew

Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-25 Thread ro...@nextmovesoftware.com

Hi James,

There's something very strange going on here with PyMol.

On Oct 25, 2013, at 1:09 PM, James Davidson wrote:
 I can't see any double-bonds rendered in PyMOL:
 CONECT    3    4    4   10

Here atom 3 has two bonds to atom 4.  Why isn't it displayed double?

 This PDB *will* render double bonds in PyMOL.
 CONECT    3    4    4

As expected.

 (and, again, I also see double bonds in PyMOL).
 CONECT    3    2    4   10

No explicit double bond.  Where is the double bond coming from?


I'd expect two of the above cases to show double bonds, and one to only
have single bonds.  What is confusing is that which is which doesn't make
any sense.
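Under the convention discussed here, a double bond is encoded by repeating the partner atom in a CONECT record. A small stdlib sketch (a hypothetical helper, not part of RDKit or PyMOL) that reads CONECT lines at their fixed 5-column field widths and turns repeats into bond orders can help check what a written file actually says before blaming the viewer.

```python
from collections import Counter


def conect_bond_orders(pdb_text):
    """Map frozenset({a, b}) -> order implied by repeated CONECT partners."""
    orders = {}
    for record in pdb_text.splitlines():
        if not record.startswith("CONECT"):
            continue
        # Fields are 5 columns wide: atom serial at 7-11, partners after it.
        fields = [record[k:k + 5] for k in range(6, len(record), 5)]
        serials = [int(f) for f in fields if f.strip()]
        if not serials:
            continue
        atom, partners = serials[0], Counter(serials[1:])
        for partner, count in partners.items():
            key = frozenset((atom, partner))
            # The same bond may be listed from both ends; keep the larger count.
            orders[key] = max(orders.get(key, 0), count)
    return orders


print(conect_bond_orders("CONECT    3    4    4   10")[frozenset((3, 4))])  # 2
print(conect_bond_orders("CONECT    3    2    4   10")[frozenset((3, 4))])  # 1
```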


 Can you (or Greg) post a list of what the current input flavors do?

Currently the reader only has a single flavor...
flavor  1 : Read alternate locations, XPLOR/NMR pseudo atoms, and PDB
dummy residues.

By default the PDB file reader only returns atoms with alternate location
fields of space, 'A' or '1'.  It also ignores atoms with co-ordinates
0.000, 0.000, 0.000 that appear in XPLOR output for leaving group atoms in
covalently bonded ligands.  Likewise, it ignores atoms with atomic symbol
Q, which are typically dummy atoms used as refinement constraints in NMR
refinement.

If the flavor parameter has the value 1, all these pseudo-atoms are read
into the RDKit::ROMol, but clearly their semantics aren't understood by
the rest of the toolkit.  Valences will be incorrect, and a protein with
alternate conformations for some side chains will likely fail
sanitization.
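The default filtering described above can be paraphrased as a per-record predicate. This is a stdlib sketch of the rules as stated in this message, not RDKit's code: the function names are hypothetical, and reading the element from columns 77-78 (falling back to the first letter of the atom name) is an assumption about how "atomic symbol Q" is detected.

```python
def keep_atom(record, flavor=0):
    """Return True if an ATOM/HETATM record passes the default filters."""
    if flavor & 1:                       # flavor 1: read everything
        return True
    alt_loc = record[16]                 # column 17
    if alt_loc not in (" ", "A", "1"):
        return False
    x, y, z = (float(record[k:k + 8]) for k in (30, 38, 46))
    if x == y == z == 0.0:               # XPLOR leaving-group placeholder
        return False
    element = record[76:78].strip() or record[12:16].strip()[:1]
    return element.upper() != "Q"        # NMR dummy atoms


# Fixed-column ATOM layout, used only to build example input.
ATOM_FMT = ("ATOM  %5d %-4s%1s%3s %1s%4d" + " " * 4
            + "%8.3f%8.3f%8.3f%6.2f%6.2f" + " " * 10 + "%2s")


def atom(name, alt_loc, xyz, element):
    return ATOM_FMT % (1, name, alt_loc, "ALA", "A", 1, *xyz, 1.0, 0.0, element)


print(keep_atom(atom("CB", "A", (1.0, 2.0, 3.0), "C")))            # True
print(keep_atom(atom("CB", "B", (1.0, 2.0, 3.0), "C")))            # False
print(keep_atom(atom("C2", " ", (0.0, 0.0, 0.0), "C")))            # False
print(keep_atom(atom("QB", " ", (1.0, 2.0, 3.0), "")))             # False
print(keep_atom(atom("CB", "B", (1.0, 2.0, 3.0), "C"), flavor=1))  # True
```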



I hope this helps.

Roger




Re: [Rdkit-discuss] Beta of Q3 2013 release available

2013-10-24 Thread James Davidson
Hi Greg (et al.),

Thanks for the beta!  I have been going through some of the recently-added 
functionality, and had a couple of questions regarding the PDB reading / 
writing.


1.   Do I remember correctly that there was a proposal (from Roger) to add 
some auto bond-type perception to the PDB parser for ligands (or is that just 
wishful thinking!)?

2.   If not, I notice that there is an AssignBondOrdersFromTemplate()
method - but the example in the doc-string only shows (I think) the case where
the input PDB is just a single small molecule - so the matching is pretty easy!
I think a more real-world case is when one wants to set the bond orders for
multiple ligands (HETATM residues) based on substructure matches - which will
then return an atom index selection that can be used as a starting point.  Is
there any way to have the AssignBondOrdersFromTemplate() convenience function
optionally accept a list of atom indexes to specify a substructure?

3.   Is there some explanation for what the 'flavor' option does for 
reading/writing PDB?

4.   Having read in a PDB file I see the correct atoms flagged as HETATM
(from GetIsHeteroAtom()).  But when I call Chem.MolToPDBBlock() these atoms get
written as ATOM records...  Also, a Chem.MolToPDBFile() method would be nice
for completeness / symmetry : )

5.   It seems to me that GetResidueNumber() and GetSerialNumber() may have 
got mixed-up at some point(?).  At least, when I call GetSerialNumber() I see 
what appears to be the residue number; and when I call GetResidueNumber() I get 
0!

6.   I also seem to be seeing all of the bonds (for all residues) being 
written out in CONECT records - such that they all appear as single bonds in eg 
PyMOL - is this expected behaviour at the moment?

Cheers

James


