FYI: if anybody needs code that returns every residue as a separate
Mol, here it is.

from rdkit import Chem

prot = Chem.MolFromPDBFile('10gs/10gs_protein_rdkit.pdb', flavor=1)
residues = []
# N-C(alpha)-C(=O)-N backbone pattern; atoms 2 and 4 span the peptide bond
aa = Chem.MolFromSmarts('NCC(=O)N')
for res in Chem.SplitMolByPDBResidues(prot).values():
    for frag in Chem.GetMolFrags(res, asMols=True, sanitizeFrags=False):
        # collect the indices of all peptide bonds in this fragment
        peptide_bonds = [
            frag.GetBondBetweenAtoms(match[2], match[4]).GetIdx()
            for match in frag.GetSubstructMatches(aa)
        ]
        if peptide_bonds:
            # cut the peptide bonds, then split into single residues
            disconnected_aa = Chem.FragmentOnBonds(
                frag, peptide_bonds, addDummies=False)
            residues.extend(Chem.GetMolFrags(
                disconnected_aa, asMols=True, sanitizeFrags=False))
        else:
            residues.append(frag)

The downside is that no atom map is returned, so the relation between the
atom indices of each fragment and those of the original protein is lost.
This is why I am sticking to my original solution of grouping the atoms
into residues by "residue number + residue chain".
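
A minimal sketch of that grouping, assuming every atom of interest carries
PDB residue info (as it does when the molecule comes from a PDB file). The
helper name group_atoms_by_residue is made up for illustration and is not
part of RDKit; it only uses the standard atom accessors GetPDBResidueInfo(),
GetChainId() and GetResidueNumber().

```python
from collections import defaultdict


def group_atoms_by_residue(mol):
    """Map (chain id, residue number) -> list of atom indices in `mol`.

    Hypothetical helper, not an RDKit function. The indices are those of
    the original molecule, so the index relation is kept.
    """
    groups = defaultdict(list)
    for atom in mol.GetAtoms():
        info = atom.GetPDBResidueInfo()
        if info is None:
            continue  # atom carries no PDB residue record, skip it
        key = (info.GetChainId(), info.GetResidueNumber())
        groups[key].append(atom.GetIdx())
    return dict(groups)
```

With prot loaded as in the snippet above, group_atoms_by_residue(prot)
should give one entry per residue, including waters and ligands. Insertion
codes are ignored here; if the file uses them, they would have to become
part of the key.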

Implementing such grouping in a similar way to SplitMolByPDBResidues/Chains
would also lose the atom mapping, if I understand the RDKit code correctly.
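
For reference, the atom-based route described further down the thread (atom
indices -> bond indices -> Chem.PathToSubmol()) might look roughly like the
sketch below. Both function names are invented for illustration. It assumes
that the installed RDKit version accepts the optional atomMap argument of
Chem.PathToSubmol(), which is filled with the mapping from original atom
indices to submol atom indices.

```python
from itertools import combinations


def bond_indices_between(mol, atom_indices):
    """Indices of all bonds whose two end atoms are both in `atom_indices`.

    Hypothetical helper, not an RDKit function.
    """
    bond_ids = []
    for a, b in combinations(sorted(set(atom_indices)), 2):
        bond = mol.GetBondBetweenAtoms(a, b)  # None when a and b are not bonded
        if bond is not None:
            bond_ids.append(bond.GetIdx())
    return bond_ids


def submol_from_atoms(mol, atom_indices):
    """Return (submol, atom_map) for the given atoms (hypothetical helper)."""
    # Imported here so that the helper above has no hard RDKit dependency.
    from rdkit import Chem

    atom_map = {}  # filled by RDKit: original atom index -> submol atom index
    path = bond_indices_between(mol, atom_indices)
    submol = Chem.PathToSubmol(mol, path, atomMap=atom_map)
    return submol, atom_map
```

Note that a residue without any bonds (e.g. a lone water oxygen) produces
an empty bond list, so such residues need separate handling.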

----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2016-03-22 11:05 GMT+01:00 Maciek Wójcikowski <mac...@wojcikowski.pl>:

> Correction: all residue types are available
> from Chem.SplitMolByPDBResidues().
>
> ----
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
> 2016-03-22 9:50 GMT+01:00 Maciek Wójcikowski <mac...@wojcikowski.pl>:
>
>> Hi Greg,
>> 2016-03-22 6:28 GMT+01:00 Greg Landrum <greg.land...@gmail.com>:
>> >
>> > Hi Maciek,
>> >
>> >
>> > On Mon, Mar 21, 2016 at 8:33 PM, Maciek Wójcikowski <
>> mac...@wojcikowski.pl> wrote:
>> >>
>> >>
>> >> I came across one problem with RDKit today, namely the Chem.PathToSubmol()
>> function. Does the "path" mean atom or bond indices? On this very list I
>> found examples showing usage with atom indices [
>> https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03966.html],
>> while the example in "Getting Started in Python" feeds it the output of
>> Chem.FindAtomEnvironmentOfRadiusN(), which gives a list of bond indices. The
>> documentation could be more explicit here... After a brief analysis of the
>> code I found out that bond indices should be used (correct me if I'm wrong).
>> >
>> >
>> > The function is still not documented, but it's definitely bonds. I
>> think the thread you reference from the mailing list says the same thing.
>>
>> Ok, you're right. I've just noticed your comment, although the example was
>> still using atom indices (they worked for the sample mol because its bond
>> indices fortunately lined up with the atom indices).
>>
>> >
>> >
>> >>
>> >> So here comes the question: is there an equivalent function or a
>> clever way to do Chem.PathToSubmol() on atom indices? Currently I do: 1)
>> get the atom path; 2) get bonds between every atom in path (their indices);
>> 3) get submol with Chem.PathToSubmol()
>> >
>> >
>> > I don't think so.
>> >
>> >>
>> >> PS.
>> >> I use it to get each protein residue (amino acid) as a separate mol. It
>> would be much easier if we could use "Molecule -> Residues -> Atoms"
>> instead of "Molecule -> Atoms -> (grouping of monomers) -> Residues".
>> >>
>> >
>> > SplitMolByPDBResidues() doesn't do what you want?
>> >
>> >
>>
>> Not really. I want to get each amino acid separately, so I'd have to do
>> SplitMolByPDBChainId() -> SplitMolByPDBResidues() -> break the peptide
>> bonds (to split up runs of consecutive aa) -> split disconnected molecules.
>> And that only outputs valid PDB amino acids. Access to non-standard
>> residues that are present in the PDB file, like HOH, LIG, UNL, would also
>> be desirable. In other words, the unique key should be "monomer index +
>> chain id" instead of only the three-letter name as in SplitMolByPDBResidues().
>>
>> Maciek
>>
>> >
>> > -greg
>> >
>>
>
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
