[Rdkit-discuss] Keeping only parts of a molecule (a set of atom ids)
Hi there at RDKit,

I have a set of atom indices from a molecule that I want to keep, and I want to discard any atom that is not in this list. I thought of implementing this as follows:

    #!/usr/bin/env python
    from rdkit import Chem

    mol = Chem.MolFromSmiles('CCC1CNCC1CC')
    keep_atoms = [2, 3, 4]  # assume these exist in the above as an example; you can print the atom ids to check
    edit_mol = Chem.EditableMol(mol)
    for atom in mol.GetAtoms():
        if atom.GetIdx() not in keep_atoms:
            edit_mol.RemoveAtom(atom.GetIdx())

I am not sure this is the best implementation (not least because it does not work), but it's a try. The end result should be an SDF file with only atoms 2, 3 and 4 from the original molecule. When I run the above I get:

    [15:45:50] Range Error
    idx
    Violation occurred on line 143 in file /opt/RDKit_2011_12_1/Code/GraphMol/ROMol.cpp
    Failed Expression: 0 <= 6 <= 5
    Traceback (most recent call last):
      File "./test.py", line 12, in <module>
        edit_mol.RemoveAtom(atom.GetIdx())
    RuntimeError: Range Error

I cannot quite understand this error. Can anyone shed some light? I think it is related to my deleting 3 atoms from the molecule, so that it somehow expects the range to be 0 <= x <= 5 instead of 0 <= x <= 8... but why is this check in place?

Many Thanks
- Jean-Paul Ebejer
Early Stage Researcher

-- 
This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
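The numbers in the error message can be reproduced without RDKit. The sketch below is a plain-Python model, assuming (as the error suggests) that removing an atom shifts every higher atom index down by one; the function name `first_range_error` is hypothetical. It replays the loop above, which removes atoms by their ORIGINAL indices, lowest first, on a 9-atom molecule (CCC1CNCC1CC has 9 heavy atoms).

```python
def first_range_error(n_atoms, keep_atoms):
    """Hypothetical model of the loop in the question, using a plain list.

    current[i] holds the ORIGINAL index of the atom now sitting at position i.
    Deleting position i shifts every later atom down by one, mimicking the
    renumbering of the editable molecule.
    Returns (requested_index, max_valid_index) for the first out-of-range
    removal, or None if every removal stays in range.
    """
    current = list(range(n_atoms))
    for idx in range(n_atoms):          # loop over the ORIGINAL indices
        if idx in keep_atoms:
            continue
        if idx >= len(current):         # stale index: the molecule has shrunk
            return idx, len(current) - 1
        del current[idx]                # removes whichever atom is NOW at idx
    return None


# Keep atoms 2, 3 and 4 of a 9-atom molecule, as in the question.
print(first_range_error(9, {2, 3, 4}))  # (6, 5), matching "0 <= 6 <= 5"
```

The model also shows a second problem that the error hides: after atom 0 is removed, the call for index 1 deletes the atom that was originally number 2, which was meant to be kept.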
Re: [Rdkit-discuss] Keeping only parts of a molecule (a set of atom ids)
And as a follow-up, running this:

    #!/usr/bin/env python
    from rdkit import Chem

    mol = Chem.MolFromSmiles('CCC1CNCC1CC')
    edit_mol = Chem.EditableMol(mol)
    edit_mol.RemoveAtom(0)
    for atom in edit_mol.GetMol().GetAtoms():
        print(atom.GetIdx())

gives a seg fault...

    jp@xxx:~/tmp$ test.py
    Segmentation fault

- Jean-Paul Ebejer
Early Stage Researcher
Re: [Rdkit-discuss] Keeping only parts of a molecule (a set of atom ids)
I had a similar problem when removing atoms from a molecule. When you remove an atom, the IDs of the remaining atoms change, which results either in a seg fault or in your atom IDs not being in the correct range. The way I got around this was to sort the IDs of the atoms I want to remove from highest to lowest, so that the atoms with higher IDs are removed first; this does not affect the IDs of the atoms with lower IDs. Here is the relevant discussion:
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01937.html

Thanks,
Sarah

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network.
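The descending-order trick can be wrapped in a small helper. This is a minimal sketch, assuming a reasonably recent RDKit with Python 3 bindings; the name `keep_only_atoms` is hypothetical, and the `Chem.SanitizeMol` call is an addition not discussed in the thread, used here to recompute valences, implicit hydrogens and ring information after the edit.

```python
from rdkit import Chem


def keep_only_atoms(mol, keep_atoms):
    """Hypothetical helper: return a new molecule holding only the atoms in keep_atoms.

    Atoms are removed from the highest index to the lowest. Removing an atom
    only renumbers atoms with higher indices, and none of those are still
    waiting to be removed, so every remaining index in the list stays valid.
    """
    keep = set(keep_atoms)
    to_remove = [a.GetIdx() for a in mol.GetAtoms() if a.GetIdx() not in keep]
    edit_mol = Chem.EditableMol(mol)  # works on a copy; mol itself is not modified
    for idx in sorted(to_remove, reverse=True):
        edit_mol.RemoveAtom(idx)
    fragment = edit_mol.GetMol()      # bind the result to a name before using it
    Chem.SanitizeMol(fragment)        # recompute valences, implicit Hs and rings
    return fragment
```

For the molecule in the question, `keep_only_atoms(Chem.MolFromSmiles('CCC1CNCC1CC'), [2, 3, 4])` should leave a C-C-N fragment (SMILES `CCN`) whose atoms are renumbered 0, 1 and 2; it can then be written out with `Chem.SDWriter`.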
Re: [Rdkit-discuss] Keeping only parts of a molecule (a set of atom ids)
Hi JP,

Sarah was right about the trick of deleting atoms in descending order of atom index. Regarding the segmentation fault, this is very likely to be a memory-management issue. Make sure you assign the result of the GetMol() call to a variable, to avoid the underlying object being released prematurely:

    >>> from rdkit import Chem
    >>> mol = Chem.MolFromSmiles('CCC1CNCC1CC')
    >>> edit_mol = Chem.EditableMol(mol)
    >>> edit_mol.RemoveAtom(0)
    >>> m = edit_mol.GetMol()
    >>> for atom in m.GetAtoms():
    ...     print(atom.GetIdx())
    ...
    0
    1
    2
    3
    4
    5
    6
    7

Regards,
Eddie
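A related way to sidestep the temporary returned by GetMol() is to edit a `Chem.RWMol`, which is itself a molecule, so there is no separate intermediate object whose lifetime has to be managed. This is a sketch assuming a recent RDKit release that exposes `Chem.RWMol` to Python; the helper name `remove_atom_in_place` is hypothetical.

```python
from rdkit import Chem


def remove_atom_in_place(mol, idx):
    """Hypothetical helper: return an editable copy of mol with atom idx removed."""
    rw_mol = Chem.RWMol(mol)  # independent, editable copy; mol is left untouched
    rw_mol.RemoveAtom(idx)    # atoms above idx are renumbered down by one
    return rw_mol


mol = Chem.MolFromSmiles('CCC1CNCC1CC')
edited = remove_atom_in_place(mol, 0)
# edited is bound to a name and is a molecule in its own right,
# so its atoms can be iterated over directly.
print([atom.GetIdx() for atom in edited.GetAtoms()])
```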
Re: [Rdkit-discuss] Keeping only parts of a molecule (a set of atom ids)
Thanks to both of you, nice trick...

- Jean-Paul Ebejer
Early Stage Researcher