Re: [Rdkit-discuss] reading multiple conformers from file
On 15 August 2017 at 12:05, Greg Landrum wrote:
> Nope, just the discussion that you see there.
> Doing the work stalled on people (well, at least me) not having time to
> actually do it. :-(
>
> I still think it would be interesting, but it seems unlikely that it will
> get done unless someone else can make the time.
>
> -greg

Alright, then I'll post my (perhaps imperfect) function to load a multi-conformer sdf file.

import collections
import re

from rdkit import Chem


def tree():
    # function to create multidimensional (auto-vivifying) dictionaries
    return collections.defaultdict(tree)


def load_multiconf_sdf(sdf_file, get_molnames=False, keep_iso=True):
    """
    FUNCTION that distinguishes the different isomers of each compound in an
    .sdf file and loads them, along with their conformers, into separate
    Chem.rdchem.Mol() objects (all the conformers of an isomer are in the
    same mol). As of October 2016, RDKit did not support a multi-conformer
    file reader, therefore I load each conformer from the sdf file as a
    separate mol. Then I check whether the molname and SMILES string already
    exist in the multidict and, if so, I add a new conformer to the existing
    molecule. SMILES strings are used to distinguish the various
    tautomerization/ionization states.

    ARGS:
    sdf_file:     multi-conformer sdf file. The different tautomers/ionization
                  states of each molecule must have the suffix "_iso[0-9]+"
                  in their property "_Name".
    get_molnames: also return a list of the molecule names in the sdf file.
    keep_iso:     keep the "_iso[0-9]+" suffix in the molname. Use with
                  caution: if False, all tautomers/ionization states of a
                  molecule share one molname and are told apart only by
                  their SMILES strings.

    RETURNS:
    molname_SMILES_conformersMol_multidict: multi-dimensional dictionary
                  storing molname -> SMILES string -> mol object with
                  multiple conformers.
    molnames_list: if get_molnames=True, also a list of the molnames of the
                  molecules in the sdf file (each name once, in file order).
    """
    print("Loading multi-conformer .sdf file " + sdf_file)
    # molname -> SMILES -> Chem.rdchem.Mol() object holding all the conformers of this compound
    molname_SMILES_conformersMol_multidict = tree()
    suppl = Chem.SDMolSupplier(sdf_file, removeHs=False)
    molnames_list = []
    for mol in suppl:
        if mol is None or mol.GetNumAtoms() == 0:
            continue  # skip unparsable or empty molecules
        molname = mol.GetProp('_Name').lower()
        print("reading", molname, "from file", sdf_file)
        if not keep_iso:
            molname = re.sub(r"_iso[0-9]+", "", molname)
        if mol.HasProp('SMILES'):
            # distinguish the isomers and protonation states by the SMILES string
            SMILES = mol.GetProp('SMILES')
        else:
            # if not present in the input structure file, compute it
            SMILES = Chem.MolToSmiles(mol, isomericSmiles=True, canonical=True,
                                      allBondsExplicit=True)
            mol.SetProp('SMILES', SMILES)
        conformers_by_smiles = molname_SMILES_conformersMol_multidict[molname]
        if SMILES in conformers_by_smiles:
            # assignId=True gives the added conformer a new, unique ID; without it
            # every added conformer keeps the ID 0 it had in its source mol
            conformers_by_smiles[SMILES].AddConformer(mol.GetConformer(), assignId=True)
        else:
            conformers_by_smiles[SMILES] = mol  # first conformer of this state
        if molname not in molnames_list:
            molnames_list.append(molname)
    if get_molnames:
        return molname_SMILES_conformersMol_multidict, molnames_list
    return molname_SMILES_conformersMol_multidict

--
==
Dr Thomas Evangelidis
Post-doctoral Researcher
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/2S049,
62500 Brno, Czech Republic
email: tev...@pharm.uoa.gr
       teva...@gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
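Without RDKit at hand, the grouping logic of the function above can be tried on a toy file using only the standard library. This is a minimal sketch, not Thomas's code: the helper names (split_sd_records, parse_sd_record, group_conformers) are made up for illustration. It assumes each record carries its SMILES as an SD data field named SMILES (the RDKit version computes it when missing), and it keeps the raw molblocks instead of building Mol objects.

```python
import re
from collections import OrderedDict


def split_sd_records(text):
    """Yield the lines of each non-empty record of an SD file ('$$$$'-terminated)."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if any(l.strip() for l in record):
                yield record
            record = []
        else:
            record.append(line)
    if any(l.strip() for l in record):
        yield record  # last record without a trailing '$$$$'


def parse_sd_record(lines):
    """Return (title, data_fields, molblock) for one record."""
    title = lines[0].strip()
    # The molblock ends at 'M  END'; SD data fields ('> <NAME>') follow it.
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")), len(lines) - 1)
    fields, i = {}, end + 1
    while i < len(lines):
        m = re.match(r">.*?<([^>]+)>", lines[i])
        if m:
            value, i = [], i + 1
            while i < len(lines) and lines[i].strip():  # a value ends at a blank line
                value.append(lines[i])
                i += 1
            fields[m.group(1)] = "\n".join(value)
        else:
            i += 1
    return title, fields, "\n".join(lines[: end + 1])


def group_conformers(text, keep_iso=True):
    """molname -> SMILES -> list of molblocks, one molblock per conformer."""
    groups = OrderedDict()
    for lines in split_sd_records(text):
        title, fields, molblock = parse_sd_record(lines)
        molname = title.lower()
        if not keep_iso:
            molname = re.sub(r"_iso[0-9]+", "", molname)
        smiles = fields.get("SMILES", "")  # assumption: SMILES stored as a data field
        groups.setdefault(molname, OrderedDict()).setdefault(smiles, []).append(molblock)
    return groups
```

With keep_iso=False the two states of a molecule end up under one name but remain separate entries keyed by their SMILES strings.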
Re: [Rdkit-discuss] reading multiple conformers from file
Nope, just the discussion that you see there. Doing the work stalled on people (well, at least me) not having time to actually do it. :-(

I still think it would be interesting, but it seems unlikely that it will get done unless someone else can make the time.

-greg

On Mon, Aug 14, 2017 at 6:25 PM, Thomas Evangelidis wrote:
Re: [Rdkit-discuss] reading multiple conformers from file
Hello,

I was just wondering, has there been any progress on the multi-conformer sdf file reader since last year?

best
Thomas

On 27 October 2016 at 05:20, Greg Landrum wrote:
Re: [Rdkit-discuss] reading multiple conformers from file
Ok, let's start talking about this here: https://github.com/rdkit/rdkit/issues/1137

-greg

On Mon, Oct 31, 2016 at 1:11 PM, Markus Sitzmann wrote:
Re: [Rdkit-discuss] reading multiple conformers from file
+1 for a json format ... hmm, how about a general json-based molecular structure format ... let us call it "cson" (that is an homage to Google gson and Chemical Markup Language CML :-)

Markus

On Mon, Oct 31, 2016 at 11:18 AM, Brian Cole wrote:
> I would 2nd the suggestion of continuing to push a JSON format forward
> that natively supports multiple conformers.
>
> I've never seen automatic recombination of an SDF work 100% of the time;
> it's fraught with corner cases. It's also abysmally slow and takes a huge
> amount of disk space.
>
> -Bruce
>
> On Oct 30, 2016, at 5:21 PM, Brian Kelley wrote:
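To make the JSON idea concrete, here is one possible layout, sketched with the standard library: the topology (atoms and bonds) is written once, and each conformer adds only its coordinates and its own properties. The schema and the function names are hypothetical; this is neither "cson" nor any format RDKit defines. It assumes every conformer lists one (x, y, z) triple per atom, in atom order.

```python
import json


def to_multiconf_json(name, atoms, bonds, conformers, conf_props=None):
    """Serialize one molecule with several conformers as a JSON string.

    Hypothetical layout: topology once, then per-conformer coords + props.
    """
    if conf_props is None:
        conf_props = [{} for _ in conformers]
    if len(conf_props) != len(conformers):
        raise ValueError("need one property dict per conformer")
    for coords in conformers:
        if len(coords) != len(atoms):
            raise ValueError("each conformer needs one (x, y, z) per atom")
    doc = {
        "name": name,
        "atoms": list(atoms),
        "bonds": [list(b) for b in bonds],  # [begin_idx, end_idx, bond_order]
        "conformers": [
            {"coords": [[float(c) for c in xyz] for xyz in coords], "props": dict(props)}
            for coords, props in zip(conformers, conf_props)
        ],
    }
    return json.dumps(doc)


def from_multiconf_json(text):
    """Inverse of to_multiconf_json: (name, atoms, bonds, conformers, conf_props)."""
    doc = json.loads(text)
    conformers = [[tuple(xyz) for xyz in c["coords"]] for c in doc["conformers"]]
    conf_props = [c["props"] for c in doc["conformers"]]
    bonds = [tuple(b) for b in doc["bonds"]]
    return doc["name"], doc["atoms"], bonds, conformers, conf_props
```

Storing the topology once is what avoids the "recombination" step Brian describes: nothing has to be matched up again on reading, because the conformers are already attached to their molecule.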
Re: [Rdkit-discuss] reading multiple conformers from file
RDKit already has a way to serialize conformers: the binary pickle format!

Perhaps we should make a file extension for multiple molecules. Say ".rdk" and call it a day. Like InChI, the source code is the reference :)

Brian Kelley

On Oct 27, 2016, at 2:05 AM, Greg Landrum wrote:
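A multi-molecule pickle file along these lines could be as simple as a sequence of length-prefixed blobs. In RDKit the blob for a molecule would be the bytes returned by `mol.ToBinary()`, which include the conformers, and `Chem.Mol(blob)` rebuilds the molecule. The sketch below is an assumption-laden illustration: the file layout, the magic bytes, and the helper names are hypothetical, and it treats the blobs as opaque bytes using only the standard library. Whether molecule-level properties are stored inside the pickle depends on the RDKit version and its pickle-property settings, so that is worth checking before relying on it.

```python
import struct

MAGIC = b"RDKBLOB1"  # hypothetical 8-byte file signature


def _read_u32(fh):
    raw = fh.read(4)
    if len(raw) != 4:
        raise ValueError("truncated file")
    return struct.unpack("<I", raw)[0]  # little-endian unsigned 32-bit int


def write_blobs(fh, blobs):
    """Write byte strings as: magic, count, then (length, bytes) for each blob."""
    blobs = list(blobs)
    fh.write(MAGIC)
    fh.write(struct.pack("<I", len(blobs)))
    for blob in blobs:
        fh.write(struct.pack("<I", len(blob)))
        fh.write(blob)


def iter_blobs(fh):
    """Yield the byte strings back one at a time, without loading the whole file."""
    if fh.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a multi-molecule blob file")
    for _ in range(_read_u32(fh)):
        size = _read_u32(fh)
        blob = fh.read(size)
        if len(blob) != size:
            raise ValueError("truncated file")
        yield blob
```

With RDKit installed, writing would look like `write_blobs(fh, (m.ToBinary() for m in mols))` and reading like `[Chem.Mol(b) for b in iter_blobs(fh)]`, with the file opened in binary mode.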
Re: [Rdkit-discuss] reading multiple conformers from file
It would seem that a major issue with RDKit's multiconformer file is the inability to associate structure-level and atom-level properties with individual conformations. It's not quite orthogonal to the question of how to read, say, a multiconformer SD file into RDKit's multiconformer format, because the conformers in said SD file could contain such properties, and information would be lost.

-P.

On Thu, Oct 27, 2016 at 6:20 AM, Thomas Evangelidis wrote:
Re: [Rdkit-discuss] reading multiple conformers from file
Hello Greg,

Is the canonical SMILES string always unique for every isomer and tautomerization state of a molecule? If yes, then I have already written a function to load multiple molecules and their conformers, which I can share here.

best
Thomas

PS: thanks to David for pointing this out.

--
==
Thomas Evangelidis
Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic
email: tev...@pharm.uoa.gr
       teva...@gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/

On 27 October 2016 at 05:20, Greg Landrum wrote:
Re: [Rdkit-discuss] reading multiple conformers from file
The RDKit has support for the TPL format, an old BioCad/MSI/Accelrys format. It's easy to imagine something better, but this is at least already there, and there could be other software that speaks it:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/test_data/cmpd2.tpl

I'd still like to do a decent JSON format, and adding multi-confs to that would be logical.

On Thu, Oct 27, 2016 at 6:58 AM, David Cosgrove wrote:
> I've been wondering if, now that you can get decent conformations from
> RDKit, it would be worth devising a multi-conformation file format to make
> reading multi-conf molecules faster for vs purposes. In my experience,
> pulling all the conformers out of an ascii file such as an sdf can become
> the RDS for pharmacophore searching. Something to think about at the
> hackathon maybe and certainly something that deserves a new email thread.
>
> Dave
>
> On Thursday, 27 October 2016, Greg Landrum wrote:
>> [...]
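The multi-conformer JSON idea can be sketched with the standard library alone. The schema below is purely hypothetical (it is not an RDKit format, and the names `dump_multiconf`/`load_multiconf` are made up for illustration): the topology (element symbols and bonds) is written once, and each conformer is only a list of coordinates in the same atom order, so the connection table is not re-parsed for every conformer as it is when each conformer is a separate SDF record.

```python
import json
from typing import List, Tuple

Coord = Tuple[float, float, float]
Bond = Tuple[int, int, int]  # (begin atom index, end atom index, bond order)


def dump_multiconf(name: str, elements: List[str], bonds: List[Bond],
                   conformers: List[List[Coord]]) -> str:
    """Serialise one molecule and all its conformers (hypothetical schema)."""
    for i, conf in enumerate(conformers):
        # every conformer must give one (x, y, z) triple per atom, in atom order
        if len(conf) != len(elements):
            raise ValueError(f"conformer {i} has {len(conf)} coordinates, "
                             f"expected {len(elements)}")
    doc = {
        "name": name,
        "atoms": elements,                  # topology is written once...
        "bonds": [list(b) for b in bonds],
        # ...while coordinates are written once per conformer
        "conformers": [[list(xyz) for xyz in conf] for conf in conformers],
    }
    return json.dumps(doc)


def load_multiconf(text: str) -> Tuple[str, List[str], List[Bond], List[List[Coord]]]:
    """Inverse of dump_multiconf: returns (name, elements, bonds, conformers)."""
    doc = json.loads(text)
    bonds = [tuple(b) for b in doc["bonds"]]
    conformers = [[tuple(xyz) for xyz in conf] for conf in doc["conformers"]]
    return doc["name"], doc["atoms"], bonds, conformers


if __name__ == "__main__":
    # two made-up water geometries sharing one topology
    atoms = ["O", "H", "H"]
    bonds = [(0, 1, 1), (0, 2, 1)]
    confs = [
        [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
        [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, -0.93, 0.0)],
    ]
    text = dump_multiconf("water", atoms, bonds, confs)
    print(load_multiconf(text)[3] == confs)
```

A real format would also need charges, stereo and per-conformer properties (energies, for example); the point here is only the "topology once, coordinates many times" layout.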
Re: [Rdkit-discuss] reading multiple conformers from file
Hi Thomas,

You're right, reading multiple conformations out of an SDF does seem like one of those common operations. Unfortunately, the RDKit does not currently support it in an easy way.

A Python implementation of this would be a good topic for Friday's UGM hackathon; we can see if anyone finds it interesting enough to work on.

-greg

On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis wrote:
> Hello everyone,
>
> I am a new user of RDkit and I was looking in the documentation for an
> easy way to load multiple conformers from a structure file like .sdf.
> [...]
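Before any chemistry is involved, the grouping step can be sketched in plain Python. This assumes (it is not guaranteed by the SD format) that all conformers of one molecule are written as consecutive records sharing the same title line, which RDKit exposes as `_Name`. `itertools.groupby` only merges adjacent items, which is exactly why that assumption matters. The helper names are hypothetical, and the sketch does not separate protonation states unless their titles differ (for example via an `_iso` suffix).

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def iter_sdf_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one SD record (molfile block plus data items) at a time."""
    buf: List[str] = []
    for line in lines:
        if line.strip() == "$$$$":  # record terminator
            yield "".join(buf)
            buf = []
        else:
            buf.append(line)
    if any(ln.strip() for ln in buf):  # tolerate a missing final $$$$
        yield "".join(buf)


def group_consecutive_conformers(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Group adjacent records that share a title line (the molfile name)."""
    records = iter_sdf_records(lines)
    for title, group in groupby(records, key=lambda rec: rec.split("\n", 1)[0].strip()):
        yield title, list(group)
```

For example, `group_consecutive_conformers(open(path))` streams the file without loading it all at once; the molfile part of each grouped record could then be parsed with `Chem.MolFromMolBlock`.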
[Rdkit-discuss] reading multiple conformers from file
Hello everyone,

I am a new user of RDKit and I was looking in the documentation for an easy way to load multiple conformers from a structure file such as an .sdf. The code must 1) distinguish between different protonation states of the same molecule, and 2) create a new Mol() object for each protonation state and load the respective conformers into it.

I could work out a solution for 1) using mol.GetProp('_Name'), mol.GetNumAtoms(), mol.GetNumBonds() and other properties, but I was wondering if there is a more straightforward way to do it.

For 2), I guess I must iterate over all molecules in the input file, create new Mol() objects (one for each protonation state of each ligand) and add conformers to these new Mol() objects. Again, this sounds easy to program, but it also sounds like a very common operation, so I was wondering whether it has already been implemented as a function.

Thanks in advance,
Thomas

--
Thomas Evangelidis
Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr
       teva...@gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/
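Steps 1) and 2) can be sketched with RDKit as follows. This is one possible approach, not an RDKit built-in: records are keyed by `_Name` and by canonical isomeric SMILES (so tautomers and protonation states land in separate Mol objects), and every later record's conformer is appended to the first Mol seen for that key. Each SDF record arrives with conformer ID 0, so `assignId=True` is passed to give the merged conformers distinct IDs. The function name `group_conformers` and the file name `confs.sdf` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable

from rdkit import Chem


def group_conformers(mols: Iterable[Chem.Mol]) -> Dict[str, Dict[str, Chem.Mol]]:
    """Merge single-conformer records into one Mol per (name, state).

    name  -> the record's _Name property
    state -> canonical isomeric SMILES, so protonation states stay separate
    """
    grouped: Dict[str, Dict[str, Chem.Mol]] = defaultdict(dict)
    for mol in mols:
        if mol is None or mol.GetNumAtoms() == 0:
            continue  # skip records RDKit could not parse
        name = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
        smiles = Chem.MolToSmiles(mol, isomericSmiles=True)
        target = grouped[name].get(smiles)
        if target is None:
            # the first record for this (name, state) becomes the template
            grouped[name][smiles] = Chem.Mol(mol)
            continue
        # AddConformer assumes both records list atoms in the same order;
        # comparing element sequences is only a cheap sanity check of that.
        if ([a.GetAtomicNum() for a in target.GetAtoms()]
                != [a.GetAtomicNum() for a in mol.GetAtoms()]):
            raise ValueError(f"atom order differs for {name!r} ({smiles})")
        target.AddConformer(mol.GetConformer(), assignId=True)
    return dict(grouped)
```

Usage would look like `group_conformers(Chem.SDMolSupplier("confs.sdf", removeHs=False))`. Note that equal SMILES do not guarantee equal atom order; the check above only catches gross mismatches, so files from a single conformer generator (which keeps atom order) are the safe case.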