Re: [Rdkit-discuss] reading multiple conformers from file

2017-08-15 Thread Thomas Evangelidis
On 15 August 2017 at 12:05, Greg Landrum  wrote:

> Nope, just the discussion that you see there.
> Doing the work stalled on people (well, at least me) not having time to
> actually do it. :-(
>
> I still think it would be interesting, but it seems unlikely that it will
> get done unless someone else can make the time.
>
> -greg
>
>
>

Alright, then I'll post my (perhaps imperfect) function to load a
multi-conformer sdf file.



import collections
import re

from rdkit import Chem


def tree():
    """Return a defaultdict whose missing keys are trees too (a multidimensional dictionary)."""
    return collections.defaultdict(tree)


def load_multiconf_sdf(sdf_file, get_molnames=False, keep_iso=True):
    """
    FUNCTION that distinguishes the different isomers of each compound
    in an .sdf file, and loads them along with their conformers
    into separate Chem.rdchem.Mol() objects (all the conformers are in
    the same mol). As of October 2016, RDKit did not provide a
    multi-conformer file reader, therefore each record of the sdf file is
    read as a separate mol. If its molname and SMILES string already exist
    in the multidict, its conformer is added to the existing molecule.
    SMILES strings are used to distinguish the various
    tautomerization/ionization states.

    ARGS:
    sdf_file:       multi-conformer sdf file. The different
                    tautomers/ionization states of each molecule must have
                    the suffix "_iso[0-9]" in their property "_Name".
    get_molnames:   return a list of the molecule names in the sdf file
    keep_iso:       keep the "_iso[0-9]" suffix in the molname. Use with
                    caution because if False only one of the
                    tautomers/ionization states will be saved.
    RETURNS:
    molname_SMILES_conformersMol_multidict:
                    multi-dimensional dictionary storing
                    molname -> SMILES string -> mol object with multiple conformers
    molnames_list:  if get_molnames=True, also a list of the molnames of
                    the molecules in the sdf file (each name listed once).
    """
    print("Loading multi-conformer .sdf file " + sdf_file)
    # molname -> SMILES -> Chem.rdchem.Mol() object containing all the conformers of this compound
    molname_SMILES_conformersMol_multidict = tree()
    suppl = Chem.SDMolSupplier(sdf_file, removeHs=False)
    molnames_list = []
    for mol in suppl:
        if mol is None or mol.GetNumAtoms() == 0:
            continue  # skip unreadable or empty molecules
        molname = mol.GetProp('_Name').lower()
        print("reading", molname, "from file", sdf_file)
        if not keep_iso:
            molname = re.sub(r"_iso[0-9]+", "", molname)
        # distinguish the isomers and protonation states by the canonical SMILES string
        if mol.HasProp('SMILES'):
            SMILES = mol.GetProp('SMILES')
        else:  # if not present in the input structure file, compute it
            SMILES = Chem.MolToSmiles(mol, isomericSmiles=True,
                                      canonical=True, allBondsExplicit=True)
            mol.SetProp('SMILES', SMILES)
        if molname not in molname_SMILES_conformersMol_multidict:
            molnames_list.append(molname)  # record each name only once
        mols_by_smiles = molname_SMILES_conformersMol_multidict[molname]
        if SMILES in mols_by_smiles:
            # assignId=True gives the added conformer its own id; this assumes
            # the atom order is identical in every record of this compound
            mols_by_smiles[SMILES].AddConformer(mol.GetConformer(), assignId=True)
        else:
            # the first record already carries its own conformer, so adding
            # it again here would only store a replicate
            mols_by_smiles[SMILES] = mol

    if get_molnames:
        return molname_SMILES_conformersMol_multidict, molnames_list
    return molname_SMILES_conformersMol_multidict
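The bookkeeping behind the function above can be tried without RDKit. The following is only a sketch: the toy SDF_TEXT and the helper group_records_by_name are made up for illustration, and only the title line of each record is looked at. It shows why dropping the "_iso[0-9]" suffix merges all states of a ligand under one name unless something else (the SMILES string, in the function above) also takes part in the key.

```python
import re
from collections import defaultdict

# Toy stand-in for the contents of an SD file (hypothetical data);
# only the title line of each record matters in this sketch.
SDF_TEXT = "\n".join([
    "lig1_iso1", "  conformer A", "M  END", "$$$$",
    "lig1_iso1", "  conformer B", "M  END", "$$$$",
    "lig1_iso2", "  conformer A", "M  END", "$$$$",
]) + "\n"


def group_records_by_name(sdf_text, keep_iso=True):
    """Group SD records by their lower-cased title line."""
    groups = defaultdict(list)
    for record in sdf_text.split("$$$$\n"):
        if not record.strip():
            continue  # skip the empty chunk after the last delimiter
        name = record.splitlines()[0].strip().lower()
        if not keep_iso:
            name = re.sub(r"_iso[0-9]+", "", name)
        groups[name].append(record)
    return dict(groups)


with_iso = group_records_by_name(SDF_TEXT)
without_iso = group_records_by_name(SDF_TEXT, keep_iso=False)
print({name: len(recs) for name, recs in with_iso.items()})
print({name: len(recs) for name, recs in without_iso.items()})
```

With the suffix kept, the two states stay apart (two records and one record); with it stripped, all three records land under "lig1".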









-- 

==

Dr Thomas Evangelidis

Post-doctoral Researcher
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/2S049,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr

  teva...@gmail.com


website: https://sites.google.com/site/thomasevangelidishomepage/
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] reading multiple conformers from file

2017-08-15 Thread Greg Landrum
Nope, just the discussion that you see there.
Doing the work stalled on people (well, at least me) not having time to
actually do it. :-(

I still think it would be interesting, but it seems unlikely that it will
get done unless someone else can make the time.

-greg


On Mon, Aug 14, 2017 at 6:25 PM, Thomas Evangelidis 
wrote:

> Hello,
>
> I was just wondering, has there been any progress on the multi-conformer
> sdf file reader since last year?
>
> best
> Thomas
>
>
> On 27 October 2016 at 05:20, Greg Landrum  wrote:
>
>> Hi Thomas,
>>
>> You're right, reading multiple conformations out of an SDF does seem like
>> one of those common operations. Unfortunately the RDKit does not currently
>> support it in an easy way.
>>
>> A python implementation of this would be a good topic for Friday's UGM
>> hackathon, we can see if anyone finds it interesting enough to work on.
>>
>> -greg
>>
>>
>> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis 
>> wrote:
>>
>>> Hello everyone,
>>>
>>> I am a new user of RDkit and I was looking in the documentation for an
>>> easy way to load multiple conformers from a structure file like .sdf. The
>>> code must 1) distinguish between different protonation states of the same
>>> molecule,  2) create a new Mol() object for each protonation state and load
>>> into it the respective conformers.
>>>
>>> Apparently I can work out a solution for 1)
>>> using mol.GetProp('_Name'), mol.GetNumAtoms, mol.GetNumBonds and other
>>> properties, but I was wondering if there is any more straight forward way
>>> to do it.
>>> For 2) I guess I must iterate over all molecules in the input file,
>>> create new Mol() objects (one for each protonation state of each ligand)
>>> and add conformers to these new Mol() objects. Again this sounds easily
>>> programmable, but sounds like a very common operation, thus I was wondering
>>> if it has been implemented in a function.
>>>
>>> thanks in advance
>>> Thomas


Re: [Rdkit-discuss] reading multiple conformers from file

2017-08-14 Thread Thomas Evangelidis
Hello,

I was just wondering, has there been any progress on the multi-conformer
sdf file reader since last year?

best
Thomas


On 27 October 2016 at 05:20, Greg Landrum  wrote:

> Hi Thomas,
>
> You're right, reading multiple conformations out of an SDF does seem like
> one of those common operations. Unfortunately the RDKit does not currently
> support it in an easy way.
>
> A python implementation of this would be a good topic for Friday's UGM
> hackathon, we can see if anyone finds it interesting enough to work on.
>
> -greg
>
>
> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis 
> wrote:
>
>> Hello everyone,
>>
>> I am a new user of RDkit and I was looking in the documentation for an
>> easy way to load multiple conformers from a structure file like .sdf. The
>> code must 1) distinguish between different protonation states of the same
>> molecule,  2) create a new Mol() object for each protonation state and load
>> into it the respective conformers.
>>
>> Apparently I can work out a solution for 1) using mol.GetProp('_Name'), 
>> mol.GetNumAtoms, mol.GetNumBonds
>> and other properties, but I was wondering if there is any more straight
>> forward way to do it.
>> For 2) I guess I must iterate over all molecules in the input file,
>> create new Mol() objects (one for each protonation state of each ligand)
>> and add conformers to these new Mol() objects. Again this sounds easily
>> programmable, but sounds like a very common operation, thus I was wondering
>> if it has been implemented in a function.
>>
>> thanks in advance
>> Thomas


Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-31 Thread Greg Landrum
Ok, let's start talking about this here:
https://github.com/rdkit/rdkit/issues/1137

-greg


On Mon, Oct 31, 2016 at 1:11 PM, Markus Sitzmann 
wrote:

> +1 for a json format ... hmm, how about a general json-based molecular
> structure format ... let us call it "cson" (that is an homage to Google
> gson and Chemical Markup Language CML :-)
>
> Markus
>
> On Mon, Oct 31, 2016 at 11:18 AM, Brian Cole  wrote:
>
>> I would 2nd the suggestion of continuing to push a JSON format forward
>> that natively supports multiple conformers.
>>
>> I've never seen automatic recombination of an SDF work 100% of the time,
>> it's fraught with corner cases. It's also abysmally slow and takes a huge
>> amount of disk space.
>>
>> -Bruce
>>
>> On Oct 30, 2016, at 5:21 PM, Brian Kelley  wrote:
>>
>> Rdkit already has a way to serialize conformers, the binary pickle format!
>>
>> Perhaps we should make a file extension for multiple molecules.  Say
>> ".rdk" and call it a day.   Like inchi the source code is the reference  :)
>>
>> 
>> Brian Kelley
>>
>> On Oct 27, 2016, at 2:05 AM, Greg Landrum  wrote:
>>
>> The RDKit has support for the TPL format, an old BioCad/MSI/Accelrys
>> format.
>> It's easy to imagine something better, but this is at least already there
>> and there could be other software that speaks it:
>> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/test_data/cmpd2.tpl
>>
>> I'd still like to do a decent JSON format and adding multi-confs to that
>> would be logical
>>
>> On Thu, Oct 27, 2016 at 6:58 AM, David Cosgrove <
>> davidacosgrov...@gmail.com> wrote:
>>
>>> I've been wondering if, now that you can get decent conformations from
>>> RDKit, it would be worth devising a multi-conformation file format to make
>>> reading multi-conf molecules faster for vs purposes. In my experience,
>>> pulling all the conformers out of an ascii file such as an sdf can become
>>> the RDS for pharmacophore searching. Something to think about at the
>>> hackathon maybe and certainly something that deserves a new email
>>> thread.
>>>
>>> Dave
>>>
>>>
>>> On Thursday, 27 October 2016, Greg Landrum 
>>> wrote:
>>>
>>>> Hi Thomas,
>>>>
>>>> You're right, reading multiple conformations out of an SDF does seem
>>>> like one of those common operations. Unfortunately the RDKit does not
>>>> currently support it in an easy way.
>>>>
>>>> A python implementation of this would be a good topic for Friday's UGM
>>>> hackathon, we can see if anyone finds it interesting enough to work on.
>>>>
>>>> -greg
>>>>
>>>>
>>>> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis
>>>> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I am a new user of RDkit and I was looking in the documentation for an
>>>>> easy way to load multiple conformers from a structure file like .sdf. The
>>>>> code must 1) distinguish between different protonation states of the same
>>>>> molecule,  2) create a new Mol() object for each protonation state and
>>>>> load into it the respective conformers.
>>>>>
>>>>> Apparently I can work out a solution for 1)
>>>>> using mol.GetProp('_Name'), mol.GetNumAtoms, mol.GetNumBonds and
>>>>> other properties, but I was wondering if there is any more straight
>>>>> forward way to do it.
>>>>> For 2) I guess I must iterate over all molecules in the input file,
>>>>> create new Mol() objects (one for each protonation state of each ligand)
>>>>> and add conformers to these new Mol() objects. Again this sounds easily
>>>>> programmable, but sounds like a very common operation, thus I was
>>>>> wondering if it has been implemented in a function.
>>>>>
>>>>> thanks in advance
>>>>> Thomas

Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-31 Thread Markus Sitzmann
+1 for a json format ... hmm, how about a general json-based molecular
structure format ... let us call it "cson" (that is an homage to Google
gson and Chemical Markup Language CML :-)

Markus
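To make the JSON idea a little more concrete, here is a minimal sketch of what a multi-conformer record could look like. The layout and all field names ("atoms", "bonds", "conformers", "coords") are invented for illustration and are not an RDKit or community format; the coordinates are rough placeholder values. The point is only that the topology is stored once and each conformer adds nothing but coordinates.

```python
import json

# Hypothetical layout: topology once, then one coordinate set per conformer.
record = {
    "name": "water",
    "atoms": ["O", "H", "H"],
    "bonds": [[0, 1, 1], [0, 2, 1]],  # begin atom index, end atom index, bond order
    "conformers": [
        {"id": 0, "coords": [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]},
        {"id": 1, "coords": [[0.0, 0.0, 0.0], [0.0, 0.96, 0.0], [0.93, -0.24, 0.0]]},
    ],
}

text = json.dumps(record)     # serialize to a JSON string
restored = json.loads(text)   # and read it back

# Sanity check: every conformer supplies one coordinate triple per atom.
for conf in restored["conformers"]:
    assert len(conf["coords"]) == len(restored["atoms"])

print(len(restored["conformers"]), "conformers for", restored["name"])
```

A per-conformer dictionary like this would also leave room for conformer-level data such as an energy, which a plain list of coordinates would not.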

On Mon, Oct 31, 2016 at 11:18 AM, Brian Cole  wrote:

> I would 2nd the suggestion of continuing to push a JSON format forward
> that natively supports multiple conformers.
>
> I've never seen automatic recombination of an SDF work 100% of the time,
> it's fraught with corner cases. It's also abysmally slow and takes a huge
> amount of disk space.
>
> -Bruce
>
> On Oct 30, 2016, at 5:21 PM, Brian Kelley  wrote:
>
> Rdkit already has a way to serialize conformers, the binary pickle format!
>
> Perhaps we should make a file extension for multiple molecules.  Say
> ".rdk" and call it a day.   Like inchi the source code is the reference  :)
>
> 
> Brian Kelley
>
> On Oct 27, 2016, at 2:05 AM, Greg Landrum  wrote:
>
> The RDKit has support for the TPL format, an old BioCad/MSI/Accelrys
> format.
> It's easy to imagine something better, but this is at least already there
> and there could be other software that speaks it:
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/test_data/cmpd2.tpl
>
> I'd still like to do a decent JSON format and adding multi-confs to that
> would be logical
>
> On Thu, Oct 27, 2016 at 6:58 AM, David Cosgrove <
> davidacosgrov...@gmail.com> wrote:
>
>> I've been wondering if, now that you can get decent conformations from
>> RDKit, it would be worth devising a multi-conformation file format to make
>> reading multi-conf molecules faster for vs purposes. In my experience,
>> pulling all the conformers out of an ascii file such as an sdf can become
>> the RDS for pharmacophore searching. Something to think about at the
>> hackathon maybe and certainly something that deserves a new email
>> thread.
>>
>> Dave
>>
>>
>> On Thursday, 27 October 2016, Greg Landrum 
>> wrote:
>>
>>> Hi Thomas,
>>>
>>> You're right, reading multiple conformations out of an SDF does seem
>>> like one of those common operations. Unfortunately the RDKit does not
>>> currently support it in an easy way.
>>>
>>> A python implementation of this would be a good topic for Friday's UGM
>>> hackathon, we can see if anyone finds it interesting enough to work on.
>>>
>>> -greg
>>>
>>>
>>> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis 
>>> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I am a new user of RDkit and I was looking in the documentation for an
>>>> easy way to load multiple conformers from a structure file like .sdf. The
>>>> code must 1) distinguish between different protonation states of the same
>>>> molecule,  2) create a new Mol() object for each protonation state and load
>>>> into it the respective conformers.
>>>>
>>>> Apparently I can work out a solution for 1)
>>>> using mol.GetProp('_Name'), mol.GetNumAtoms, mol.GetNumBonds and other
>>>> properties, but I was wondering if there is any more straight forward way
>>>> to do it.
>>>> For 2) I guess I must iterate over all molecules in the input file,
>>>> create new Mol() objects (one for each protonation state of each ligand)
>>>> and add conformers to these new Mol() objects. Again this sounds easily
>>>> programmable, but sounds like a very common operation, thus I was wondering
>>>> if it has been implemented in a function.
>>>>
>>>> thanks in advance
>>>> Thomas



Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-30 Thread Brian Kelley
Rdkit already has a way to serialize conformers, the binary pickle format!

Perhaps we should make a file extension for multiple molecules.  Say ".rdk" and 
call it a day.   Like inchi the source code is the reference  :) 


Brian Kelley
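For reference, a small sketch of the binary pickle round trip mentioned above, assuming a standard RDKit installation. Ethanol and the conformer count are arbitrary choices for the example; the calls used are Mol.ToBinary() and the Chem.Mol(binary) constructor.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Build a small molecule with several 3D conformers (ethanol is an arbitrary example).
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=3, randomSeed=42))

binary = mol.ToBinary()        # RDKit's binary pickle of the molecule, as bytes
restored = Chem.Mol(binary)    # rebuild the molecule from the pickle

# The conformers travel with the molecule.
print(mol.GetNumConformers(), restored.GetNumConformers())
```

A multi-molecule file along the lines of the ".rdk" suggestion could then be little more than a sequence of such byte strings, although how to frame and version them is left open here.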

> On Oct 27, 2016, at 2:05 AM, Greg Landrum  wrote:
> 
> The RDKit has support for the TPL format, an old BioCad/MSI/Accelrys format.
> It's easy to imagine something better, but this is at least already there and 
> there could be other software that speaks it:
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/test_data/cmpd2.tpl
> 
> I'd still like to do a decent JSON format and adding multi-confs to that 
> would be logical
> 
>> On Thu, Oct 27, 2016 at 6:58 AM, David Cosgrove  
>> wrote:
>> I've been wondering if, now that you can get decent conformations from 
>> RDKit, it would be worth devising a multi-conformation file format to make 
>> reading multi-conf molecules faster for vs purposes. In my experience, 
>> pulling all the conformers out of an ascii file such as an sdf can become 
>> the RDS for pharmacophore searching. Something to think about at the
>> hackathon maybe and certainly something that deserves a new email thread. 
>> 
>> Dave
>> 
>> 
>>> On Thursday, 27 October 2016, Greg Landrum  wrote:
>>> Hi Thomas,
>>> 
>>> You're right, reading multiple conformations out of an SDF does seem like 
>>> one of those common operations. Unfortunately the RDKit does not currently 
>>> support it in an easy way.
>>> 
>>> A python implementation of this would be a good topic for Friday's UGM 
>>> hackathon, we can see if anyone finds it interesting enough to work on.
>>> 
>>> -greg
>>> 
>>> 
>>>> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis
>>>> wrote:
>>>> Hello everyone,
>>>>
>>>> I am a new user of RDkit and I was looking in the documentation for an
>>>> easy way to load multiple conformers from a structure file like .sdf. The
>>>> code must 1) distinguish between different protonation states of the same
>>>> molecule,  2) create a new Mol() object for each protonation state and
>>>> load into it the respective conformers.
>>>>
>>>> Apparently I can work out a solution for 1) using mol.GetProp('_Name'),
>>>> mol.GetNumAtoms, mol.GetNumBonds and other properties, but I was wondering
>>>> if there is any more straight forward way to do it.
>>>> For 2) I guess I must iterate over all molecules in the input file, create
>>>> new Mol() objects (one for each protonation state of each ligand) and add
>>>> conformers to these new Mol() objects. Again this sounds easily
>>>> programmable, but sounds like a very common operation, thus I was
>>>> wondering if it has been implemented in a function.
>>>>
>>>> thanks in advance
>>>> Thomas
 
 


Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-27 Thread Peter S. Shenkin
It would seem that a major issue with RDKit's multiconformer file is the
inability to associate structure-level and atom-level properties with
conformations. It's not quite orthogonal to the question of how to read,
say, a multiconformer SD file into RDKit's multiconformer format, because
the conformers in said SD file could contain such properties, and
information would be lost.

-P.
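One possible workaround for the structure-level part of this, sketched in plain Python: keep the data items of every SD record in a side table indexed by the order in which the records (and hence the conformers) were read. The RECORDS below are toy data with the atom block elided, data_fields is a hypothetical helper, and the sketch assumes conformers are added in file order. Atom-level properties are not covered.

```python
# Two toy SD records of the same molecule (hypothetical data, atom block elided).
RECORDS = [
    "lig1\n...\nM  END\n> <Energy>\n-12.5\n\n> <Source>\ndocking\n\n",
    "lig1\n...\nM  END\n> <Energy>\n-11.9\n\n> <Source>\ndocking\n\n",
]


def data_fields(record):
    """Return the SD data items of one record as a dict (values kept as strings)."""
    fields, key, lines = {}, None, []
    for line in record.splitlines():
        if line.startswith(">") and "<" in line:
            # data header such as "> <Energy>": the field name sits between < and >
            key = line[line.index("<") + 1:line.rindex(">")]
            lines = []
        elif key is not None:
            if line.strip():
                lines.append(line)
            else:  # a blank line ends the current data item
                fields[key] = "\n".join(lines)
                key = None
    if key is not None:  # last item without a trailing blank line
        fields[key] = "\n".join(lines)
    return fields


# Side table: entry i holds the data items of the i-th record read,
# which is the i-th conformer if conformers are added in file order.
conformer_props = [data_fields(record) for record in RECORDS]
for conf_index, props in enumerate(conformer_props):
    print(conf_index, props["Energy"], props["Source"])
```

The table has to be kept in step with the molecule by hand, for example when conformers are removed or reordered, which is part of the issue raised above.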

On Thu, Oct 27, 2016 at 6:20 AM, Thomas Evangelidis 
wrote:

> Hello Greg,
>
> Is the canonical SMILES string always unique for every isomer and
> tautomerization state of a molecule? If yes, then I have already written a
> function to load multiple molecules and their conformers, which I can share
> here.
>
> best
> Thomas
>
> PS: thanks to David for pointing this out.
>
>
>
> On 27 October 2016 at 05:20, Greg Landrum  wrote:
>
>> Hi Thomas,
>>
>> You're right, reading multiple conformations out of an SDF does seem like
>> one of those common operations. Unfortunately the RDKit does not currently
>> support it in an easy way.
>>
>> A python implementation of this would be a good topic for Friday's UGM
>> hackathon, we can see if anyone finds it interesting enough to work on.
>>
>> -greg
>>
>>
>> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis 
>> wrote:
>>
>>> Hello everyone,
>>>
>>> I am a new user of RDkit and I was looking in the documentation for an
>>> easy way to load multiple conformers from a structure file like .sdf. The
>>> code must 1) distinguish between different protonation states of the same
>>> molecule,  2) create a new Mol() object for each protonation state and load
>>> into it the respective conformers.
>>>
>>> Apparently I can work out a solution for 1)
>>> using mol.GetProp('_Name'), mol.GetNumAtoms, mol.GetNumBonds and other
>>> properties, but I was wondering if there is any more straight forward way
>>> to do it.
>>> For 2) I guess I must iterate over all molecules in the input file,
>>> create new Mol() objects (one for each protonation state of each ligand)
>>> and add conformers to these new Mol() objects. Again this sounds easily
>>> programmable, but sounds like a very common operation, thus I was wondering
>>> if it has been implemented in a function.
>>>
>>> thanks in advance
>>> Thomas


Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-27 Thread Thomas Evangelidis
Hello Greg,

Is the canonical SMILES string always unique for every isomer and
tautomerization state of a molecule? If yes, then I have already written a
function to load multiple molecules and their conformers, which I can share
here.

best
Thomas

PS: thanks to David for pointing this out.
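A quick way to probe this question, assuming a standard RDKit installation: canonicalize a few SMILES strings and compare them. The molecules are arbitrary examples and canonical is a hypothetical helper. As far as I know, MolToSmiles does not normalize tautomers or charges, so different states should come out as different strings, while different ways of writing the same state should collapse to one string.

```python
from rdkit import Chem


def canonical(smiles):
    """Canonical SMILES of the molecule described by the input SMILES."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))


# Same molecule written with a different atom order: one canonical string.
same = canonical("OCC") == canonical("CCO")
# Two tautomers (2-hydroxypyridine and 2-pyridone): different strings.
tautomers_differ = canonical("Oc1ccccn1") != canonical("O=c1cccc[nH]1")
# Two ionization states of acetic acid: different strings.
charge_differs = canonical("CC(=O)O") != canonical("CC(=O)[O-]")

print(same, tautomers_differ, charge_differs)
```

This only says something about the graphs RDKit perceives. Whether every record of a real SD file is perceived the same way (bond orders, aromaticity, stereo) would still need checking on the actual data.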



On 27 October 2016 at 05:20, Greg Landrum  wrote:

> Hi Thomas,
>
> You're right, reading multiple conformations out of an SDF does seem like
> one of those common operations. Unfortunately the RDKit does not currently
> support it in an easy way.
>
> A python implementation of this would be a good topic for Friday's UGM
> hackathon, we can see if anyone finds it interesting enough to work on.
>
> -greg
>
>
> On Tue, Oct 25, 2016 at 2:16 AM, Thomas Evangelidis 
> wrote:
>
>> Hello everyone,
>>
>> I am a new user of RDkit and I was looking in the documentation for an
>> easy way to load multiple conformers from a structure file like .sdf. The
>> code must 1) distinguish between different protonation states of the same
>> molecule,  2) create a new Mol() object for each protonation state and load
>> into it the respective conformers.
>>
>> Apparently I can work out a solution for 1) using mol.GetProp('_Name'), 
>> mol.GetNumAtoms, mol.GetNumBonds
>> and other properties, but I was wondering if there is any more straight
>> forward way to do it.
>> For 2) I guess I must iterate over all molecules in the input file,
>> create new Mol() objects (one for each protonation state of each ligand)
>> and add conformers to these new Mol() objects. Again this sounds easily
>> programmable, but sounds like a very common operation, thus I was wondering
>> if it has been implemented in a function.
>>
>> thanks in advance
>> Thomas


-- 

==

Thomas Evangelidis

Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr

  teva...@gmail.com


website: https://sites.google.com/site/thomasevangelidishomepage/


Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-27 Thread Greg Landrum
The RDKit has support for the TPL format, an old BioCad/MSI/Accelrys format.
It's easy to imagine something better, but this is at least already there
and there could be other software that speaks it:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/test_data/cmpd2.tpl

I'd still like to do a decent JSON format and adding multi-confs to that
would be logical

On Thu, Oct 27, 2016 at 6:58 AM, David Cosgrove 
wrote:

> I've been wondering if, now that you can get decent conformations from
> RDKit, it would be worth devising a multi-conformation file format to make
> reading multi-conf molecules faster for vs purposes. In my experience,
> pulling all the conformers out of an ascii file such as an sdf can become
> the RDS for pharmacophore searching. Something to think about at the
> hackathon maybe and certainly something that deserves a new email thread.
>
> Dave


Re: [Rdkit-discuss] reading multiple conformers from file

2016-10-26 Thread Greg Landrum
Hi Thomas,

You're right, reading multiple conformations out of an SDF does seem like
one of those common operations. Unfortunately the RDKit does not currently
support it in an easy way.

A Python implementation of this would be a good topic for Friday's UGM
hackathon; we can see if anyone finds it interesting enough to work on.
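
A minimal sketch of what such a Python implementation might look like is
given below. It is not an RDKit function: merge_by_key and
load_multiconf_sdf are names made up for the example. It assumes that every
record of the same compound lists its atoms in the same order, that the
title line ('_Name') identifies the compound, and that the canonical SMILES
is enough to tell protonation states apart.

```python
from collections import OrderedDict


def merge_by_key(items, key_of, merge):
    """Group items by key_of(item).

    The first item seen for a key is kept; every later item with the same
    key is folded into it by calling merge(kept_item, later_item).
    """
    groups = OrderedDict()
    for item in items:
        key = key_of(item)
        if key in groups:
            merge(groups[key], item)
        else:
            groups[key] = item
    return groups


def load_multiconf_sdf(sdf_file):
    """Return {(name, smiles): mol}; each mol carries all of its conformers."""
    # Imported lazily so that merge_by_key stays usable without RDKit.
    from rdkit import Chem

    def key_of(mol):
        # The title separates compounds; the canonical SMILES separates
        # protonation states that share a title.
        return mol.GetProp('_Name'), Chem.MolToSmiles(mol)

    def merge(parent, mol):
        # Assumption: both records list their atoms in the same order.
        parent.AddConformer(mol.GetConformer(), assignId=True)

    supplier = Chem.SDMolSupplier(sdf_file, removeHs=False)
    # Records that RDKit fails to parse come back as None and are skipped.
    return merge_by_key((m for m in supplier if m is not None), key_of, merge)
```

Calling mol.GetNumConformers() on each value of the returned dictionary
shows how many records were merged into it. If the atom order can differ
between records, the coordinates would have to be remapped (for example via
a substructure match) before calling AddConformer.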

-greg




[Rdkit-discuss] reading multiple conformers from file

2016-10-24 Thread Thomas Evangelidis
Hello everyone,

I am a new user of RDKit and I was looking in the documentation for an easy
way to load multiple conformers from a structure file such as an .sdf. The
code must 1) distinguish between different protonation states of the same
molecule, and 2) create a new Mol() object for each protonation state and
load the respective conformers into it.

I can apparently work out a solution for 1) using mol.GetProp('_Name'),
mol.GetNumAtoms(), mol.GetNumBonds() and other properties, but I was
wondering if there is a more straightforward way to do it.
For 2) I guess I must iterate over all molecules in the input file, create
new Mol() objects (one for each protonation state of each ligand) and add
conformers to these new Mol() objects. Again, this sounds easy to program,
but it also sounds like a very common operation, so I was wondering
whether it has already been implemented in a function.

thanks in advance
Thomas

