On 15 August 2017 at 12:05, Greg Landrum <greg.land...@gmail.com> wrote:

> Nope, just the discussion that you see there.
> Doing the work stalled on people (well, at least me) not having time to
> actually do it. :-(
>
> I still think it would be interesting, but it seems unlikely that it will
> get done unless someone else can make the time.
>
> -greg
>
>
>

All right, then I'll post my function, which is perhaps imperfect, for
loading a multi-conformer sdf file.



import collections
import re

from rdkit import Chem


def tree():
    """Create a dictionary that builds nested levels on demand."""
    return collections.defaultdict(tree)
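
A minimal sketch of how the tree() helper behaves, since the loader depends on it. The keys "lig1", "CCO" and "CCN" and the placeholder string are made up for illustration. The point is that a plain lookup of a missing key silently creates an empty subtree, whereas a membership test does not.

```python
import collections


def tree():
    """Create a dictionary that builds nested levels on demand."""
    return collections.defaultdict(tree)


d = tree()

# Assigning through two levels creates the intermediate level automatically.
d["lig1"]["CCO"] = "mol object placeholder"

# A membership test does not create the missing key.
found = "CCN" in d["lig1"]

# A plain lookup of a missing key DOES create it, as an empty subtree.
placeholder = d["lig1"]["CCN"]

print(found, len(placeholder), "CCN" in d["lig1"])
```

This is why a missing molname/SMILES entry comes back as an empty dictionary rather than raising KeyError.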



def load_multiconf_sdf(sdf_file, get_molnames=False, keep_iso=True):
    """
    Load a multi-conformer .sdf file, distinguishing the different isomers of
    each compound and storing each one, with all of its conformers, in a
    single Chem.rdchem.Mol object.

    As of October 2016, RDKit had no multi-conformer file reader, so every
    record in the sdf file is first read as a separate mol. If its molname
    and SMILES string already exist in the multidict, its conformer is added
    to the existing molecule. SMILES strings are used to distinguish the
    various tautomerization/ionization states.

    ARGS:
    sdf_file:       multi-conformer sdf file. The different
                    tautomers/ionization states of each molecule must have
                    the suffix "_iso[0-9]" in their property "_Name".
    get_molnames:   also return a list of the molecule names in the sdf file.
    keep_iso:       keep the "_iso[0-9]" suffix in the molname. Use with
                    caution because if False only one of the
                    tautomers/ionization states will be saved.

    RETURNS:
    molname_SMILES_conformersMol_multidict:  multi-dimensional dictionary
                    molname -> SMILES string -> mol object with multiple
                    conformers.
    molnames_list:  only if get_molnames=True; the molname of every record
                    read, so a name repeats once per conformer.
    """
    print("Loading multi-conformer .sdf file " + sdf_file)
    # molname -> SMILES -> Chem.rdchem.Mol object containing all the
    # conformers of this compound
    molname_SMILES_conformersMol_multidict = tree()
    suppl = Chem.SDMolSupplier(sdf_file, removeHs=False)
    molnames_list = []
    for mol in suppl:
        if mol is None or mol.GetNumAtoms() == 0:
            continue  # skip unreadable or empty molecules
        molname = mol.GetProp('_Name').lower()
        print("reading %s from file %s" % (molname, sdf_file))
        if not keep_iso:
            molname = re.sub(r"_iso[0-9]+", "", molname)
        if mol.HasProp('SMILES'):
            # distinguish the isomers and protonation states by the SMILES
            # string stored in the file
            SMILES = mol.GetProp('SMILES')
        else:
            # if not present in the input structure file, compute it
            SMILES = Chem.MolToSmiles(mol, isomericSmiles=True,
                                      canonical=True, allBondsExplicit=True)
            mol.SetProp('SMILES', SMILES)
        known_states = molname_SMILES_conformersMol_multidict[molname]
        if SMILES in known_states:
            # assignId=True gives the new conformer its own id; with the
            # default (False) every conformer would keep id 0. This assumes
            # all records of a state list their atoms in the same order.
            known_states[SMILES].AddConformer(mol.GetConformer(),
                                              assignId=True)
        else:
            # The first record becomes the molecule. It already carries its
            # own conformer, so adding it again would duplicate it.
            known_states[SMILES] = mol
        molnames_list.append(molname)

    if get_molnames:
        return molname_SMILES_conformersMol_multidict, molnames_list
    else:
        return molname_SMILES_conformersMol_multidict
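
The grouping logic described in the docstring can be tried without RDKit. The following is only a sketch under stated assumptions: group_conformers is a hypothetical helper, the records are invented, and plain strings stand in for the conformers that the real function stores inside Chem.rdchem.Mol objects.

```python
import collections


def group_conformers(records):
    """Group (name, smiles, conformer) records as name -> smiles -> [conformers]."""
    grouped = collections.defaultdict(dict)
    for name, smiles, conformer in records:
        name = name.lower()  # the loader lower-cases molnames as well
        if smiles in grouped[name]:
            # another conformer of an already known state
            grouped[name][smiles].append(conformer)
        else:
            # first conformer seen for this state
            grouped[name][smiles] = [conformer]
    return grouped


# Hypothetical records: two conformers of one state, one of another.
records = [
    ("LigA_iso1", "CCO", "conformer 0"),
    ("LigA_iso1", "CCO", "conformer 1"),
    ("LigA_iso2", "CC[O-]", "conformer 0"),
]
grouped = group_conformers(records)
for name in sorted(grouped):
    for smiles, conformers in grouped[name].items():
        print(name, smiles, len(conformers))
```

In the real function the inner value is a single mol, and mol.GetNumConformers() plays the role of len(conformers) here.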









-- 

======================================================================

Dr Thomas Evangelidis

Post-doctoral Researcher
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/2S049,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr

          teva...@gmail.com


website: https://sites.google.com/site/thomasevangelidishomepage/
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
