Re: [Rdkit-discuss] Can't import Chem from rdkit in Anaconda Python 3.6.5

2018-06-13 Thread Wandré
Great!

--
Wandré Nunes de Pinho Veloso
Assistant Professor - Unifei - Campus Avançado de Itabira-MG
PhD candidate in Bioinformatics - Universidade Federal de Minas Gerais - UFMG
Researcher at INSILICO - Interdisciplinary Group on Simulation and
Computational Intelligence - UNIFEI
Member of the Assinaturas Biológicas (Biological Signatures) research group at FIOCRUZ
Member of the Bioinformática Estrutural (Structural Bioinformatics) research group at UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG


On Wed, 13 Jun 2018 at 13:45, Geoffrey Hutchison <
geoff.hutchi...@gmail.com> wrote:

> >> Note that my answer assumes that there is a reason that you don't have
> >> X11 installed on your linux box. If that's not the case, you should be able
> >> to fix things "more easily" by installing X
> >
> > Quite frankly, this is rapidly becoming unusable as a software platform.
> > I need to install X11 to UFF-optimize a MOL? Seriously?
>
> No, you can compile RDKit yourself if you don't want to use X11 features.
> You wanted to install through conda, which has a set of packages for 'most
> use' - YMMV.
> (We have a version of RDKit on our server w/o X11)
>
> My $0.02,
> -Geoff
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] Can't import Chem from rdkit in Anaconda Python 3.6.5

2018-06-13 Thread Wandré
Just run "sudo apt-get install libxrender1" and it works.
Chris Earnshaw sent me an email with this tip.

Thanks!!!
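
For anyone hitting the same ImportError, a quick check from Python can show whether the dynamic loader can find the library before importing rdkit. This is only a minimal sketch: the helper name shared_library_available is hypothetical, the library names are taken from the error message above, and ctypes.util.find_library may itself come up empty on very stripped-down systems.

```python
import ctypes
import ctypes.util


def shared_library_available(short_name):
    """Return True if the dynamic loader can locate and open lib<short_name>."""
    path = ctypes.util.find_library(short_name)
    if path is None:
        return False
    try:
        ctypes.CDLL(path)
    except OSError:
        return False
    return True


# "Xrender" corresponds to libXrender.so.1 from the traceback; "X11" is
# included only as a related library that is often missing alongside it.
for name in ("Xrender", "X11"):
    status = "found" if shared_library_available(name) else "MISSING"
    print(f"lib{name}: {status}")
```

If libXrender is reported as missing, installing the distribution package that provides it (libxrender1 on Debian/Ubuntu, as above) should be enough.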


On Wed, 13 Jun 2018 at 12:51, Dimitri Maziuk via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> On 6/13/2018 10:06 AM, Greg Landrum wrote:
> > Note that my answer assumes that there is a reason that you don't have
> > X11 installed on your linux box. If that's not the case, you should be
> > able to fix things "more easily" by installing X
>
> Quite frankly, this is rapidly becoming unusable as a software platform.
> I need to install X11 to UFF-optimize a MOL? Seriously?
>
> E.g. on centos anaconda installs NetworkManager (why?) which comes in
> "enabled at boot" but not configured, so next time you reboot, perhaps
> weeks later, tada! -- you've lost the network. And don't get me started
> on having several versions of boost coexist...
>
> Dima
>
>


[Rdkit-discuss] Can't import Chem from rdkit in Anaconda Python 3.6.5

2018-06-13 Thread Wandré
Hi all!

I installed Anaconda 4.5.4 with Python 3.6.5 and then installed rdkit (with the
command "conda install -c rdkit rdkit"). When I try to import Chem, it
does not work.

from rdkit import Chem

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/wandre/anaconda3/envs/flaskapp/lib/python3.6/site-packages/rdkit/Chem/__init__.py", line 25, in <module>
    from rdkit.Chem.rdmolops import *
ImportError: libXrender.so.1: cannot open shared object file: No such file or directory

How can I fix this? Where is my error?

Thanks!


Re: [Rdkit-discuss] Errors with RDKit

2018-01-23 Thread Wandré
Hi Carlos,
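Similar to Axel's suggestion, in my code I check for None right after reading
each molecule (the supplier returns None for a molecule it cannot sanitize):
if mol is None: return False (if you are using a function to read each SDF
file)
if mol is None: continue (to skip to the next loop iteration)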
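
The same check can be wrapped in a small generator so that the loop body only ever sees valid molecules. This is a minimal sketch; the helper name iter_valid_mols is hypothetical and not part of RDKit, and the RDKit usage is shown only in comments because it assumes RDKit and the SDF file are available.

```python
def iter_valid_mols(supplier):
    """Yield (index, mol) pairs, skipping entries the reader could not parse.

    RDKit suppliers yield None for records that fail sanitization, so
    those entries are filtered out here instead of reaching MolToSmiles.
    """
    for index, mol in enumerate(supplier):
        if mol is None:
            continue
        yield index, mol


# Usage with RDKit (assumed to be installed):
#
#   from rdkit import Chem
#   suppl = Chem.SDMolSupplier("chembl_23.sdf")
#   for index, mol in iter_valid_mols(suppl):
#       print(index, mol.GetProp("_Name"), Chem.MolToSmiles(mol))
```

The index is kept so that skipped records can still be traced back to their position in the file.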


2018-01-23 0:59 GMT-02:00 Carlos Faerman <i...@chemvisen.com>:

> Hello,
>
> My code is very simple:
>
> suppl4 = Chem.SDMolSupplier("/Volumes/MyPassportForMac/chembl_23.sdf")
> i = 0
> for mol in suppl4:
>     smile = Chem.MolToSmiles(mol,isomericSmiles=True)
>     fingerpri = get_fp_rdkit(mol)
>     namenmol = mol.GetProp("_Name")
>     patron[namenmol]=fingerpri
>     print (i,namenmol,smile)
>     outfile.write('{} {} {}\n'.format(namenmol,smile,fingerpri))
>     i = i+1
>
> I know that a few molecules in Chembl_23.sdf have wrong valences.
>
> Is there a way to skip these molecules when these errors are found?
>
>
> 671231 CHEMBL1254908 O=C(NCCN1CCC2(CC1)C(=O)NCN2c1(Cl)c1)c1cc2cc(F)ccc2[nH]1
> [21:18:16] Explicit valence for atom # 35 N, 5, is greater than permitted
> [21:18:16] ERROR: Could not sanitize molecule ending on line 48940986
> [21:18:16] ERROR: Explicit valence for atom # 35 N, 5, is greater than permitted
> Traceback (most recent call last):
>   File "calculate-fingerprints.py", line 23, in <module>
>     smile = Chem.MolToSmiles(mol,isomericSmiles=True)
> Boost.Python.ArgumentError: Python argument types in
>     rdkit.Chem.rdmolfiles.MolToSmiles(NoneType)
> did not match C++ signature:
>     MolToSmiles(RDKit::ROMol mol, bool isomericSmiles=False, bool kekuleSmiles=False, int rootedAtAtom=-1, bool canonical=True, bool allBondsExplicit=False, bool allHsExplicit=False)
>
> The "culprit" molecule seems to be CHEMBL450200.
>
> My question to this forum: please suggest a way to modify the code to skip
> the wrong molecule(s) instead of abruptly ending the run.
>
> Thank you,
>
> Carlos Faerman
>
> 


Re: [Rdkit-discuss] Use fingerprint do Clustering a large dataset of molecules

2018-01-11 Thread Wandré
Thanks Andrew, I will try these steps.
So, to avoid recalculating the fingerprints, how can I calculate them once and
store them in a database?
When I calculate the AtomPair fingerprint, it returns
a rdkit.DataStructs.cDataStructs.IntSparseIntVect object.
How can I store this RDKit Python object in a database, and how can I read it
back again?
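
One possible approach, sketched here under stated assumptions rather than as a recommended design: a sparse count fingerprint can be reduced to a plain {index: count} dictionary (RDKit's sparse int vectors expose this through GetNonzeroElements()) and stored as text. The example uses sqlite3 from the standard library as a stand-in for PostgreSQL, and the table and function names are hypothetical.

```python
import json
import sqlite3


def save_fingerprint(conn, mol_id, nonzero):
    """Store a sparse fingerprint given as an {index: count} dict."""
    # JSON object keys must be strings, so indices are converted here.
    payload = json.dumps({str(k): v for k, v in nonzero.items()})
    conn.execute(
        "INSERT OR REPLACE INTO fingerprints (mol_id, fp) VALUES (?, ?)",
        (mol_id, payload),
    )


def load_fingerprint(conn, mol_id):
    """Return the stored {index: count} dict, or None if mol_id is unknown."""
    row = conn.execute(
        "SELECT fp FROM fingerprints WHERE mol_id = ?", (mol_id,)
    ).fetchone()
    if row is None:
        return None
    return {int(k): v for k, v in json.loads(row[0]).items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fingerprints (mol_id TEXT PRIMARY KEY, fp TEXT NOT NULL)")

# With RDKit, the dict would come from fp.GetNonzeroElements(); the values
# below are made-up illustration data, not a real AtomPair fingerprint.
save_fingerprint(conn, "hit-1", {541730: 1, 558113: 2})
restored = load_fingerprint(conn, "hit-1")
print(restored)
```

Storing the fingerprints this way only saves the 30 minutes of fingerprint generation; the all-against-all similarity step is a separate cost.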


2018-01-11 12:46 GMT-02:00 Andrew Dalke <da...@dalkescientific.com>:

> On Jan 11, 2018, at 12:04, Wandré <wandrevel...@gmail.com> wrote:
> > Thanks for the link. It is very interesting. I will read very carefully.
> > So, as input on ChemFP, I have to put a file with all molecules in 1 SDF?
>
> Chemfp works with fingerprint files, in your case, chemfp's text-based
> "FPS" format. You'll need to use 'rdkit2fps' to convert your InChI
> structures into a fingerprint.
>
> Here's an example file, where I follow the Open Babel convention of
> allowing an identifier after the InChI string:
>
> % cat examples.inchi
> InChI=1S/C6H6O/c7-6-4-2-1-3-5-6/h1-5,7H phenol
> InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H benzene
> InChI=1S/CH4/h1H4/i1D4 deuterated methane
>
> You could also use an SDF or SMILES file.
>
> Next, I generate AtomPair fingerprints. The output goes to "examples.fps",
> which I'll then display.
>
> % rdkit2fps --pairs examples.inchi -o examples.fps
> % cat examples.fps
> #FPS1
> #num_bits=2048
> #type=RDKit-AtomPair/2 fpSize=2048 minLength=1 maxLength=30
> #software=RDKit/2016.09.3 chemfp/3.1
> #source=examples.inchi
> #date=2018-01-11T14:38:57
> (hex-encoded fingerprint lost in the archive)	phenol
> (hex-encoded fingerprint lost in the archive)	benzene
> (hex-encoded fingerprint lost in the archive)	deuterated methane
>
>
> Finally, I run the clustering program, with a low threshold so it does
> something other than the trivial output of three clusters.
>
> % python taylor_butina.py -t 0.3 examples.fps
> 0 true singletons
> =>
>
> 1 false singletons
> => deuterated methane
>
> 1 clusters
> phenol has 1 other members
> => benzene
>
> This output format is rather ad hoc. I need to figure out what format
> people want from a clustering tool; preferably one that other tools can
> import without further conversion.
>
> I'll be glad to hear any suggestions.
>
> Cheers,
>
>
> Andrew
> da...@dalkescientific.com
>
>
>


Re: [Rdkit-discuss] Use fingerprint do Clustering a large dataset of molecules

2018-01-11 Thread Wandré
Hi Andrew,

Thanks for the link. It is very interesting. I will read it very carefully.
So, as input to chemfp, do I have to provide a single SDF file containing all
the molecules?


2018-01-11 6:59 GMT-02:00 Andrew Dalke <da...@dalkescientific.com>:

> Hi Wandré,
>
>   You may want to look at chemfp for this sort of clustering.
>
> Last year Chris Swain reviewed a few different ways to do clustering, at
> https://www.macinchem.org/reviews/clustering/clustering.php . His data
> set had 4.4M fingerprints and it took 10 hours to cluster at 0.8 similarity
> threshold.
>
> Chemfp doesn't include the Taylor-Butina algorithm as part of the
> distribution. That will likely be included in the next release.
>
> Instead, I worked with Chris to develop a version he could use for
> testing. It looks like the copy from his web page is not available (the
> download URL redirects to itself, producing an infinite loop).
>
> I have put a copy at http://dalkescientific.com/writings/taylor_butina.py
> , if you want to try it out.
>
> Best regards,
>
> Andrew
>     da...@dalkescientific.com
>
>


[Rdkit-discuss] Use fingerprint do Clustering a large dataset of molecules

2018-01-11 Thread Wandré
Hi,
(First of all, sorry for my poor English...)
I'm trying to cluster a large dataset of molecules but, on a server
with 64 GB of RAM and 32 cores, all the RAM and cache are occupied and,
after 10 hours, the clustering has still not finished.
My set of molecules has more than 1 million hits. I'm using the
AtomPair fingerprint and the Butina algorithm (clusterFPS) for clustering.
What can I do?
I thought about calculating all the fingerprints and storing them in my
relational PostgreSQL database (not the cartridge), and also storing the result
of BulkTanimotoSimilarity (the distance matrix, all against all), to use less
RAM and to allow new clusterings to run in less time (I spend 30 minutes just
calculating all the fingerprints).
How can I store these values (fingerprints and BulkTanimotoSimilarity results)?
Here is part of my code:

from rdkit import Chem, DataStructs
from rdkit.Chem.AtomPairs import Pairs

fps = []
ids = []
for i in range(0, len(tb_hit_data)):
    try:
        # This step I want to save to use less CPU time (just run once)
        mol = Chem.MolFromInchi(tb_hit_data[i][1])
        fps.append(Pairs.GetAtomPairFingerprint(mol))
        ids.append(tb_hit_data[i][0])
    except Exception:
        print("in mol", tb_hit_data[i][0], "AtomPair cannot be generated")
clusters = self.clusterfps(fps, cutoff_value)


def clusterfps(cls, fps, cutoff=0.99):
    """Cluster all the fingerprints passed in fps with a specific cutoff."""
    from rdkit.ML.Cluster import Butina

    # first generate the distance matrix:
    dists = []
    nfps = len(fps)
    for i in range(1, nfps):
        # This is the other step that I want to store in the database (just run once)
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend([1 - x for x in sims])

    # now cluster the data:
    cluster_data = Butina.ClusterData(dists, nfps, cutoff, isDistData=True)
    return cluster_data
# End def clusterfps

Thanks!
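
A rough size estimate helps explain why the RAM fills up: the dists list in clusterfps holds one value per pair of fingerprints, i.e. n(n-1)/2 values. The sketch below just does that arithmetic; the helper name and the figure of 8 bytes per value (a raw double, which is less than a Python float object actually needs) are assumptions for illustration.

```python
def condensed_matrix_size(n_items, bytes_per_value=8):
    """Return (number of pairwise distances, bytes needed to store them)."""
    n_pairs = n_items * (n_items - 1) // 2
    return n_pairs, n_pairs * bytes_per_value


pairs, raw_bytes = condensed_matrix_size(1_000_000)
print(f"{pairs:,} distances, about {raw_bytes / 1e12:.1f} TB at 8 bytes each")
# prints: 499,999,500,000 distances, about 4.0 TB at 8 bytes each
```

So for 1 million molecules the full distance list cannot fit in 64 GB, whether it is kept in memory or stored in a database table.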


Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-14 Thread Wandré
I had tried this command several times and it didn't fix the problem. Maybe it
works now because of all the other commands.
Thanks for the help.


2017-09-14 19:03 GMT-03:00 Markus Sitzmann <markus.sitzm...@gmail.com>:

> Hmm, your last command "conda install -c rdkit rdkit" should have been
> sufficient for the installation (after installing anaconda)
>
> On Thu, Sep 14, 2017 at 11:57 PM, Malitha Kabir <malitha12...@gmail.com>
> wrote:
>
>> That's great to hear you got things running. That was basically due to
>> the path update. Have a great day! - malitha.
>>

Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-14 Thread Wandré
I don't know what I did, but now everything is working fine.
My last commands:

Reinstalled Anaconda2

Tried to compile RDKit:

   - sudo tar xzvf RDKit_2016_03_1.tgz -C ~/anaconda2/
   - vim ~/.bashrc (update the variables with the new path of rdkit)
   - . ~/.bashrc
   - cd ~/anaconda2/rdkit
   - mkdir build
   - cd build
   - cmake -DRDK_BUILD_INCHI_SUPPORT=ON ..
   - make -j 4 (ERROR on 50%)
   - make install (ERROR on 24%)


sudo apt-get install python-rdkit librdkit1 rdkit-data
sudo apt-get update
conda install -c rdkit rdkit




Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-14 Thread Wandré
I really can't get this right...
I am thinking of reinstalling Ubuntu and trying again. Now RDKit doesn't
work.



2017-09-14 15:07 GMT-03:00 Malitha Kabir <malitha12...@gmail.com>:

> Hi Wandré,
>
> Sorry to see you in trouble again.
>
> If you look at the messages:
> # All requested packages already installed.
> # packages in environment at /home/wandre/anaconda2:
> #
> rdkit                     2017.03.3               np111py27_1    rdkit
>
> so your rdkit should be at
> /home/wandre/anaconda2/
>
> But it is trying to import from
> /opt/rdkit-Release_2016_03_1/rdkit/
>
> So the path variable is still NOT correct. It is good to see that you
> installed conda correctly and rdkit i think installed correctly.
>
> Therefore I suspect that you need to remove the previous path for rdkit
> (you probably set it while installing from source).
>
> I forgot what linux command will do that for you. Have a great day!
>
> *** I will update the answer whenever I get the appropriate linux command.
>
> - malitha
>
>
>

Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-14 Thread Wandré
Thanks Malitha,

When I installed Anaconda I said yes to all the questions.
When I try to reinstall RDKit, this message appears:

wandre@wandreLinux:~/anaconda2$ conda install -c rdkit rdkit
Fetching package metadata ...
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /home/wandre/anaconda2:
#
rdkit                     2017.03.3               np111py27_1    rdkit

When I run "python", this appears:

Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import rdkit
>>> from rdkit import Chem
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/rdkit-Release_2016_03_1/rdkit/Chem/__init__.py", line 18, in <module>
    from rdkit import rdBase
ImportError: cannot import name rdBase
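
The traceback shows Python picking up the old source tree under /opt instead of the conda package. A small script can confirm which copy wins and whether PYTHONPATH is responsible. This is a minimal sketch: the helper names are hypothetical, and it assumes the stale path was added through the PYTHONPATH environment variable (for example in ~/.bashrc), which the thread suggests but does not confirm.

```python
import importlib.util
import os


def locate_package(name):
    """Return the location Python would import `name` from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]
    return spec.origin


def pythonpath_entries_with(name, env=None):
    """List PYTHONPATH entries that contain a directory called `name`."""
    env = os.environ if env is None else env
    entries = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    return [p for p in entries if os.path.isdir(os.path.join(p, name))]


print("rdkit would be imported from:", locate_package("rdkit"))
print("PYTHONPATH entries providing rdkit:", pythonpath_entries_with("rdkit"))
```

If the second line lists a directory such as the old /opt source tree, removing that entry from the shell startup file and opening a new shell should let the conda copy be found first.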



2017-09-14 9:17 GMT-03:00 Malitha Kabir <malitha12...@gmail.com>:

> Hi Wandré,
>
> Good day! It's malitha.
>
> Considering your first question, I would say the path variable is NOT set
> correctly. To avoid doing gymnastics with the Linux system, you may consider the
> following steps:
>
>    1. Install Miniconda or Anaconda from https://conda.io/miniconda.html
>       and answer yes (y) when it asks to add the Python shipped with conda
>       to the path variable. I mean that the Python within conda would be your
>       default Python. After installing it, when you run the command
>       <<<<>>>>> from the shell you will see something like <<>> on the screen.
>    2. Install rdkit from https://anaconda.org/rdkit/rdkit on top of conda.
>
>
> For question regarding energy minimization, you may find the following
> link helpful.
> https://sourceforge.net/p/rdkit/mailman/message/28298074/
>
> I hope, it helps!
>
> - malitha
>
> On Thu, Sep 14, 2017 at 4:22 PM, Wandré <wandrevel...@gmail.com> wrote:
>
>> So,
>> 1) I ran all the commands in the RDKit Conda installation tutorial (
>> https://github.com/rdkit/conda-rdkit), but when I run python and try to
>> import Chem ("from rdkit import Chem"), an error message appears:
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/opt/rdkit-Release_2016_03_1/rdkit/Chem/__init__.py", line 18, in <module>
>>     from rdkit import rdBase
>> ImportError: cannot import name rdBase
>>
>> 2) Thanks for all the references
>>
>> 3) Which function generates this "energy minimized molecule"?
>>
>> --
>> Wandré Nunes de Pinho Veloso
>> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
>> Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
>> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
>> Inteligência Computacional - UNIFEI
>> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
>> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
>> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>>
>> 2017-09-13 17:32 GMT-03:00 Malitha Kabir <malitha12...@gmail.com>:
>>
>>> Hi Wandré,
>>>
>>> 1) apt-get installs rdkit 2013 (link below). So, please install it
>>> through conda (as Markus suggested)
>>> https://packages.ubuntu.com/trusty/python/python-rdkit
>>>
>>> 2) I am not familiar with the case of wrong SMILE generation. But the
>>> link below says something more that I think you need to know.
>>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3495655/
>>>
>>> 3) As you are trying to store data, it would be great to consider
>>> whether you are storing energy minimized molecule or not. (my opinion).
>>> Surface area related descriptors will yield different result and bond
>>> connectivity related descriptor will yield same result in both cases.
>>>
>>> 4) Sharing my personal experience, during my undergraduate school part
>>> of my final year project was stressed up with conceptual questions. I
>&

Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-14 Thread Wandré
So,
1) I ran all the commands in the RDKit Conda installation tutorial (
https://github.com/rdkit/conda-rdkit), but when I run python and try to
import Chem ("from rdkit import Chem"), an error message appears:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/rdkit-Release_2016_03_1/rdkit/Chem/__init__.py", line 18, in <module>
    from rdkit import rdBase
ImportError: cannot import name rdBase

2) Thanks for all the references

3) Which function generates this "energy minimized molecule"?
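
The linked message is not reproduced in this thread, so what follows is only
an assumption about what "energy minimized" means here: embed a 3D conformer
and relax it with a force field. A minimal sketch using RDKit's
AllChem.EmbedMolecule and AllChem.UFFOptimizeMolecule; the helper name
minimized_molblock and the fixed random seed are chosen for illustration, and
the function returns None when RDKit cannot be imported or the input cannot
be parsed or embedded:

```python
def minimized_molblock(smiles, max_iters=200, seed=42):
    """Return a mol block with UFF-relaxed 3D coordinates, or None."""
    try:
        from rdkit import Chem
        from rdkit.Chem import AllChem
    except ImportError:
        return None  # RDKit is not importable in this environment

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # the SMILES could not be parsed

    mol = Chem.AddHs(mol)  # explicit hydrogens matter for 3D geometry
    if AllChem.EmbedMolecule(mol, randomSeed=seed) < 0:
        return None  # no conformer could be generated

    # Relax the embedded conformer in place with the UFF force field.
    AllChem.UFFOptimizeMolecule(mol, maxIters=max_iters)
    return Chem.MolToMolBlock(mol)
```

Descriptors that depend on 3D coordinates would be computed on the relaxed
molecule; connectivity-based ones do not need this step.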

--
Wandré Nunes de Pinho Veloso
Professor Assistente - Unifei - Campus Avançado de Itabira-MG
Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
Inteligência Computacional - UNIFEI
Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG

2017-09-13 17:32 GMT-03:00 Malitha Kabir <malitha12...@gmail.com>:

> Hi Wandré,
>
> 1) apt-get installs the 2013 release of rdkit (link below). So, please
> install it through conda (as Markus suggested)
> https://packages.ubuntu.com/trusty/python/python-rdkit
>
> 2) I am not familiar with the case of wrong SMILES generation. But the link
> below says something more that I think you need to know.
> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3495655/
>
> 3) As you are trying to store data, it would be good to consider whether
> you are storing energy-minimized molecules or not (my opinion). Surface
> area related descriptors will yield different results, and bond
> connectivity related descriptors will yield the same result, in both cases.
>
> 4) Sharing my personal experience: during my undergraduate studies, part of
> my final year project was weighed down with conceptual questions. I failed
> to make use of the blessings of more advanced developments due to lack of
> time. The later experience was not so good.
>
> Please keep in mind that we can generate a non-redundant database with a
> few molecules, but for millions of molecules it should be quite a tough task.
> Have a great day!
>
> - malitha
>
>
>
>
> On Thu, Sep 14, 2017 at 2:05 AM, Markus Sitzmann <
> markus.sitzm...@gmail.com> wrote:
>
>> PS. The conda version has InChI support
>>
>> On Wed, Sep 13, 2017 at 10:04 PM, Markus Sitzmann <
>> markus.sitzm...@gmail.com> wrote:
>>
>>> Strong recommendation: use the conda version:
>>>
>>> http://www.rdkit.org/docs/Install.html
>>>
>>> On Wed, Sep 13, 2017 at 9:58 PM, Wandré <wandrevel...@gmail.com> wrote:
>>>
>>>> I just run sudo apt-get install python-rdkit librdkit1 rdkit-data 
>>>> I'm trying to solve this with this link:
>>>> http://www.blopig.com/blog/2013/02/how-to-install-rdkit-on-ubuntu-12-04/
>>>>
>>>> --
>>>> Wandré Nunes de Pinho Veloso
>>>> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
>>>> Doutorando em Bioinformática - Universidade Federal de Minas Gerais -
>>>> UFMG
>>>> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
>>>> Inteligência Computacional - UNIFEI
>>>> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
>>>> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
>>>> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>>>>
>>>> 2017-09-13 16:55 GMT-03:00 Markus Sitzmann <markus.sitzm...@gmail.com>:
>>>>
>>>>> How did you install rdkit so far? And where? Is it the conda/anaconda
>>>>> version?
>>>>>
>>>>> On Wed, Sep 13, 2017 at 9:39 PM, Wandré <wandrevel...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> How to install RDKit with InChI?
>>>>>> When I run Chem.inchi.INCHI_AVAILABLE, the result is False
>>>>>>
>>>>>> --
>>>>>> Wandré Nunes de Pinho Veloso
>>>>>> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
>>>>>> Doutorando em Bioinformática - Universidade Federal de Minas Gerais -
>>>>>> UFMG
>>>>>> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
>>>>>> Inteligência Computacional - UNIFEI
>>>>>> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
>>>>>> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
>>>>>> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>>>>>

Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-13 Thread Wandré
I just ran: sudo apt-get install python-rdkit librdkit1 rdkit-data
I'm trying to solve this using this link:
http://www.blopig.com/blog/2013/02/how-to-install-rdkit-on-ubuntu-12-04/

--
Wandré Nunes de Pinho Veloso
Professor Assistente - Unifei - Campus Avançado de Itabira-MG
Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
Inteligência Computacional - UNIFEI
Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG

2017-09-13 16:55 GMT-03:00 Markus Sitzmann <markus.sitzm...@gmail.com>:

> How did you install rdkit so far? And where? Is it the conda/anaconda
> version?
>
> On Wed, Sep 13, 2017 at 9:39 PM, Wandré <wandrevel...@gmail.com> wrote:
>
>> How to install RDKit with InChI?
>> When I run Chem.inchi.INCHI_AVAILABLE, the result is False
>>
>> --
>> Wandré Nunes de Pinho Veloso
>> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
>> Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
>> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
>> Inteligência Computacional - UNIFEI
>> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
>> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
>> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>>
>> 2017-09-13 16:30 GMT-03:00 Wandré <wandrevel...@gmail.com>:
>>
>>> Thanks Malitha.
>>> I choose this descriptors because I will store this on my database, so,
>>> will be fast compare one molecule before insert them in database.
>>> My worry now is if the RDKit will generate different SMILES or InChI in
>>> same SDF molecule or equals in different molecules (molecules from RCSB
>>> PDB, PubChem, ChemBL, for example).
>>>
>>> --
>>> Wandré Nunes de Pinho Veloso
>>> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
>>> Doutorando em Bioinformática - Universidade Federal de Minas Gerais -
>>> UFMG
>>> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
>>> Inteligência Computacional - UNIFEI
>>> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
>>> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
>>> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>>>
>>> 2017-09-13 16:22 GMT-03:00 Malitha Kabir <malitha12...@gmail.com>:
>>>
>>>> Hi Wandré,
>>>>
>>>> It seems you have already done intensive research on this. Kindly accept
>>>> my comments as an addition to your idea (not the answer you are trying
>>>> to find). In my view, categorizing molecules using their descriptors
>>>> should reduce computation time. RDKit currently offers calculation of
>>>> about 200 descriptors! So a careful look at those makes a lot of sense
>>>> to me. Conceptually, descriptor matching should follow a sequence (I
>>>> don't know what sequence would be ideal): for example, MolWt should
>>>> match first (H contribution and ions should be taken into consideration
>>>> here), followed by matching of the other descriptors (the order might
>>>> differ when writing programs). There is some reading material on
>>>> molecular fingerprints and database schemas. You may want to have a
>>>> look at it.
>>>>
>>>> The links are from Daylight. I am neither involved with the company nor
>>>> their product.
>>>> http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
>>>> http://www.daylight.com/dayhtml/doc/theory/theory.thor.html
>>>>
>>>> Best regards,
>>>> - malitha
>>>>
>>>>
>>>> On Thu, Sep 14, 2017 at 12:43 AM, Wandré <wandrevel...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for all the answers.
>>>>>
>>>>> Reading all answers, I think in something different... If the SMILES
>>>>> (Chem.MolToSmiles(mol,isomericSmiles=True)) and Inchi
>>>>> (Chem.MolToInchi(mol)) can generate the same value in different molecules,
>>>>> I will generate others descriptors (NumHDonors, NumHAcceptors,
>>>>> RingCount, GetNumAtoms, TPSA, pyLabuteASA, MolWt, CalcNumRotatableBonds
>>>>> and MolLogP) to compare all the molecules that SMILES and Inchi are the
>>>>> same.
>>>>> If all this data are the same, I will generat

Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-13 Thread Wandré
How do I install RDKit with InChI support?
When I run Chem.inchi.INCHI_AVAILABLE, the result is False.
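
The check mentioned above can be wrapped so that it also distinguishes "RDKit
was built without InChI" from "RDKit cannot be imported at all". A minimal
sketch; the helper name inchi_status and the returned strings are made up for
illustration, while rdkit.Chem.inchi.INCHI_AVAILABLE is the flag quoted in
this message:

```python
def inchi_status():
    """Report whether the installed RDKit build includes InChI support."""
    try:
        from rdkit.Chem import inchi
    except ImportError:
        return "rdkit not importable"
    # INCHI_AVAILABLE is False when RDKit was built without the InChI library.
    return "available" if inchi.INCHI_AVAILABLE else "not available"


if __name__ == "__main__":
    print("InChI support:", inchi_status())
```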

--
Wandré Nunes de Pinho Veloso
Professor Assistente - Unifei - Campus Avançado de Itabira-MG
Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
Inteligência Computacional - UNIFEI
Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG

2017-09-13 16:30 GMT-03:00 Wandré <wandrevel...@gmail.com>:

> Thanks Malitha.
> I choose this descriptors because I will store this on my database, so,
> will be fast compare one molecule before insert them in database.
> My worry now is if the RDKit will generate different SMILES or InChI in
> same SDF molecule or equals in different molecules (molecules from RCSB
> PDB, PubChem, ChemBL, for example).
>
> --
> Wandré Nunes de Pinho Veloso
> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
> Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
> Inteligência Computacional - UNIFEI
> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>
> 2017-09-13 16:22 GMT-03:00 Malitha Kabir <malitha12...@gmail.com>:
>
>> Hi Wandré,
>>
>> It seems you have already done intensive research on this. Kindly accept
>> my comments as an addition to your idea (not the answer you are trying to
>> find). In my view, categorizing molecules using their descriptors should
>> reduce computation time. RDKit currently offers calculation of about 200
>> descriptors! So a careful look at those makes a lot of sense to me.
>> Conceptually, descriptor matching should follow a sequence (I don't know
>> what sequence would be ideal): for example, MolWt should match first (H
>> contribution and ions should be taken into consideration here), followed
>> by matching of the other descriptors (the order might differ when writing
>> programs). There is some reading material on molecular fingerprints and
>> database schemas. You may want to have a look at it.
>>
>> The links are from Daylight. I am neither involved with the company nor
>> their product.
>> http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
>> http://www.daylight.com/dayhtml/doc/theory/theory.thor.html
>>
>> Best regards,
>> - malitha
>>
>>
>> On Thu, Sep 14, 2017 at 12:43 AM, Wandré <wandrevel...@gmail.com> wrote:
>>
>>> Thanks for all the answers.
>>>
>>> Reading all answers, I think in something different... If the SMILES
>>> (Chem.MolToSmiles(mol,isomericSmiles=True)) and Inchi
>>> (Chem.MolToInchi(mol)) can generate the same value in different molecules,
>>> I will generate others descriptors (NumHDonors, NumHAcceptors,
>>> RingCount, GetNumAtoms, TPSA, pyLabuteASA, MolWt, CalcNumRotatableBonds
>>> and MolLogP) to compare all the molecules that SMILES and Inchi are the
>>> same.
>>> If all this data are the same, I will generate the fingerprint (Atompair
>>> for exemple) and use Tanimoto coefficient and, if this value, when I
>>> compare two molecules, is 1, this molecules are the same.
>>>
>>> Where is my mistake (I think that is, one or more, mistakes)?
>>>
>>> Thanks!
>>>
>>> --
>>> Wandré Nunes de Pinho Veloso
>>> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
>>> Doutorando em Bioinformática - Universidade Federal de Minas Gerais -
>>> UFMG
>>> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
>>> Inteligência Computacional - UNIFEI
>>> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
>>> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
>>> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>>>
>>> 2017-09-13 14:19 GMT-03:00 Dimitri Maziuk <dmaz...@bmrb.wisc.edu>:
>>>
>>>> On 09/13/2017 11:46 AM, Markus Sitzmann wrote:
>>>> > The case that you have 3D information available for a molecule
>>>> > dataset is rare, if you want it trustworthy it gets even worse than
>>>> > that. And what is the point then to generate the configuration of a
>>>> > molecule first if you can not trust that either?
>>>>
>>>> Veering further off to

Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-13 Thread Wandré
Thanks Malitha.
I chose these descriptors because I will store them in my database, so it
will be fast to compare a molecule before inserting it.
My worry now is whether RDKit will generate different SMILES or InChI for
the same molecule in SDF, or identical ones for different molecules
(molecules from RCSB PDB, PubChem, ChEMBL, for example).

--
Wandré Nunes de Pinho Veloso
Professor Assistente - Unifei - Campus Avançado de Itabira-MG
Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
Inteligência Computacional - UNIFEI
Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG

2017-09-13 16:22 GMT-03:00 Malitha Kabir <malitha12...@gmail.com>:

> Hi Wandré,
>
> It seems you have already done intensive research on this. Kindly accept my
> comments as an addition to your idea (not the answer you are trying to
> find). In my view, categorizing molecules using their descriptors should
> reduce computation time. RDKit currently offers calculation of about 200
> descriptors! So a careful look at those makes a lot of sense to me.
> Conceptually, descriptor matching should follow a sequence (I don't know
> what sequence would be ideal): for example, MolWt should match first (H
> contribution and ions should be taken into consideration here), followed by
> matching of the other descriptors (the order might differ when writing
> programs). There is some reading material on molecular fingerprints and
> database schemas. You may want to have a look at it.
>
> The links are from Daylight. I am neither involved with the company nor
> their product.
> http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
> http://www.daylight.com/dayhtml/doc/theory/theory.thor.html
>
> Best regards,
> - malitha
>
>
> On Thu, Sep 14, 2017 at 12:43 AM, Wandré <wandrevel...@gmail.com> wrote:
>
>> Thanks for all the answers.
>>
>> Reading all answers, I think in something different... If the SMILES
>> (Chem.MolToSmiles(mol,isomericSmiles=True)) and Inchi
>> (Chem.MolToInchi(mol)) can generate the same value in different molecules,
>> I will generate others descriptors (NumHDonors, NumHAcceptors,
>> RingCount, GetNumAtoms, TPSA, pyLabuteASA, MolWt, CalcNumRotatableBonds
>> and MolLogP) to compare all the molecules that SMILES and Inchi are the
>> same.
>> If all this data are the same, I will generate the fingerprint (Atompair
>> for exemple) and use Tanimoto coefficient and, if this value, when I
>> compare two molecules, is 1, this molecules are the same.
>>
>> Where is my mistake (I think that is, one or more, mistakes)?
>>
>> Thanks!
>>
>> --
>> Wandré Nunes de Pinho Veloso
>> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
>> Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
>> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
>> Inteligência Computacional - UNIFEI
>> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
>> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
>> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>>
>> 2017-09-13 14:19 GMT-03:00 Dimitri Maziuk <dmaz...@bmrb.wisc.edu>:
>>
>>> On 09/13/2017 11:46 AM, Markus Sitzmann wrote:
>>> > The case that you have 3D information available for a molecule dataset
>>> > is rare, if you want it trustworthy it gets even worse than that. And
>>> > what is the point then to generate the configuration of a molecule
>>> > first if you can not trust that either?
>>>
>>> Veering further off topic, do you even care in the first place? E.g. if
>>> your molecule always exists as a mixture of isomers, except in some
>>> megabuck-per-microgram painstakingly created reference samples, a
>>> 3D-based system will represent it as two distinct molecules. Whereas you
>>> want it represented as one.
>>>
>>> Last I looked PDB Ligand Expo had two different benzenes. Their software
>>> doesn't (didn't?) do the circle version so they don't have the third one.
>>>
>>> --
>>> Dimitri Maziuk
>>> Programmer/sysadmin
>>> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>>>
>>>
>>> 

Re: [Rdkit-discuss] Non-redundant database of molecules (Wandré)

2017-09-13 Thread Wandré
Why not use the InChI function in RDKit?
Canonical SMILES cannot be generated by RDKit, correct?
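
For reference, RDKit's Chem.MolToSmiles returns a canonical SMILES by
default, so different input strings for the same structure give the same
output. A minimal sketch; the helper name canonical_smiles is chosen for
illustration, and it returns None when RDKit is not importable or the input
cannot be parsed:

```python
def canonical_smiles(smiles):
    """Return RDKit's canonical isomeric SMILES for `smiles`, or None."""
    try:
        from rdkit import Chem
    except ImportError:
        return None  # RDKit is not importable in this environment

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # the input could not be parsed

    # Canonicalization is on by default; isomericSmiles keeps stereo
    # and isotope information in the output string.
    return Chem.MolToSmiles(mol, isomericSmiles=True)


if __name__ == "__main__":
    # Two ways of writing ethanol should yield one and the same string.
    print(canonical_smiles("OCC"), canonical_smiles("CCO"))
```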

--
Wandré Nunes de Pinho Veloso
Professor Assistente - Unifei - Campus Avançado de Itabira-MG
Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
Inteligência Computacional - UNIFEI
Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG

2017-09-13 15:57 GMT-03:00 Chris Swain <sw...@mac.com>:

> Hi,
>
> I'd use a text-based version of the structure (InChIKey or canonical
> SMILES); it then becomes an easy task to do the comparison in Python.
>
> I wrote a script to do this in Vortex but it should be easy to modify.
> https://www.macinchem.org/reviews/vortex/tut28/scripting_vortex28.php
>
>
> Cheers
>
> Chris
>
>
>
> Today's Topics:
>
>   1. Non-redundant database of molecules (Wandré)
>
>
> --
>
> Message: 1
> Date: Wed, 13 Sep 2017 07:13:56 -0300
> From: Wandré <wandrevel...@gmail.com>
> To: rdkit-discuss@lists.sourceforge.net
> Subject: [Rdkit-discuss] Non-redundant database of molecules
> Message-ID:
> <caemzefdrr5vsh1ohmm1vwd7g8xkdmtoukfsfdqnx4zyobla...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> My name is Wandré and I'm from Brazil.
> I'm trying to do a big database of molecules, but, I want to eliminate all
> the redundant molecules before insert them in database.
> I want to know what is the best method to identify one molecule in RDKit.
> Is SMILES ("Chem.MolToSmiles(mol,isomericSmiles=True)") or I will need to
> compare all molecules, one by one, before insert them in database (using
> Tanimoto)?
> This can be hard to do because my database will have lot of millions of
> molecules, so, compare one by one before insert is the only answer?
> Compare if the SMILES as already inserted is easy (text compare), but,
> compare fingerprint of molecule...
>
> If I really need to compare the fingerprint of molecule, how to store this
> data in PostgreSQL without use cartridge? I will generate the fingeprint
> (Atompair, for example) and store this fingerprint in database and compare
> all the fingerprints, one by one, before insert a now molecule. This
> fingerprint (Atompair) have lot of features, so, store this in relational
> database is expensive.
> It is possible?
>
> Thanks!
>
> --
> Wandré Nunes de Pinho Veloso
> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
> Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
> Inteligência Computacional - UNIFEI
> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
> --
>
> End of Rdkit-discuss Digest, Vol 119, Issue 20
> **
>


Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-13 Thread Wandré
Thanks for all the answers.

Reading all the answers, I thought of something different... If the SMILES
(Chem.MolToSmiles(mol,isomericSmiles=True)) and InChI
(Chem.MolToInchi(mol)) can give the same value for different molecules,
I will generate other descriptors
(NumHDonors, NumHAcceptors, RingCount, GetNumAtoms, TPSA, pyLabuteASA,
MolWt, CalcNumRotatableBonds
and MolLogP) to compare all the molecules whose SMILES and InChI are the
same.
If all these data are the same, I will generate the fingerprint (AtomPair,
for example) and use the Tanimoto coefficient; if this value is 1 when I
compare two molecules, the molecules are the same.
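
The last step of this plan can be sketched in plain Python. A minimal sketch
of the Tanimoto coefficient on count-based features, assuming each
fingerprint is represented as a collections.Counter mapping a feature to its
count; the feature names in the example are hypothetical and this is not
RDKit's own implementation:

```python
from collections import Counter


def tanimoto_counts(a, b):
    """Tanimoto similarity of two count vectors given as Counter objects.

    Computed as sum(min) / sum(max) over all features; two empty vectors
    are treated as identical (1.0).
    """
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)  # missing keys count as 0
    total = sum(max(a[k], b[k]) for k in keys)
    return 1.0 if total == 0 else float(shared) / total


if __name__ == "__main__":
    fp1 = Counter({"C-C": 2, "C-O": 1})  # hypothetical atom-pair features
    fp2 = Counter({"C-C": 1, "C-O": 1})
    print(tanimoto_counts(fp1, fp1), tanimoto_counts(fp1, fp2))
```

A value of 1 only says that the two feature multisets are equal. Because
fingerprints can collide, that is not by itself proof that the two
structures are identical.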

Where is my mistake (I think there are one or more mistakes)?

Thanks!

--
Wandré Nunes de Pinho Veloso
Professor Assistente - Unifei - Campus Avançado de Itabira-MG
Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
Inteligência Computacional - UNIFEI
Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG

2017-09-13 14:19 GMT-03:00 Dimitri Maziuk <dmaz...@bmrb.wisc.edu>:

> On 09/13/2017 11:46 AM, Markus Sitzmann wrote:
> > The case that you have 3D information available for a molecule dataset
> > is rare, if you want it trustworthy it gets even worse than that. And
> > what is the point then to generate the configuration of a molecule first
> > if you can not trust that either?
>
> Veering further off topic, do you even care in the first place? E.g. if
> your molecule always exists as a mixture of isomers, except in some
> megabuck-per-microgram painstakingly created reference samples, a
> 3D-based system will represent it as two distinct molecules. Whereas you
> want it represented as one.
>
> Last I looked PDB Ligand Expo had two different benzenes. Their software
> doesn't (didn't?) do the circle version so they don't have the third one.
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>
>


[Rdkit-discuss] Non-redundant database of molecules

2017-09-13 Thread Wandré
Hi,

My name is Wandré and I'm from Brazil.
I'm trying to build a big database of molecules, but I want to eliminate all
the redundant molecules before inserting them into the database.
I want to know what is the best method to identify a molecule in RDKit.
Is it SMILES ("Chem.MolToSmiles(mol,isomericSmiles=True)"), or will I need to
compare all molecules, one by one, before inserting them into the database
(using Tanimoto)?
This can be hard to do because my database will have many millions of
molecules, so is comparing one by one before each insert the only answer?
Checking whether a SMILES has already been inserted is easy (text
comparison), but comparing molecule fingerprints...

If I really need to compare molecule fingerprints, how do I store this data
in PostgreSQL without using the cartridge? I would generate the fingerprint
(AtomPair, for example), store it in the database, and compare all the
fingerprints, one by one, before inserting a new molecule. This fingerprint
(AtomPair) has a lot of features, so storing it in a relational database is
expensive.
Is it possible?
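
The text-comparison route described above can be delegated to the database
itself: a UNIQUE constraint on a canonical identifier rejects duplicates at
insert time, so the new molecule is not compared against every stored row in
application code. A minimal sketch, assuming the identifier (canonical SMILES
or InChIKey) is computed beforehand; it uses Python's built-in sqlite3 as a
stand-in for PostgreSQL, where the analogous statement is INSERT ... ON
CONFLICT DO NOTHING, and the table and column names are hypothetical:

```python
import sqlite3


def insert_unique(conn, identifier, name):
    """Insert a molecule unless its identifier is already stored.

    Returns True if a new row was added, False if it was a duplicate.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO molecule (identifier, name) VALUES (?, ?)",
        (identifier, name),
    )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE molecule ("
    " id INTEGER PRIMARY KEY,"
    " identifier TEXT NOT NULL UNIQUE,"  # canonical SMILES or InChIKey
    " name TEXT)"
)

if __name__ == "__main__":
    print(insert_unique(conn, "CCO", "ethanol"))
    print(insert_unique(conn, "CCO", "ethyl alcohol"))
```

The UNIQUE constraint is backed by an index, so the duplicate check is an
index lookup rather than a scan over all stored molecules.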

Thanks!

--
Wandré Nunes de Pinho Veloso
Professor Assistente - Unifei - Campus Avançado de Itabira-MG
Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
Inteligência Computacional - UNIFEI
Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG