Re: [Rdkit-discuss] Warning as error

2019-01-21 Thread Jean-Marc Nuzillard

Each molecule has the _Name I gave it.
Thank you for suggesting to interleave molecule indexes (or names)
with warning messages redirected to unbuffered stderr. It works.
I will use the trick together with Lukas's idea (thanks!) of redirecting
stderr for in-script processing.

All the best,

Jean-Marc

On 21/01/2019 at 21:04, Dimitri Maziuk via Rdkit-discuss wrote:

> On 1/21/19 1:42 PM, Jean-Marc Nuzillard wrote:
>
>>             sys.stderr.write("Bad: %s\n" % (mol.GetProp("_Name"),))
>> I know which bond has a problem but I still do not know in which molecule.
>
> Are you sure they all have _Name's? I'd just print the count outside of
> the try/catch block and ignore ones not followed by the warning message.
> (And run with #!/usr/bin/python -u and/or flush sys.stdout/stderr on
> every iteration for good measure.)



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



--
Jean-Marc Nuzillard
Research Director at CNRS

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/ICMR
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/





Re: [Rdkit-discuss] Warning as error

2019-01-21 Thread Dimitri Maziuk via Rdkit-discuss
On 1/21/19 1:42 PM, Jean-Marc Nuzillard wrote:

>             sys.stderr.write("Bad: %s\n" % (mol.GetProp("_Name"),))

> I know which bond has a problem but I still do not know in which molecule.

Are you sure they all have _Name's? I'd just print the count outside of
the try/catch block and ignore ones not followed by the warning message.
(And run with #!/usr/bin/python -u and/or flush sys.stdout/stderr on
every iteration for good measure.)
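
One way to read this suggestion, as a minimal sketch: the script writes a marker line with the molecule counter to stderr before each record, and afterwards only the counters whose marker is followed by other output are kept. The marker format (`#mol N`) and the helper name `flagged_indexes` are assumptions made for illustration, and the sample log is made up around the warning text reported in this thread.

```python
import re

# Hypothetical marker written by the script before each molecule is read,
# e.g. sys.stderr.write("#mol %d\n" % i); sys.stderr.flush()
MARKER = re.compile(r"^#mol (\d+)$")


def flagged_indexes(log_text):
    """Return the molecule counters whose marker is followed by other output."""
    flagged = []
    current = None
    for line in log_text.splitlines():
        m = MARKER.match(line)
        if m:
            current = int(m.group(1))
        elif line.strip() and current is not None and current not in flagged:
            # Any non-marker line is attributed to the most recent marker.
            flagged.append(current)
    return flagged


# Made-up stderr capture: only molecule 1 triggers a warning.
sample_log = """#mol 0
#mol 1
[19:58:01] Conflicting single bond directions around double bond at index 27.
[19:58:01]   BondStereo set to STEREONONE and single bond directions set to NONE.
#mol 2
"""

print(flagged_indexes(sample_log))
```

This only works if the markers and the warnings reach the same stream in order, which is why unbuffered output or an explicit flush on every iteration matters.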

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] DeleteSubstructs using an MCS object

2019-01-21 Thread Paolo Tosco

Hi Jenke,


The reason why the resulting molecule contains no atoms is that the 
3,5-dichlorophenyl substituent matches the 3-chlorophenyl substructure 
twice. If you look at the deleteSubstructs() code in ChemTransforms.cpp 
you will see that the atoms deleted from the main structure are the 
union of all possible matches (two in this case). Therefore, both the 
3- and the 5-chloro substituents are deleted, and as all other atoms 
are part of the MCS, you are left with an empty molecule.
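
The difference between deleting the union of all matches and deleting one match at a time can be sketched in plain Python, without RDKit. The atom numbering and the two match tuples below are hypothetical, hard-coded stand-ins for what a substructure search on a 3,5-dichlorophenyl group might return; this is an illustration of the set logic, not the toolkit's implementation.

```python
# Toy 3,5-dichlorophenyl fragment: atom index -> element symbol.
# Atoms 0-5 are the ring carbons; atoms 6 and 7 are the two chlorines.
atoms = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "Cl", 7: "Cl"}

# Assumed output of a substructure search for a 3-chlorophenyl query:
# the ring matches both times, once with each chlorine.
matches = [
    (0, 1, 2, 3, 4, 5, 6),
    (0, 1, 2, 3, 4, 5, 7),
]


def delete_union(atoms, matches):
    """Delete every atom that appears in any match."""
    doomed = set().union(*matches)
    return {idx: sym for idx, sym in atoms.items() if idx not in doomed}


def delete_one_at_a_time(atoms, matches):
    """Return one result per match, deleting only that match's atoms."""
    return [
        {idx: sym for idx, sym in atoms.items() if idx not in set(match)}
        for match in matches
    ]


print(delete_union(atoms, matches))          # {}
print(delete_one_at_a_time(atoms, matches))  # [{7: 'Cl'}, {6: 'Cl'}]
```

With the union, both chlorines disappear along with the ring; deleting per match leaves one chlorine behind in each result.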



If you wish to delete matches one at a time, rather than deleting the 
union of all matches, you may use the DeleteSubstructs() Python 
implementation that I drafted in this gist:


https://gist.github.com/ptosco/1b6bc727ddee32b4d411cf5c2aea7291

(disclaimer: I wrote it very quickly and tested it only with your 
example, so I don't guarantee it is bug-free)


This DeleteSubstructs() implementation returns a list of all structures 
obtained by deleting the matched substructures one at a time. As I see 
you are using 3D structures, the code deletes all orphaned hydrogens and 
caps the fragments remaining after deleting substructures with 
hydrogens. If you don't care about 3D coordinates you may want to 
uniquify results based on canonical SMILES and, for example in the case 
of lig12, keep the remaining HCl fragment only once.


Another observation is that the PDB reader implemented in the RDKit 
autoconnects atoms based on proximity, but does not guess bond orders, 
so your structures are not chemically correct as they are all 
single-bonded even if you have carbonyls, aromatic rings, etc.
In my gist I used Flare to perceive bond orders as I was running the 
code in the Flare Jupyter Notebook; you may also use another 
cheminformatics toolkit for that purpose.


I hope the above helps; please feel free to get back to me off-list if 
something is not clear.


Cheers,
p.

On 01/21/19 15:51, SCHEEN Jenke wrote:


Hi all,


I'm trying to remove the MCS between two molecules (attached, 02.pdb 
and 12.pdb) using rdFMCS.FindMCS and AllChem.DeleteSubstructs using 
the following code:




#

from rdkit import Chem
from rdkit.Chem import AllChem, rdmolfiles, rdFMCS

#

# load molecules:
lig02_pdb = open("02.pdb", 'r').read()
lig12_pdb = open("12.pdb", 'r').read()

lig02_mol = rdmolfiles.MolFromPDBBlock(lig02_pdb)
lig12_mol = rdmolfiles.MolFromPDBBlock(lig12_pdb)

# make list of molecules to map the MCS to:
perturbation_pair = []

perturbation_pair.append(lig02_mol)
perturbation_pair.append(lig12_mol)

MCS_object = rdFMCS.FindMCS(perturbation_pair, completeRingsOnly=True)
MCS_SMARTS = Chem.MolFromSmarts(MCS_object.smartsString)

# remove MCS from each molecule:
lig02_stripped = AllChem.DeleteSubstructs(lig02_mol, MCS_SMARTS)
lig12_stripped = AllChem.DeleteSubstructs(lig12_mol, MCS_SMARTS)

# print SMILES of each stripped molecule:
print("lig02: " + str(Chem.MolToSmiles(lig02_stripped)))
print("lig12: " + str(Chem.MolToSmiles(lig12_stripped)))

#print(Chem.MolToMolBlock(MCS_SMARTS),file=open('./MCS.mol','w+'))

#


I attached the MCS.mol file as well. The lig12.pdb contains an extra 
Cl atom, and lig12_stripped should thus contain a single Cl atom after 
deletion of the MCS substructure (the MCS substructure is equal to 
lig02.pdb). When running the script it actually contains 0 atoms.



I haven't been able to locate the source of the issue in the 
AllChem.DeleteSubstructs documentation, does anyone have a suggestion?



I'm using a conda install of rdkit (2018.09.1.0).



Best,

Jenke
--
University of Edinburgh
David Brewster road
Edinburgh, EH9 3FJ
United Kingdom
-

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.







Re: [Rdkit-discuss] Warning as error

2019-01-21 Thread Jean-Marc Nuzillard

Adapting from https://github.com/rdkit/rdkit/issues/642 I wrote:

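    import sys
    from rdkit import Chem

    file_sdf = "my_file.sdf"  # path to the SD file
    reader = Chem.SDMolSupplier(file_sdf, sanitize=False)
    for mol in reader:
        if mol is None:  # record could not be parsed at all
            continue
        try:
            Chem.SanitizeMol(mol)
        except Exception:  # avoid a bare except, which also traps KeyboardInterrupt
            sys.stderr.write("Bad: %s\n" % (mol.GetProp("_Name"),))
            continue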

A good point is that this code does not produce any warning, whereas it
does when I remove the "sanitize = False" option.
However, Chem.SanitizeMol(mol) never raises an error,
unless something is missing in my try/except statements.

The warning message I want to track is:
[19:58:01] Conflicting single bond directions around double bond at index 27.
[19:58:01]   BondStereo set to STEREONONE and single bond directions set to NONE.

I know which bond has a problem but I still do not know in which molecule.

Jean-Marc

PS. Markus, sorry for the duplicated message.

On 21/01/2019 at 14:55, Markus Sitzmann wrote:

Maybe this helps (at least, it is from Greg):

https://github.com/rdkit/rdkit/issues/642

Markus

On Mon, Jan 21, 2019 at 2:25 PM Jean-Marc Nuzillard
<jm.nuzill...@univ-reims.fr> wrote:


My problem is more to know which molecules cause problems
than avoiding the printing of warning messages in the console window.
I am looking for an option that would turn warnings into errors,
if any.

Jean-Marc



On 21/01/2019 at 13:44, Stephen O'hagan wrote:
> I've had similar problems; none of the claimed methods to switch
> off RDKit logging of warnings has worked for me.
>
> I ended up just re-directing stderr when running the script like
> this:
>
> python myfile.py  2> myErrorLog.txt
>
> Dr. Steve O'Hagan,
>
>
> -Original Message-
> From: Jean-Marc Nuzillard [mailto:jm.nuzill...@univ-reims.fr]
> Sent: 21 January 2019 12:33
> To: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> Subject: [Rdkit-discuss] Warning as error
>
> Dear all,
>
> The minimalist python code:
>   reader = Chem.SDMolSupplier('my_file.sdf')
>   for mol in reader:
>       pass
>
> gives me warning messages when run on a particular SD file.
> How can I simply run a specific action for the molecules that
> cause problem, possibly using  try/catch statements?
> Best,
>
> Jean-Marc
>
>




[Rdkit-discuss] DeleteSubstructs using an MCS object

2019-01-21 Thread SCHEEN Jenke
Hi all,


I'm trying to remove the MCS between two molecules (attached, 02.pdb and 
12.pdb) using rdFMCS.FindMCS and AllChem.DeleteSubstructs using the following 
code:


#


from rdkit import Chem
from rdkit.Chem import AllChem, rdmolfiles, rdFMCS

#

# load molecules:
lig02_pdb = open("02.pdb", 'r').read()
lig12_pdb = open("12.pdb", 'r').read()

lig02_mol = rdmolfiles.MolFromPDBBlock(lig02_pdb)
lig12_mol = rdmolfiles.MolFromPDBBlock(lig12_pdb)

# make list of molecules to map the MCS to:
perturbation_pair = []

perturbation_pair.append(lig02_mol)
perturbation_pair.append(lig12_mol)

MCS_object = rdFMCS.FindMCS(perturbation_pair, completeRingsOnly=True)
MCS_SMARTS = Chem.MolFromSmarts(MCS_object.smartsString)

# remove MCS from each molecule:
lig02_stripped = AllChem.DeleteSubstructs(lig02_mol, MCS_SMARTS)
lig12_stripped = AllChem.DeleteSubstructs(lig12_mol, MCS_SMARTS)

# print SMILES of each stripped molecule:
print("lig02: " + str(Chem.MolToSmiles(lig02_stripped)))
print("lig12: " + str(Chem.MolToSmiles(lig12_stripped)))

#print(Chem.MolToMolBlock(MCS_SMARTS),file=open('./MCS.mol','w+'))

#


I attached the MCS.mol file as well. The lig12.pdb contains an extra Cl atom, 
and lig12_stripped should thus contain a single Cl atom after deletion of the 
MCS substructure (the MCS substructure is equal to lig02.pdb). When running the 
script it actually contains 0 atoms.


I haven't been able to locate the source of the issue in the 
AllChem.DeleteSubstructs documentation, does anyone have a suggestion?


I'm using a conda install of rdkit (2018.09.1.0).



Best,

Jenke
--
University of Edinburgh
David Brewster road
Edinburgh, EH9 3FJ
United Kingdom
-

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336.


12.pdb
Description: 12.pdb


02.pdb
Description: 02.pdb


MCS.pdb
Description: MCS.pdb


Re: [Rdkit-discuss] Dividing inputstream over threads

2019-01-21 Thread Dmitri Maziuk via Rdkit-discuss
On Mon, 21 Jan 2019 09:43:48 +0100
Markus Sitzmann  wrote:
 
> There is no need for objects with SQLAlchemy, SQLAlchemy's Core and
> its expression language is pretty excellent without objects ...

I spent weeks last year rewriting code that I myself wrote back when I
believed that... When I wrote it originally, as I was getting deeper
in, SQLAlchemy changed my mind.

-- 
Dmitri Maziuk 




Re: [Rdkit-discuss] Warning as error

2019-01-21 Thread Lukas Pravda
Hi Jean-Marc,

Just a thought, but SDMolSupplier is lazily evaluated, if I am not mistaken. 
Technically you should get all the rdkit warnings and errors at the time of 
processing that bit of the sdf file. You can always read the stderr output, 
parse it and raise an exception every time a 'funny' molecule comes in.

I use a routine similar to this:

from io import StringIO
import sys
from rdkit import Chem

saved_std_err = sys.stderr
log = sys.stderr = StringIO()
Chem.WrapLogs()  # route RDKit's C++ log messages through Python's sys.stderr

reader = Chem.SDMolSupplier('my_file.sdf')
for mol in reader:
    error_msgs = log.getvalue()
    # check whether error_msgs contains any particular errors and act
    # accordingly, perhaps even flush the stream

sys.stderr = saved_std_err
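
The "parse it and raise an exception" step could look roughly like the sketch below. It uses only the standard library: `parse_record` is a hypothetical stand-in for the toolkit call that prints a warning, and `MoleculeWarning` and `parse_strict` are names invented for this example. Note that `contextlib.redirect_stderr` only sees output written through Python's `sys.stderr`, which is why the routine above calls WrapLogs() first; that step is not reproduced here.

```python
import io
import sys
from contextlib import redirect_stderr


class MoleculeWarning(Exception):
    """Raised when processing a record wrote something to stderr."""


def parse_record(record):
    # Hypothetical stand-in for the toolkit call that may print a warning.
    if "bad" in record:
        sys.stderr.write("Conflicting single bond directions\n")
    return record.upper()


def parse_strict(record):
    """Run parse_record and turn any stderr output into an exception."""
    buffer = io.StringIO()
    with redirect_stderr(buffer):
        result = parse_record(record)
    message = buffer.getvalue().strip()
    if message:
        raise MoleculeWarning(message)
    return result


bad_indexes = []
for index, record in enumerate(["mol_a", "bad_mol", "mol_c"]):
    try:
        parse_strict(record)
    except MoleculeWarning:
        bad_indexes.append(index)

print(bad_indexes)
```

Using a fresh buffer per record avoids having to truncate a shared stream between molecules, at the cost of one small object per iteration.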

Lukas



Re: [Rdkit-discuss] Dividing inputstream over threads

2019-01-21 Thread Peter St. John
Another option is dask (https://docs.dask.org/en/latest/). I've used
`map_partitions` from dask to bulk convert a column of smiles strings into
various computed properties. You could then output to a CSV or other
database file.

-- Peter

On Mon, Jan 21, 2019 at 1:45 AM Markus Sitzmann 
wrote:

> > SQLalchemy creates a fairly specific ecosystem that you have to buy
> > into for it to make sense. When you don't have objects, only a table
> > of properties, OR mapper is just bloat.
>
> There is no need for objects with SQLAlchemy, SQLAlchemy's Core and its
> expression language is pretty excellent without objects ...
>
> >With parallel processing your bottleneck is going to be database
> >inserts. One option is write out CSV file(s) from each thread/job,
> >concatenate them in the final node, and then bulk-import into the
> >database: typically CSV (or other such format) bulk import is orders
> >of magnitude faster than inserting one SQL statement at a time.
>
> ... and bulk-inserts of Python data types into the database.
>
> Markus
>
> On Sun, Jan 20, 2019 at 9:17 PM Dmitri Maziuk via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
>> On Sun, 20 Jan 2019 12:03:50 +0100
>> Shojiro Shibayama  wrote:
>>
>> > ... I guess SQLalchemy
>> > in python might be good, but I'm not sure. Hope that you'll find out
>> > a good library of SQL OR mapper for python.
>>
>> SQLalchemy creates a fairly specific ecosystem that you have to buy
>> into for it to make sense. When you don't have objects, only a table
>> of properties, OR mapper is just bloat.
>>
>> With parallel processing your bottleneck is going to be database
>> inserts. One option is write out CSV file(s) from each thread/job,
>> concatenate them in the final node, and then bulk-import into the
>> database: typically CSV (or other such format) bulk import is orders
>> of magnitude faster than inserting one SQL statement at a time.
>>
>> --
>> Dmitri Maziuk 
>>
>>


Re: [Rdkit-discuss] Warning as error

2019-01-21 Thread Jean-Marc Nuzillard

My problem is more to know which molecules cause problems
than avoiding the printing of warning messages in the console window.
I am looking for an option that would turn warnings into errors, if any.

Jean-Marc



On 21/01/2019 at 13:44, Stephen O'hagan wrote:

I've had similar problems; none of the claimed methods to switch off RDKit 
logging of warnings has worked for me.

I ended up just re-directing stderr when running the script like this:

python myfile.py  2> myErrorLog.txt


Dr. Steve O'Hagan,
  


-Original Message-
From: Jean-Marc Nuzillard [mailto:jm.nuzill...@univ-reims.fr]
Sent: 21 January 2019 12:33
To: RDKit Discuss 
Subject: [Rdkit-discuss] Warning as error

Dear all,

The minimalist python code:
      reader = Chem.SDMolSupplier('my_file.sdf')
      for mol in reader:
          pass

gives me warning messages when run on a particular SD file.
How can I simply run a specific action for the molecules that cause problem, 
possibly using  try/catch statements?
Best,

Jean-Marc




Re: [Rdkit-discuss] Warning as error

2019-01-21 Thread Stephen O'hagan
I've had similar problems; none of the claimed methods to switch off RDKit 
logging of warnings has worked for me.

I ended up just re-directing stderr when running the script like this:

python myfile.py  2> myErrorLog.txt


Dr. Steve O'Hagan,
 

-Original Message-
From: Jean-Marc Nuzillard [mailto:jm.nuzill...@univ-reims.fr] 
Sent: 21 January 2019 12:33
To: RDKit Discuss 
Subject: [Rdkit-discuss] Warning as error

Dear all,

The minimalist python code:
     reader = Chem.SDMolSupplier('my_file.sdf')
     for mol in reader:
         pass

gives me warning messages when run on a particular SD file.
How can I simply run a specific action for the molecules that cause problem, 
possibly using  try/catch statements?
Best,

Jean-Marc




[Rdkit-discuss] Warning as error

2019-01-21 Thread Jean-Marc Nuzillard

Dear all,

The minimalist python code:
    reader = Chem.SDMolSupplier('my_file.sdf')
    for mol in reader:
        pass

gives me warning messages when run on a particular SD file.
How can I simply run a specific action for the molecules that cause problem,
possibly using  try/catch statements?
Best,

Jean-Marc




Re: [Rdkit-discuss] Dividing inputstream over threads

2019-01-21 Thread Markus Sitzmann
> SQLalchemy creates a fairly specific ecosystem that you have to buy
> into for it to make sense. When you don't have objects, only a table
> of properties, OR mapper is just bloat.

There is no need for objects with SQLAlchemy, SQLAlchemy's Core and its
expression language is pretty excellent without objects ...

>With parallel processing your bottleneck is going to be database
>inserts. One option is write out CSV file(s) from each thread/job,
>concatenate them in the final node, and then bulk-import into the
>database: typically CSV (or other such format) bulk import is orders
>of magnitude faster than inserting one SQL statement at a time.

... and bulk-inserts of Python data types into the database.
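
A rough sketch of the workflow discussed here, under stated assumptions: each worker writes its own CSV file, the files are read back together, and the rows go into the database in a single bulk insert. The standard-library `sqlite3` module stands in for the real database, and the table name, column names, file names and sample values are made up for the example. A production database would more likely use its own bulk loader.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def write_worker_csv(path, rows):
    """What each thread/job would do: dump its results to its own CSV file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def bulk_import(csv_paths, connection):
    """Read all per-worker files and insert their rows in one transaction."""
    rows = []
    for path in csv_paths:
        with open(path, newline="") as handle:
            rows.extend(csv.reader(handle))
    with connection:  # commits once at the end
        connection.executemany(
            "INSERT INTO properties (smiles, mol_weight) VALUES (?, ?)", rows
        )
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    # Made-up results from two workers (SMILES, molecular weight).
    paths = [Path(tmp) / "worker0.csv", Path(tmp) / "worker1.csv"]
    write_worker_csv(paths[0], [("CCO", 46.07), ("C", 16.04)])
    write_worker_csv(paths[1], [("c1ccccc1", 78.11)])

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE properties (smiles TEXT, mol_weight REAL)")
    inserted = bulk_import(paths, db)

row_count = db.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
print(inserted, row_count)
```

The point of the design is that workers never touch the database, so inserts are not a shared bottleneck; only the final step does, and it does so once.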

Markus

On Sun, Jan 20, 2019 at 9:17 PM Dmitri Maziuk via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> On Sun, 20 Jan 2019 12:03:50 +0100
> Shojiro Shibayama  wrote:
>
> > ... I guess SQLalchemy
> > in python might be good, but I'm not sure. Hope that you'll find out
> > a good library of SQL OR mapper for python.
>
> SQLalchemy creates a fairly specific ecosystem that you have to buy
> into for it to make sense. When you don't have objects, only a table
> of properties, OR mapper is just bloat.
>
> With parallel processing your bottleneck is going to be database
> inserts. One option is write out CSV file(s) from each thread/job,
> concatenate them in the final node, and then bulk-import into the
> database: typically CSV (or other such format) bulk import is orders
> of magnitude faster than inserting one SQL statement at a time.
>
> --
> Dmitri Maziuk 
>
>