Re: [Rdkit-discuss] RDkit in Python vs. on PostgreSQL?

2020-03-02 Thread Thomas Strunz
From: Deepti Gupta via Rdkit-discuss Sent: Wednesday, 26 February 2020 09:46 To: rdkit-discuss@lists.sourceforge.net ; Tim Dudgeon Subject: Re: [Rdkit-discuss] RDkit in Python vs. on PostgreSQL? Hi Tim, Thank you! I'll be more detailed in my post, sorry about

Re: [Rdkit-discuss] RDkit in Python vs. on PostgreSQL?

2020-02-26 Thread Tim Dudgeon
Well, as I mentioned previously, the big difference is that from Python you are iterating through the molecules, calculating the fingerprints, and then comparing the fingerprints, whereas in the PostgreSQL cartridge the fingerprints are already generated and indexed so the search
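
A minimal sketch of the distinction described in this message, in plain Python. It is an illustration only and makes several assumptions: `toy_fingerprint` is a hypothetical stand-in that hashes character bigrams of a SMILES string (it is not an RDKit fingerprint and carries no chemical meaning), the five-entry `library` is made up, and the precomputed list only models "fingerprints already generated". It does not model the cartridge's index.

```python
import hashlib

N_BITS = 256  # toy fingerprint length (assumption, chosen for illustration)


def toy_fingerprint(smiles: str) -> int:
    """Hypothetical stand-in: set one bit per character bigram of the SMILES string."""
    bits = 0
    for i in range(len(smiles) - 1):
        digest = hashlib.md5(smiles[i:i + 2].encode()).digest()
        bits |= 1 << (int.from_bytes(digest[:4], "big") % N_BITS)
    return bits


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bitsets stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def search_recompute(query: str, library: list, threshold: float) -> list:
    """Per-query pattern: every library fingerprint is recomputed on each search."""
    q = toy_fingerprint(query)
    return [s for s in library if tanimoto(q, toy_fingerprint(s)) >= threshold]


def build_fingerprint_table(library: list) -> list:
    """One-off step: compute and store each fingerprint next to its structure."""
    return [(s, toy_fingerprint(s)) for s in library]


def search_precomputed(query: str, table: list, threshold: float) -> list:
    """Stored pattern: only the query fingerprint is computed at search time."""
    q = toy_fingerprint(query)
    return [s for s, fp in table if tanimoto(q, fp) >= threshold]


# Made-up example data (not from the thread)
library = ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCOC"]
table = build_fingerprint_table(library)

hits_recompute = search_recompute("CCO", library, 0.5)
hits_precomputed = search_precomputed("CCO", table, 0.5)
```

Both functions return the same hits; the difference is where the fingerprint cost is paid. In the first it is paid on every search, in the second once up front, which is one reason a timing comparison between the two setups may not be like for like.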

Re: [Rdkit-discuss] RDkit in Python vs. on PostgreSQL?

2020-02-26 Thread Deepti Gupta via Rdkit-discuss
Hi Tim, Thank you! I'll be more detailed in my post, sorry about that. As this was a PoC, I had a Spark cluster with 2 worker nodes with 4 vCPUs with disk size 500GB and memory 15GB on Google Cloud. I timed the response against 2 million data points consisting of ChEMBL IDs and SMILES structures.

Re: [Rdkit-discuss] RDkit in Python vs. on PostgreSQL?

2020-02-25 Thread Tim Dudgeon
I think you need to explain what benchmarks you are running and what is really meant by "faster". And what hardware (for Spark how many nodes, how big; for PostgreSQL what size server, what settings, especially the shared_buffers setting). A very obvious critique of what you reported is that what you

[Rdkit-discuss] RDkit in Python vs. on PostgreSQL?

2020-02-25 Thread Deepti Gupta via Rdkit-discuss
Hi Gurus, I'm absolutely new to the cheminformatics domain. I've been assigned a PoC where I have to compare RDKit in Python and RDKit on PostgreSQL. I've installed both and am trying some hands-on exercises to understand the differences. What I've understood is that the structure searches are slower