On 2017-06-10 07:42, Chris Swain wrote:
This sounds like the situation where a database might be a better
option, tuned to store fingerprints in RAM?
The issue is how much programming time it will take, how much that time
is worth, and how many times the solution will be reused. A clever
-
>
> Message: 1
> Date: Fri, 9 Jun 2017 16:28:09 +0200
> From: Alexis Parenty <alexis.parenty.h...@gmail.com>
> To: Greg Landrum <greg.land...@gmail.com>
> Cc: RDKit Discuss <rdkit-discuss@lists.so
On 2017-06-09 08:12, Alexis Parenty wrote:
Dear Greg and Brian,
Many thanks for your response. I was also thinking of your streaming
approach! I think the RAM of most machines could cope with lists of 100K
mols, so we could put the threshold higher than 1000. Actually, I was
thinking to monitor
Yes Greg, this is what I am doing. You're right, I did not think of the
possibility of building a list of mols from the shorter list and processing
each of them against the mols of the longer list (which I would make on the
fly from the SMILES). However, I wanted to store the longest list of
structures
Hi Alexis,
If I understand your use case correctly, you really don't need this level
of complication.
If you are comparing Q molecules to M molecules and M>>Q (in the discussion
so far Q = 1000, M = 500K) and you only need to compare each of the Qs to
each of the Ms a single time, you can
What exactly are you doing?
Is this 1000x500k substructure queries or something different?
Brian Kelley
> On Jun 9, 2017, at 9:12 AM, Alexis Parenty
> wrote:
>
> Dear Greg and Brian,
> Many thanks for your response. I was also thinking of your streaming
Dear Greg and Brian,
Many thanks for your response. I was also thinking of your streaming
approach! I think the RAM of most machines could cope with lists of 100K mols,
so we could put the threshold higher than 1000. Actually, I was thinking to
monitor the available RAM and only start processing the
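A rough sketch of the "higher threshold" idea, for illustration only: read
the long list in fixed-size batches so that only one batch is in memory at
a time. The 100K batch size comes from the message above; the helper name
`batched` is hypothetical, and monitoring the available RAM is not shown.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls only the next batch_size items from the iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in data: 250K integers instead of molecules.
batch_sizes = [len(batch) for batch in batched(range(250_000), 100_000)]
```

The input can be an open file of SMILES lines, since the helper only needs
an iterable and never asks for its length.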
While not multithreaded (yet), this is the use case for the filter catalog:
http://rdkit.blogspot.com/2016/04/changes-in-201603-release-filtercatalog.html?m=1
Look for the SmartsMatcher class in the blog.
It is a good idea to make this multithreaded as well, I'll add this as a
possible
Hi Alexis,
I would approach this by loading the 1000 queries into a list of molecules
and then "stream" the others past that (so that you never attempt to load
the full 500K set at once).
Here's a quick sketch of one way to do this:
In [4]: queries = [x for x in
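
The session above is cut off in the archive. As a separate illustration,
and not Greg's code, here is a minimal sketch of the streaming idea he
describes: keep the short query list in memory and pass the long list
through it one entry at a time. The function name `stream_matches` and its
parameters are hypothetical. The parser and matcher are passed in as
arguments so the sketch runs without RDKit; with RDKit one would presumably
use Chem.MolFromSmiles as the parser and mol.HasSubstructMatch(query) as
the matcher.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple, TypeVar

Mol = TypeVar("Mol")
Query = TypeVar("Query")


def stream_matches(
    queries: List[Query],
    smiles_stream: Iterable[str],
    parse: Callable[[str], Optional[Mol]],
    matches: Callable[[Mol, Query], bool],
) -> Iterator[Tuple[str, List[int]]]:
    """Yield (smiles, indices of matching queries), one molecule at a time."""
    for line in smiles_stream:
        smiles = line.strip()
        if not smiles:
            continue  # skip blank lines
        mol = parse(smiles)
        if mol is None:
            continue  # skip entries the parser rejects
        hits = [i for i, query in enumerate(queries) if matches(mol, query)]
        if hits:
            yield smiles, hits


# Stand-in demo: plain strings and substring tests replace real molecules
# and substructure matching (an assumption made only to keep this runnable).
demo_queries = ["=O", "N"]
demo_stream = iter(["CC(=O)O\n", "CCN\n", "\n", "CCC\n"])
demo_hits = list(
    stream_matches(
        demo_queries,
        demo_stream,
        parse=lambda s: s,
        matches=lambda mol, query: query in mol,
    )
)
```

Because the function is a generator, the long list can be an open file
handle, and memory use is bounded by the query list plus one molecule.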