On 2017-06-09 08:12, Alexis Parenty wrote:
Dear Greg and Brian,
Many thanks for your response. I was also thinking of your streaming
approach! I think the RAM of most machines could handle lists of 100K
mols, so we could set the threshold higher than 1000. Actually, I was
thinking to monitor
Yes Greg, this is what I am doing. You’re right, I did not think of the
possibility of building a list of mols from the shorter list and processing
each of them against the mols of the longer list (which I would make on the
fly from the SMILES). However, I wanted to store the longest list of
structures
Hi Alexis,
If I understand your use case correctly, you really don't need this level
of complication.
If you are comparing Q molecules to M molecules and M>>Q (in the discussion
so far Q = 1000, M = 500K) and you only need to compare each of the Qs to
each of the Ms a single time, you can
What exactly are you doing?
Is this 1000x500k substructure queries or something different?
Brian Kelley
> On Jun 9, 2017, at 9:12 AM, Alexis Parenty
> wrote:
>
> Dear Greg and Brian,
> Many thanks for your response. I was also thinking of your streaming
Dear Greg and Brian,
Many thanks for your response. I was also thinking of your streaming
approach! I think the RAM of most machines could handle lists of 100K mols,
so we could set the threshold higher than 1000. Actually, I was thinking to
monitor the available RAM and only start processing the
While not multithreaded (yet), this is the use case for the FilterCatalog:
http://rdkit.blogspot.com/2016/04/changes-in-201603-release-filtercatalog.html?m=1
Look for the SmartsMatcher class in the blog.
It is a good idea to make this multithreaded as well, I'll add this as a
possible
Hi Alexis,
I would approach this by loading the 1000 queries into a list of molecules
and then "stream" the others past that (so that you never attempt to load
the full 500K set at once).
Here's a quick sketch of one way to do this:
In [4]: queries = [x for x in
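
For readers who want the shape of the streaming approach described above, here is a separate minimal sketch. It is not the code from the message, and it makes some assumptions: the function name `stream_matches` and the `parse` and `matches` callables are hypothetical, and the toy stand-ins at the bottom use plain string containment rather than real substructure matching, so the example runs without RDKit installed.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence, Tuple, TypeVar

Q = TypeVar("Q")  # type of a prepared query (held in memory)
M = TypeVar("M")  # type of a parsed molecule (created one at a time)


def stream_matches(
    queries: Sequence[Q],
    records: Iterable[str],
    parse: Callable[[str], Optional[M]],
    matches: Callable[[M, Q], bool],
) -> Iterator[Tuple[int, int]]:
    """Yield (record_index, query_index) for every hit.

    Only the short query list is kept in memory; each record from the
    long set is parsed, tested against all queries, and then dropped.
    """
    for rec_idx, record in enumerate(records):
        mol = parse(record)
        if mol is None:  # skip records that fail to parse
            continue
        for q_idx, query in enumerate(queries):
            if matches(mol, query):
                yield rec_idx, q_idx


# Toy stand-ins (hypothetical) so the sketch runs on its own.
def toy_parse(line: str) -> Optional[str]:
    text = line.strip()
    return text or None  # treat blank lines as unparsable


def toy_match(mol: str, query: str) -> bool:
    return query in mol  # plain substring test, NOT chemistry


hits = list(stream_matches(["CC", "N"], ["CCO", "", "c1ccccc1N", "O"], toy_parse, toy_match))
print(hits)
```

With RDKit, one would presumably build the queries once with `Chem.MolFromSmarts`, pass `Chem.MolFromSmiles` as `parse` (it returns `None` for SMILES it cannot parse), use `lambda mol, q: mol.HasSubstructMatch(q)` as `matches`, and pass an open file as `records` so that the long set is read line by line rather than loaded at once.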