Hi Alexis,

I would approach this by loading the 1000 queries into a list of molecules
and then "streaming" the 500K targets past them one at a time, so that you
never try to hold the full 500K set in memory at once.

Here's a quick sketch of one way to do this:

In [4]: queries = [x for x in Chem.ForwardSDMolSupplier('mols.1000.sdf') if x is not None]

In [5]: matches = []

In [6]: for m in Chem.ForwardSDMolSupplier('./znp.50k.sdf'):
   ...:     if m is None:
   ...:         continue
   ...:     matches.append([m.HasSubstructMatch(q) for q in queries])
   ...:
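
One caveat with the sketch above: matches holds a 1000-element list of
booleans for every target molecule, so with 500K targets the result itself
gets large. Below is a minimal variation, under a few assumptions, that keeps
only the indices of the queries that actually hit. The names screen_stream,
screen_sdf and is_match are made up for illustration and are not part of the
RDKit API; the file names are the ones from the session above.

```python
def screen_stream(queries, targets, is_match):
    """Yield (target_index, [indices of matching queries]) lazily.

    queries  -- small collection held fully in memory
    targets  -- any iterable; consumed one item at a time
    is_match -- callable(target, query) -> bool
    """
    for t_idx, target in enumerate(targets):
        if target is None:
            # Entries that failed to parse show up as None; skip them.
            continue
        hits = [q_idx for q_idx, q in enumerate(queries) if is_match(target, q)]
        if hits:
            # Only targets with at least one hit are reported.
            yield t_idx, hits


def screen_sdf(query_path, target_path):
    """Hypothetical wrapper that plugs RDKit into screen_stream."""
    from rdkit import Chem  # imported here so screen_stream works without RDKit

    queries = [q for q in Chem.ForwardSDMolSupplier(query_path) if q is not None]
    targets = Chem.ForwardSDMolSupplier(target_path)  # read lazily, one record at a time
    return screen_stream(queries, targets, lambda m, q: m.HasSubstructMatch(q))
```

Iterating over screen_sdf('mols.1000.sdf', './znp.50k.sdf') would then give
one (index, hits) pair per matching target, which can be written to disk as
it is produced instead of being collected in a list.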



Brian has some thoughts on making this particular use case easier/faster
(in particular by adding multi-threading support), so there may be
something for this in the next release.

I hope this helps,
-greg


On Sun, Jun 4, 2017 at 10:25 PM, Alexis Parenty <
alexis.parenty.h...@gmail.com> wrote:

> Dear RDKit community,
>
> I need to screen for substructure relationships between two sets of
> structures (1 000 X 500 000): I thought I should build two lists of mol
> objects from SMILES, but I keep having a memory error when the second list
> reaches 300 000 mol. All my RAM (12G) gets consumed along with all my
> virtual memory.
>
> Do I really have to compromise on speed and make mol objects on the fly
> from two lists of SMILES? Is there another memory-efficient way to store
> mol objects?
>
> Best,
>
> Alexis
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>