Hi Matt,

Maybe squeeze these two lines

zims = [x for x in Chem.ForwardSDMolSupplier(gzip.open('a.sdf.gz')) if
x is not None]

zims_fps=[AllChem.GetMorganFingerprintAsBitVect(x,2) for x in zims]

into one:

zims_fps=[AllChem.GetMorganFingerprintAsBitVect(x,2) for x in
Chem.ForwardSDMolSupplier(gzip.open('a.sdf.gz')) if x is not None]

because zims keeps every parsed molecule in memory for no good reason  :-)
(Is that sdf.gz big?)
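
A minimal sketch of the single-pass idea, in case it helps. It uses only the
standard library, so it runs without RDKit: parse_record and fingerprint are
hypothetical stand-ins (my assumption, not RDKit API) for the supplier's
parsing and for GetMorganFingerprintAsBitVect, and the '$$$$' record splitting
is simplified. The point is only that each record is parsed, fingerprinted,
and dropped before the next one is read.

```python
import gzip
import os
import tempfile


def iter_records(path):
    """Yield one record (the text before each '$$$$' line) from a gzip file."""
    lines = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                yield "".join(lines)
                lines = []
            else:
                lines.append(line)


def parse_record(text):
    """Hypothetical stand-in for molecule parsing; None marks a bad record."""
    stripped = text.strip()
    return stripped.splitlines()[0] if stripped else None


def fingerprint(mol):
    """Hypothetical stand-in for a fingerprint: the set of character bigrams."""
    return frozenset(mol[i:i + 2] for i in range(len(mol) - 1))


def stream_fingerprints(path):
    """Parse and fingerprint one record at a time; nothing else is retained."""
    for text in iter_records(path):
        mol = parse_record(text)
        if mol is not None:
            yield fingerprint(mol)


def demo():
    """Write a tiny gzip file with one empty record and fingerprint it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "a.sdf.gz")
        with gzip.open(path, "wt") as handle:
            handle.write("aspirin\n  data\n$$$$\n\n$$$$\ncaffeine\n$$$$\n")
        # Only the fingerprints are collected; parsed records are discarded.
        return list(stream_fingerprints(path))


fps = demo()
```

Only the fingerprints end up in the list. Swapping the stand-ins for the real
RDKit calls should give the same shape as the one-liner above.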

Markus

On Thu, Jul 17, 2014 at 12:43 AM, Matthew Lardy <mla...@gmail.com> wrote:
> Hi Igor,
>
> Thanks!  Maybe I am a throwback, but I prefer the command line to a GUI.
> Still I'll give it a whirl!  :)
>
> If you are handling millions of molecules without issue, then my Python
> skills are really, really rusty.  Or I shouldn't be using Python to handle
> this much data.  :)
>
> Thanks for the info!
> Matt
>
>
> On Wed, Jul 16, 2014 at 3:31 PM, Igor Filippov <igor.v.filip...@gmail.com>
> wrote:
>>
>> Matthew,
>>
>> Two lines of shameless self-promotion:
>> This is exactly the kind of problem for Diversity Genie -
>> http://www.diversitygenie.com/
>> It uses the RDKit library underneath, but wraps it in a simple, easy to
>> use GUI front-end.
>>
>> Best regards,
>> Igor
>>
>>
>> On Wed, Jul 16, 2014 at 6:18 PM, Matthew Lardy <mla...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I have been playing with the diversity selection in RDKit.  I am running
>>> through a set of ~26,000 molecules to pick a set of 200 diverse molecules.
>>> I saw some examples of how to do this in Python (my variant of their script
>>> below), but the memory consumption is massive.  I burned through ~15GB of
>>> memory before I killed it off.  Is this about what others have seen, or
>>> should I move to doing this in C++ or Java (assuming that others have seen a
>>> significantly lower level of memory consumption)?
>>>
>>> Here is the script:
>>>
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> from rdkit import DataStructs
>>> import gzip
>>> from rdkit.Chem import Draw
>>> from rdkit.SimDivFilters import rdSimDivPickers
>>>
>>> zims = [x for x in Chem.ForwardSDMolSupplier(gzip.open('a.sdf.gz')) if x
>>> is not None]
>>>
>>> zims_fps=[AllChem.GetMorganFingerprintAsBitVect(x,2) for x in zims]
>>>
>>> dm=[]
>>> for i,fp in enumerate(zims_fps[:26000]):     # only 1000 in the demo (in the interest of time)
>>>     dm.extend(DataStructs.BulkTanimotoSimilarity(fp,zims_fps[1+1:26000],returnDistance=True))
>>> dm = array(dm)
>>> picker = rdSimDivPickers.MaxMinPicker()
>>> ids = picker.Pick(dm,26000,200)
>>> list(ids[:200])
>>>
>>>
>>> Thanks in advance!
>>> Matt
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Want fast and easy access to all the code in your enterprise? Index and
>>> search up to 200,000 lines of code with a free copy of Black Duck
>>> Code Sight - the same software that powers the world's largest code
>>> search on Ohloh, the Black Duck Open Hub! Try it now.
>>> http://p.sf.net/sfu/bds
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
>
>
>
