On Feb 19, 2008, at 12:03 PM, Peter Maas wrote:

> Well I copied most of your examples.
> Please find it enclosed.
> I'm running it against our 10 mg stock (>250k structures).

Hmm, I ran it on OS X (2.2 GHz, 1 GB RAM, JDK 1.5) and it processed
277 relatively small molecules in 7 s.

I did some rough testing of my own and I used a molecule from PubChem  
(CID = 52) which has 55 heavy atoms. It turns out that the  
polarizability code takes nearly a minute to run.
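
One way to find which step dominates is to time each calculation separately on the same molecule. The sketch below is only an illustration in Python: `time_stages`, `cheap_descriptor`, and `expensive_descriptor` are hypothetical stand-ins, not CDK classes, and the "molecule" is just a placeholder list of 55 atom indices.

```python
import time

def time_stages(stages, item):
    """Run each named callable on item and return {name: elapsed seconds}."""
    timings = {}
    for name, func in stages.items():
        start = time.perf_counter()
        func(item)
        timings[name] = time.perf_counter() - start
    return timings

# Hypothetical stand-ins for real descriptor calculations.
def cheap_descriptor(mol):
    return len(mol)

def expensive_descriptor(mol):
    # Deliberately quadratic in the number of atoms to mimic a slow step.
    return sum(1 for a in mol for b in mol if a != b)

molecule = list(range(55))  # placeholder for a molecule with 55 heavy atoms
timings = time_stages(
    {"cheap": cheap_descriptor, "expensive": expensive_descriptor},
    molecule,
)
slowest = max(timings, key=timings.get)
print(f"slowest stage: {slowest}")
```

Running the same harness over a few molecules of different sizes would also show how each step scales with atom count.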

I tweaked the polarizability calculation so that it now takes 4 s to
run, bringing the processing time for this molecule down to 4.9 s.
Also, the 277-molecule SDF file I mentioned above now takes 3.1 s.
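
For a rough sense of what these timings mean for a 250k-structure set, here is a back-of-the-envelope extrapolation in Python. It assumes the runtime scales linearly with the number of molecules and that the 277-molecule file is representative, which is optimistic given that large molecules are much slower. The function name is illustrative only.

```python
def estimate_runtime_seconds(n_target, n_sample, sample_seconds):
    """Linearly extrapolate a measured batch time to a larger batch."""
    if n_sample <= 0:
        raise ValueError("n_sample must be positive")
    return n_target * (sample_seconds / n_sample)

# Timings from this thread: 277 molecules in 7 s before the change, 3.1 s after.
before = estimate_runtime_seconds(250_000, 277, 7.0)
after = estimate_runtime_seconds(250_000, 277, 3.1)
print(f"before: {before / 60:.0f} min, after: {after / 60:.0f} min")
# prints "before: 105 min, after: 47 min"
```

Since the stock is described as more than 250k structures, these figures are lower bounds at best.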

However, it still slows down for some large molecules (such as
PubChem CID 182), and I suspect that the path length calculation could be
improved. I'll look at that in a few days. Are your molecules very
large?

In any case, the latest improvements are in SVN, so you should sync  
and recompile. Things should go faster.

> I'd like to give R a shot at clustering it, but I'm afraid R also will
> not be up to it.

Well, creating a 250K x 250K distance matrix will bring most machines
to their knees unless you have a very large amount of RAM. But you
could look at methods like spectral clustering, which can be more
efficient for larger datasets.
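
To put a number on the memory requirement, the arithmetic can be sketched in Python as below. It assumes 8-byte double-precision values and ignores any per-object overhead; `distance_matrix_bytes` is an illustrative helper, not part of R or the CDK. The condensed case stores only the entries below the diagonal, which is how R's `dist` objects are laid out.

```python
def distance_matrix_bytes(n, bytes_per_value=8, condensed=False):
    """Bytes needed for an n x n distance matrix of fixed-size values.

    With condensed=True only the n*(n-1)/2 entries below the diagonal count.
    """
    entries = n * (n - 1) // 2 if condensed else n * n
    return entries * bytes_per_value

n = 250_000
full = distance_matrix_bytes(n)
cond = distance_matrix_bytes(n, condensed=True)
print(f"full: {full / 1e9:.0f} GB, condensed: {cond / 1e9:.0f} GB")
# prints "full: 500 GB, condensed: 250 GB"
```

Even the condensed form is far beyond the RAM of a typical workstation, which is why methods that avoid materializing all pairwise distances are attractive here.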

-------------------------------------------------------------------
Rajarshi Guha  <[EMAIL PROTECTED]>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04  06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
After an instrument has been assembled, extra components will be found
on the bench.



_______________________________________________
Cdk-user mailing list
https://lists.sourceforge.net/lists/listinfo/cdk-user
