On Saturday 04 December 2010 20:10:09 Thomas Strunz wrote:
> you are right. If you use these java-based RDBMS like HSQLDB, the
> complete result set is loaded into memory. I think the same would
> be true for MySQL but not for oracle or SQL Server. I read it in
> hsqldb manual, hsqldb does not have a "server side cursor". I
> tried this by using the LIMIT clause, it works put perfroms very
> poorly. I will try it out, but doing thousand(s) of select
> statement that each return 1 result does not seem very nice at
> first though but probably will perfrom better.

With MySQL it is possible to fetch the results row-by-row without 
loading the whole result set into memory. This can be achieved by 
setting setFetchSize(Integer.MIN_VALUE). I'm using this technique with 
an approach similar to the one discussed here: When a row (contains an 
id and a molecule in mol format) is retrieved, it is immediately added 
to an ArrayBlockingQueue of capacity 500. Another thread takes it from 
the queue, parses the mol file, performs aromaticity detection (using 
CDK) and finally performs a subgraph isomorphism test (using an 
algorithm that I have developed to avoid the slow UIT of the CDK).

The application requires less than 30 MB of memory when verifying a 
candidate set with more than 30.000 molecules (this does not include 
the memory consumed by the MySQL server).

Besides saving memory, the row-by-row fetching allows to start the 
verification process immediately when the first row arrives, which 
improves the performance for large candidate sets. However, I don't 
think this will work with different server software in the same way.


As already mentioned, the bottleneck is parsing the mol file and 
detecting aromaticity with CDK. I think it should be much faster to 
store a simple graph representation (with aromatic bond labels) in the 
database. Such a representation can be much more compact than the mol 
format and it is easy to build a simple graph data structure that is 
sufficient for subgraph isomorphism testing.


My source code is available at
<http://scaffoldhunter.svn.sourceforge.net/viewvc/scaffoldhunter/trunk/src/subsearch/>


Best regards
Nils


------------------------------------------------------------------------------
Lotusphere 2011
Register now for Lotusphere 2011 and learn how
to connect the dots, take your collaborative environment
to the next level, and enter the era of Social Business.
http://p.sf.net/sfu/lotusphere-d2d
_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user

Reply via email to