On Saturday 04 December 2010 20:10:09 Thomas Strunz wrote: > you are right. If you use these java-based RDBMS like HSQLDB, the > complete result set is loaded into memory. I think the same would > be true for MySQL but not for oracle or SQL Server. I read it in > hsqldb manual, hsqldb does not have a "server side cursor". I > tried this by using the LIMIT clause, it works put perfroms very > poorly. I will try it out, but doing thousand(s) of select > statement that each return 1 result does not seem very nice at > first though but probably will perfrom better.
With MySQL it is possible to fetch the results row-by-row without loading the whole result set into memory. This can be achieved by setting setFetchSize(Integer.MIN_VALUE). I'm using this technique with an approach similar to the one discussed here: When a row (contains an id and a molecule in mol format) is retrieved, it is immediately added to an ArrayBlockingQueue of capacity 500. Another thread takes it from the queue, parses the mol file, performs aromaticity detection (using CDK) and finally performs a subgraph isomorphism test (using an algorithm that I have developed to avoid the slow UIT of the CDK). The application requires less than 30 MB of memory when verifying a candidate set with more than 30.000 molecules (this does not include the memory consumed by the MySQL server). Besides saving memory, the row-by-row fetching allows to start the verification process immediately when the first row arrives, which improves the performance for large candidate sets. However, I don't think this will work with different server software in the same way. As already mentioned, the bottleneck is parsing the mol file and detecting aromaticity with CDK. I think it should be much faster to store a simple graph representation (with aromatic bond labels) in the database. Such a representation can be much more compact than the mol format and it is easy to build a simple graph data structure that is sufficient for subgraph isomorphism testing. My source code is available at <http://scaffoldhunter.svn.sourceforge.net/viewvc/scaffoldhunter/trunk/src/subsearch/> Best regards Nils ------------------------------------------------------------------------------ Lotusphere 2011 Register now for Lotusphere 2011 and learn how to connect the dots, take your collaborative environment to the next level, and enter the era of Social Business. http://p.sf.net/sfu/lotusphere-d2d _______________________________________________ Cdk-user mailing list Cdk-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/cdk-user