I've added HSQLDB and Derby to the test to do some more comparisons. The code is in https://github.com/kohsuke/many-db
My findings are as follows.

First, we can safely eliminate Derby from the picture. It takes up about 4MB of heap per database (10K records of about 80 bytes/record), and it's painfully slow to insert into.

H2 and HSQLDB feel comparable speed-wise. Each database in H2 takes up about 720KB according to YourKit (or 420KB according to the NetBeans Insane tool, which I suspect missed some other references from the thread?). HSQLDB uses 1.4MB a pop according to YourKit (or 2MB according to Insane).

So I think the choice of H2 was reasonable. I also liked that H2 ships with debug symbols and source jars, making it very easy to debug and find out what's going on.

2012/12/8 Kohsuke Kawaguchi <k...@kohsuke.org>
> Database servers often host a lot of databases, so I don't think having 1000s of independent DBs is beyond the design boundary. With proper cache management, I don't think we'll ever have all 1000s of them open at the same time, and in the few cases where we would, a few GB of heap isn't the end of the world.
>
> That said, we can and should try a few more like HSQLDB to see if they have different characteristics. It might also be possible to make some quick improvements to H2 for easy gains, since H2 probably isn't designed for 1000s of independent DBs in one JVM.
>
> One benefit of a SQL DB is its popularity and the vast number of people who are familiar with it, including users. For example, test data in a DB would allow users and other devs to come up with their own queries. It'll also make it easier for existing plugin devs to use the DB.
>
> MapDB is a nice library on its own, but a Map by definition has a single index, so I'm not sure it works for many typical use cases. Take test reports: we need to be able to query all failing tests for a given build as well as all past executions of a specific test case.
>
> 2012/12/5 Jesse Glick <jgl...@cloudbees.com>
>
>> On 12/05/2012 06:37 AM, Kohsuke Kawaguchi wrote:
>>
>>> H2 database, when opened, takes up about 1MB in heap.
>>
>> Seems excessive when typical jobs will have much less data than this that needs to be stored.
>>
>> I am still not convinced that using a SQL database for this kind of thing is appropriate.
>>
>> 1. Portability of SQL is a bit of a red herring because once you start using, say, H2 to store per-job data, you cannot casually switch to another DB without losing historical build records; and Jenkins would have to ship with _some_ DB plugin, or all plugins using the DB API would be broken. And for per-job DBs we are narrowing the field to those that are embeddable, which probably means just H2 in practice.
>>
>> 2. SQL databases are generally optimized for one slow-to-start instance, a small number of expensive connections, and maybe dozens of tables with lots of data. Whereas we need thousands of extremely cheap instances, each with one immediately available connection and a few tables with usually not so much data. The closer we can get to java.io.RandomAccessFile.<init> the better.
>>
>> Is there any fully free (so not BDB-JE) DB which is pure Java, embeddable, supports some kind of indices, supports compact binary schemas (so not e.g. Lucene or the current wave of JSON DBs), and has a very simple client API once the connection is set up; while being openable from one or two disk files in say under a millisecond with no significant penalty beyond the file descriptor? MapDB [1] looks most promising so far.
>>
>> [1] https://github.com/jankotek/mapdb
>
> --
> Kohsuke Kawaguchi

--
Kohsuke Kawaguchi
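The cache-management point (keeping only a bounded number of per-job databases open at once) can be illustrated with a small least-recently-used cache. This is a minimal sketch, not Jenkins code: the class `OpenDbCache`, its capacity, and the `opener` function are hypothetical, and a handle is assumed to be anything `AutoCloseable`, such as a JDBC `Connection` to an embedded H2 database. It relies on `java.util.LinkedHashMap` in access order, whose `removeEldestEntry` hook closes and drops the least recently used handle once the limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical LRU cache of open per-job database handles. At most `capacity`
 * handles stay open; opening one more closes the least recently used handle.
 */
final class OpenDbCache<K, H extends AutoCloseable> {
    private final int capacity;
    private final Function<K, H> opener;
    private final LinkedHashMap<K, H> open;

    OpenDbCache(int capacity, Function<K, H> opener) {
        this.capacity = capacity;
        this.opener = opener;
        // accessOrder = true: iteration runs from least to most recently used
        this.open = new LinkedHashMap<K, H>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, H> eldest) {
                if (size() > OpenDbCache.this.capacity) {
                    closeQuietly(eldest.getValue());
                    return true; // drop the evicted handle from the map
                }
                return false;
            }
        };
    }

    /** Returns the handle for `key`, opening it (and maybe evicting another) if needed. */
    synchronized H get(K key) {
        H handle = open.get(key); // a hit also marks the key as most recently used
        if (handle == null) {
            handle = opener.apply(key);
            open.put(key, handle); // may trigger removeEldestEntry
        }
        return handle;
    }

    synchronized int openCount() {
        return open.size();
    }

    /** Closes every handle still open, e.g. on shutdown. */
    synchronized void closeAll() {
        for (H handle : open.values()) {
            closeQuietly(handle);
        }
        open.clear();
    }

    private static void closeQuietly(AutoCloseable handle) {
        try {
            handle.close();
        } catch (Exception e) {
            // a real implementation would log this instead of ignoring it
        }
    }
}
```

With a capacity of, say, a few dozen, only that many databases would pay the per-database heap cost measured above at any one time; the rest stay closed on disk until a request touches them again.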
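The single-index concern about MapDB can be made concrete. The sketch below uses plain `java.util.TreeMap`s, not MapDB's API, and a hypothetical `TestResult` schema: to answer both "all failing tests in build N" and "all past executions of test T", the same results have to be reachable through two keys, so the code maintains a secondary index by hand on every insert. In a SQL database, that bookkeeping is what an additional index on the test-name column would do for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One execution of one test case in one build (hypothetical schema). */
final class TestResult {
    final int build;
    final String testName;
    final boolean passed;

    TestResult(int build, String testName, boolean passed) {
        this.build = build;
        this.testName = testName;
        this.passed = passed;
    }
}

/** Hypothetical store keeping the same results under two indexes. */
final class TestResultStore {
    // primary index: build number -> (test name -> result)
    private final Map<Integer, Map<String, TestResult>> byBuild = new TreeMap<>();
    // secondary index: test name -> (build number -> result), builds ascending
    private final Map<String, Map<Integer, TestResult>> byTest = new TreeMap<>();

    /** Every insert must update both indexes to keep them consistent. */
    void add(TestResult r) {
        byBuild.computeIfAbsent(r.build, b -> new TreeMap<>()).put(r.testName, r);
        byTest.computeIfAbsent(r.testName, t -> new TreeMap<>()).put(r.build, r);
    }

    /** All failing tests in the given build, in test-name order. */
    List<String> failingIn(int build) {
        List<String> failing = new ArrayList<>();
        for (TestResult r : byBuild.getOrDefault(build, Map.of()).values()) {
            if (!r.passed) {
                failing.add(r.testName);
            }
        }
        return failing;
    }

    /** All past executions of one test case, oldest build first. */
    List<TestResult> history(String testName) {
        return new ArrayList<>(byTest.getOrDefault(testName, Map.of()).values());
    }
}
```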