I've added HSQLDB and Derby to the test to do some more comparisons. The
code is in https://github.com/kohsuke/many-db

My findings follow:

First, we can safely eliminate Derby from the picture. It takes up about
4MB of heap per database (10K records of about 80 bytes each), and it's
painfully slow on inserts.

H2 and HSQLDB feel comparable speed-wise. Each database in H2 takes up
about 720KB according to YourKit (or 420KB according to the NetBeans Insane
tool, which I suspect missed some references from the thread). HSQLDB
uses 1.4MB a pop according to YourKit (or 2MB according to Insane).
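
To put the per-database figures in perspective, here is a rough
back-of-the-envelope sketch that scales them to many databases open at
once. The per-database sizes are the numbers quoted above (the YourKit
figures for H2 and HSQLDB); the count of 1000 open databases and the
class and method names are assumptions made only for illustration, and
the estimate ignores any sharing between instances.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HeapEstimate {
    // Per-database heap in KB, from the measurements in this message.
    // "About 4MB" is taken as 4096KB and "1.4MB" as 1434KB (rounded).
    static final Map<String, Long> PER_DB_KB = new LinkedHashMap<>();
    static {
        PER_DB_KB.put("H2", 720L);
        PER_DB_KB.put("HSQLDB", 1434L);
        PER_DB_KB.put("Derby", 4096L);
    }

    // Total heap in MB (rounded down) if this many databases are open at once.
    static long totalHeapMb(long perDbKb, int openDatabases) {
        return perDbKb * openDatabases / 1024;
    }

    public static void main(String[] args) {
        int openDatabases = 1000; // hypothetical count, not a measurement
        for (Map.Entry<String, Long> e : PER_DB_KB.entrySet()) {
            System.out.printf("%-6s %5d MB%n",
                    e.getKey(), totalHeapMb(e.getValue(), openDatabases));
        }
    }
}
```

Under these assumptions, H2 stays under 1GB for 1000 open databases,
while Derby would need roughly 4GB.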

So I think the choice of H2 was reasonable. I also liked that the code
ships with debug symbols and source jars, making it very easy to debug and
find out what's going on.
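
The earlier message quoted below mentions cache management as the way to
avoid keeping all of the per-job databases open at once. A minimal sketch
of that idea, assuming a simple least-recently-used policy, could look
like the following. OpenDbCache, its opener function, and the maxOpen
limit are hypothetical names, not existing Jenkins or H2 APIs; the handle
type is anything AutoCloseable, such as a java.sql.Connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache of open per-job database handles.
final class OpenDbCache<H extends AutoCloseable> {
    private final int maxOpen;
    private final Function<String, H> opener;
    private final LinkedHashMap<String, H> open;

    OpenDbCache(int maxOpen, Function<String, H> opener) {
        this.maxOpen = maxOpen;
        this.opener = opener;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.open = new LinkedHashMap<String, H>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, H> eldest) {
                if (size() <= OpenDbCache.this.maxOpen) {
                    return false;
                }
                closeQuietly(eldest.getValue()); // release the evicted handle
                return true;
            }
        };
    }

    // Returns the open handle for a job, opening it on first use.
    synchronized H get(String jobName) {
        H handle = open.get(jobName);
        if (handle == null) {
            handle = opener.apply(jobName);
            open.put(jobName, handle); // may evict the least recently used entry
        }
        return handle;
    }

    synchronized int openCount() {
        return open.size();
    }

    private static void closeQuietly(AutoCloseable handle) {
        try {
            handle.close();
        } catch (Exception e) {
            // Ignored in this sketch; real code would log the failure.
        }
    }
}
```

This sketch closes a handle as soon as it is evicted, so a real
implementation would also need to make sure no caller is still using it,
for example with reference counting.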


2012/12/8 Kohsuke Kawaguchi <k...@kohsuke.org>

> Database servers often do host a lot of databases, so I don't think
> having 1000s of independent DBs is beyond the design boundary. With proper
> cache management, I don't think we'll ever have all 1000s of them open at
> the same time, and for those few that would do, a few GB of heap isn't the
> end of the world.
>
> That said, we can and should try a few more, like HSQLDB, to see if they
> have different characteristics. It might also be possible to make some
> quick improvements to H2 for easy gains, since H2 probably isn't
> designed for 1000s of independent DBs in one JVM.
>
> One benefit of a SQL DB is its popularity and the vast number of people who
> are familiar with it, including users. For example, test data in a DB would
> allow users and other devs to come up with queries. It'll also make it
> easier for existing plugin devs to use them.
>
> MapDB is a nice library on its own, but Map is by definition single index,
> so I'm not sure it works for many typical use cases. Say test reports ---
> we need to be able to query all failing ones for a given build as well as
> all the past executions of a specific test case.
>
>
>
>
> 2012/12/5 Jesse Glick <jgl...@cloudbees.com>
>
>> On 12/05/2012 06:37 AM, Kohsuke Kawaguchi wrote:
>>
>>> H2 database, when opened, takes up about 1MB in heap.
>>>
>>
>> Seems excessive when typical jobs will have much less data than this that
>> needs to be stored.
>>
>> I am still not convinced that using a SQL database for this kind of thing
>> is appropriate.
>>
>> 1. Portability of SQL is a bit of a red herring because once you start
>> using, say, H2 to store per-job data, you cannot casually switch to another
>> DB without losing historical build records; and Jenkins would have to ship
>> with _some_ DB plugin, or all plugins using the DB API would be broken. And
>> for per-job DBs we are narrowing the field to those that are embeddable,
>> which probably means just H2 in practice.
>>
>> 2. SQL databases are generally optimized for one slow-to-start instance,
>> a small number of expensive connections, and maybe dozens of tables with
>> lots of data. Whereas we need thousands of extremely cheap instances, each
>> with one immediately available connection and a few tables with usually not
>> so much data. The closer we can get to java.io.RandomAccessFile.<init>
>> the better.
>>
>> Is there any fully free (so not BDB-JE) DB which is pure Java,
>> embeddable, supports some kind of indices, supports compact binary schemas
>> (so not e.g. Lucene or the current wave of JSON DBs), and has a very simple
>> client API once the connection is set up; while being openable from one or
>> two disk files in say under a millisecond with no significant penalty
>> beyond the file descriptor? MapDB [1] looks most promising so far.
>>
>> [1] https://github.com/jankotek/mapdb
>>
>
>
>
> --
> Kohsuke Kawaguchi
>



-- 
Kohsuke Kawaguchi
