2009/6/12 Dror <[email protected]>:
>
> Hi Mikkel,
>
> I wonder about the query performance. We have a similar configuration
> with only ~1M records in a single table, and the client works with
> LIMIT 1000, and some of the queries can take up to 2 minutes.
> What do you mean by "an offset on a unique BIGINT"?
That may be something about the db file size then? My base here is ~9M
rows and ~13G db file size.

Here follows the pattern we use for sorting. We basically store a bunch
of records that we want to sort by modification time (mtime). One way to
cursor through the db would be to issue queries with LIMIT/OFFSET - this
however doesn't perform very well, because the db has to discard the
first OFFSET items. It would be better if we could use a UNIQUE index as
the offset into the db.

The problem is that mtime (in ms) cannot be assumed to be unique (we
might insert more than one record per millisecond), so it cannot be used
as a reliable offset. So what we do is make the mtime column a BIGINT
and store a "salted timestamp" in it, packing the system time together
with a counter variable that is reset each millisecond. This way we can
put a UNIQUE index on the mtime column. We need a 64-bit BIGINT rather
than a 32-bit INT to make room for the salt bits. These salted
timestamps are of course still required to sort correctly, like normal
timestamps; it takes some bit-fiddling, but is not a hard task[1].

Using the mtime of the last result as the offset for the next page's
query, we only need a LIMIT 1000 in the query (and no OFFSET). So the
query looks something like:

  SELECT * FROM records WHERE mtime > $last_mtime ORDER BY mtime LIMIT 1000

This query is really, _really_ snappy, assuming you have a unique index
on mtime.

Cheers,
Mikkel

[1]: See
https://gforge.statsbiblioteket.dk/plugins/scmsvn/viewcvs.php/trunk/Storage/src/dk/statsbiblioteket/summa/storage/database/UniqueTimestampGenerator.java?root=summa&view=markup
for an implementation. You might need to register on the site to view
the source - I know it sucks. We are moving to sourceforge, but it is
taking some time...
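The salting scheme described above can be sketched like this. This is a
minimal sketch under my own assumptions - the 20-bit salt width, the
class name, and all identifiers are mine; the linked
UniqueTimestampGenerator is the actual implementation and may split the
bits differently:

```java
// Sketch of a "salted timestamp" generator: packs the system time
// (in ms) and a per-millisecond counter into one 64-bit value, so
// the values are unique yet still sort chronologically.
// The 20-bit salt width is an assumption, not the real layout.
public class SaltedTimestamps {
    private static final int SALT_BITS = 20;
    private static final long MAX_SALT = (1L << SALT_BITS) - 1;

    private long lastMillis = -1;
    private long salt = 0;

    // Returns a strictly increasing value (assuming the system clock
    // never steps backwards): the current time shifted left, OR'ed
    // with a counter that resets on each new millisecond.
    public synchronized long next() {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            if (salt == MAX_SALT) {
                throw new IllegalStateException(
                    "more than " + MAX_SALT + " ids in one millisecond");
            }
            salt++;
        } else {
            lastMillis = now;
            salt = 0;
        }
        return (now << SALT_BITS) | salt;
    }

    // Recovers the plain millisecond timestamp from a salted value.
    public static long unpack(long salted) {
        return salted >>> SALT_BITS;
    }
}
```

With values like these in the mtime column, paging becomes the keyset
query from the mail: fetch the next page with
WHERE mtime > ? ORDER BY mtime LIMIT 1000, binding the last salted value
seen on the previous page.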
> On Jun 11, 8:28 am, Mikkel Kamstrup Erlandsen
> <[email protected]> wrote:
>> 2009/6/8 Lucky Luke <[email protected]>:
>>
>> > Hello,
>> >
>> > I just found out about H2, and I was wondering how well it handles
>> > large databases (e.g. 50Gb - 100Gb in size). We have a database that
>> > has 2 dozen lookup tables that are small, and about half a dozen
>> > tables that are potentially very large with several million rows.
>> >
>> > Some just contain integer data, some contain varchar values for log
>> > data. Data is mostly just inserted, and every now and then queried
>> > with result sets of up to 2500 records.
>> >
>> > Is this something that H2 would handle well, or is it not designed to
>> > deal with large databases?
>>
>> We are running a setup with ~8M records in a single table and H2
>> doesn't complain at all. We iterate over the entire table with
>> client-side paging, i.e. with LIMIT 500 and with an offset on a unique
>> BIGINT column, and can read about 5000 records/s this way, last I
>> checked. At one point we had some left outer joins in our paging query
>> (so I'd say avoid those), but they gave quite a performance hit - I
>> don't recall the exact numbers though.
>>
>> I am a happy camper at least :-)
>>
>> --
>> Cheers,
>> Mikkel

--
Cheers,
Mikkel

You received this message because you are subscribed to the Google Groups
"H2 Database" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/h2-database?hl=en
