2009/6/12 Dror <[email protected]>:
>
> Hi Mikkel,
>
> I wonder about the query performance, we have similar configuration
> with only ~1M records in a single table.
> and the client works with LIMIT 1000.
> and some of the queries can take up to 2 minutes.
> What do you mean by "an offset on a unique BIGINT"

That may be something about the db file size then? My base here is ~9M
rows and ~13G db file size.

Here follows the pattern we use for sorting. We basically store a
bunch of records that we want to sort by modification time (mtime).
One way to cursor through the db would be to issue queries with
LIMIT/OFFSET - this doesn't perform very well, however, because the db
has to discard the first OFFSET items. It would be better if we could
use a UNIQUE index as the offset into the db. The problem is that
mtime (in ms) cannot be assumed to be unique (we might insert >1
record/ms), so it cannot be used as a reliable offset.

So what we do is make the mtime column a BIGINT and store a "salted
time stamp" in it, packing the system time together with a counter
variable that is reset each millisecond. This way we can put a UNIQUE
index on the mtime column. We need a 64 bit BIGINT rather than a
32 bit INT to make room for the salt bits. These salted timestamps are
of course required to still sort correctly, like normal timestamps;
that takes some bit-fiddling, but is not a hard task[1].
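
To illustrate the idea, here is a minimal sketch in Java. The class
name, the bit split and the method names are my own assumptions for
this mail, not the actual UniqueTimestampGenerator code - see [1] for
the real implementation:

```java
// Hypothetical sketch of a "salted timestamp": the system time (ms)
// goes in the high bits of a long, a per-millisecond counter in the
// low bits. Values are unique and still sort chronologically.
class SaltedTimestamp {
    // Assumed split: 20 salt bits leave 44 bits for the millis,
    // which is enough for several centuries of timestamps.
    private static final int SALT_BITS = 20;
    private static final long MAX_SALT = (1L << SALT_BITS) - 1;

    private long lastMillis = -1;
    private long salt = 0;

    // Returns a unique, monotonically increasing BIGINT-compatible value.
    public synchronized long next(long nowMillis) {
        if (nowMillis != lastMillis) {
            lastMillis = nowMillis;   // new millisecond: reset the counter
            salt = 0;
        } else if (salt == MAX_SALT) {
            throw new IllegalStateException("salt exhausted for this ms");
        } else {
            salt++;                   // same millisecond: bump the counter
        }
        return (nowMillis << SALT_BITS) | salt;
    }

    // Recover the plain millisecond timestamp from a salted value.
    public static long toMillis(long salted) {
        return salted >>> SALT_BITS;
    }
}
```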

Using the mtime of the last result as the offset for the next page's
query, we only need a LIMIT 1000 in the query (and no OFFSET).
So the query looks something like:

  SELECT * FROM records WHERE mtime>$last_mtime ORDER BY mtime LIMIT 1000

This query is really, _really_ snappy, assuming you have a unique
index on mtime.
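
The paging loop itself just carries the last mtime forward as the
cursor. Here is a minimal, database-free sketch (again my own
illustration, not our actual code) where a sorted array of salted
timestamps stands in for the table; in real code each page() call
would be a prepared SELECT like the one above, with mtime > ? bound
to the cursor:

```java
import java.util.ArrayList;
import java.util.List;

class KeysetPager {
    // Simulates: SELECT mtime FROM records WHERE mtime > lastMtime
    //            ORDER BY mtime LIMIT pageSize
    // over a pre-sorted array of unique salted timestamps.
    static List<Long> page(long[] sortedMtimes, long lastMtime, int pageSize) {
        List<Long> result = new ArrayList<>();
        for (long m : sortedMtimes) {
            if (m > lastMtime) {
                result.add(m);
                if (result.size() == pageSize) break;
            }
        }
        return result;
    }

    // Iterate the whole "table" page by page, carrying the cursor forward.
    static int countAll(long[] sortedMtimes, int pageSize) {
        int seen = 0;
        long cursor = Long.MIN_VALUE;
        while (true) {
            List<Long> page = page(sortedMtimes, cursor, pageSize);
            if (page.isEmpty()) break;
            seen += page.size();
            cursor = page.get(page.size() - 1); // last mtime = next offset
        }
        return seen;
    }
}
```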

Cheers,
Mikkel

[1]: See 
https://gforge.statsbiblioteket.dk/plugins/scmsvn/viewcvs.php/trunk/Storage/src/dk/statsbiblioteket/summa/storage/database/UniqueTimestampGenerator.java?root=summa&view=markup
for an implementation... You might need to register on the site to
view the source - I know it sucks. We are moving to sourceforge, but
it is taking some time...




> On Jun 11, 8:28 am, Mikkel Kamstrup Erlandsen
> <[email protected]> wrote:
>> 2009/6/8 Lucky Luke <[email protected]>:
>>
>>
>>
>> > Hello,
>>
>> > I just found out about H2, and I was wondering how well it handles
>> > large databases (e.g. 50Gb - 100Gb in size). We have a database that
>> > has 2 dozen lookup tables that are small, and about half a dozen
>> > tables that are potentially very large with several million rows.
>>
>> > Some just contain integer data, some contain varchar values for log
>> > data. Data is mostly just inserted, and every now and then queried
>> > with result sets of up to 2500 records.
>>
>> > Is this something that H2 would handle well, or is it not designed to
>> > deal with large databases?
>>
>> We are running a setup with ~8M records in a single table and H2
>> doesn't complain at all. We iterate over the entire table with client
>> side paging, ie. with LIMIT 500 and with an offset on a unique BIGINT
>> column and can read about 5000 records/s this way last I checked. At
>> one point we had some left outer joins in our paging query (so I'd say
>> avoid those) but that gave quite a performance hit - I don't recall
>> the exact numbers though.
>>
>> I am a happy camper at least :-)
>>
>> --
>> Cheers,
>> Mikkel
> >
>



-- 
Cheers,
Mikkel

