2009/6/15 Dror <[email protected]>:
>
> Thanks Mikkel,
>
> I am trying to understand what the difference is between using a
> regular ID (or even a simple counter that you are using during the
> insert) and your timestamp.
> We have such an ID and we do a similar query.
> In addition, when we do a complex query that includes a column that
> is not indexed, we get low performance.

Just for reference, I blogged about our approach here:
http://sbdevel.wordpress.com/2009/06/15/efficient-sorting-and-iteration-on-large-databases/

We also have a case where we need to extract records where the 'base'
column (which is a VARCHAR) is something specific. To make this
efficient we added a composite index on (mtime,base) like so:

  CREATE UNIQUE INDEX mb ON records(mtime,base)

And then we can efficiently query like this:

  SELECT * FROM records WHERE mtime>$last_mtime AND base='mybase'
ORDER BY mtime LIMIT 1000
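
Just to make the access pattern concrete, here is a minimal in-memory sketch (plain Java, not H2 code) of what that filtered cursor query does: walk keys strictly greater than the last seen mtime, keep rows whose 'base' matches, stop after the limit. The Record class and method names are made up for illustration.

```java
import java.util.*;

// Sketch of the filtered keyset ("cursor") query. A TreeMap stands in
// for the table's unique index on mtime; Record is a made-up row type.
public class FilteredPaging {
    static final class Record {
        final long mtime;   // salted timestamp, unique
        final String base;
        Record(long mtime, String base) { this.mtime = mtime; this.base = base; }
    }

    /** Roughly: SELECT * FROM records WHERE mtime>lastMtime AND base=?
     *  ORDER BY mtime LIMIT limit */
    public static List<Record> nextPage(NavigableMap<Long, Record> records,
                                        long lastMtime, String base, int limit) {
        List<Record> page = new ArrayList<>();
        // tailMap(..., false) = keys strictly greater than lastMtime, in order
        for (Record r : records.tailMap(lastMtime, false).values()) {
            if (r.base.equals(base)) {
                page.add(r);
                if (page.size() == limit) break;
            }
        }
        return page;
    }
}
```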

Best of luck!
Mikkel


> On Jun 15, 10:10 am, Mikkel Kamstrup Erlandsen
> <[email protected]> wrote:
>> 2009/6/12 Dror <[email protected]>:
>>
>>
>>
>> > Hi Mikkel,
>>
>> > I wonder about the query performance, we have similar configuration
>> > with only ~1M records in a single table.
>> > and the client works with LIMIT 1000.
>> > and some of the queries can take up to 2 minutes.
> > What do you mean by "an offset on a unique BIGINT"?
>>
>> That may be down to the db file size then? My database here is ~9M
>> rows and ~13G db file size.
>>
>> Here follows the pattern we use for sorting: We basically store a
>> bunch of records that we want to sort by modification time (mtime).
>> One way to cursor through the db would be to issue queries with
>> LIMIT/OFFSET - this however doesn't perform very well because the db
>> has to discard the first OFFSET items - it would be better if we could
>> use a UNIQUE index as offset into the db. The problem is that mtime
>> (in ms) cannot be assumed to be unique (we might insert >1 record/ms),
>> so it cannot be used as a reliable offset.
>>
>> So what we do is that we've made the mtime column a BIGINT and then
>> store a "salted time stamp", packing the system time together with a
>> counter variable that is reset each millisecond. This way we can
>> put a UNIQUE index on the mtime column. We need a 64 bit BIGINT rather
>> than a 32 bit INT to make room for the salt bits. These salted
>> timestamps are of course still required to sort correctly, like normal
>> timestamps; that requires some bit-fiddling, but it is not a hard task[1].
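
The packing itself can be sketched like this (a minimal version, not the original implementation from [1]; the 20-bit salt width is an assumption, and it assumes the system clock does not step backwards):

```java
// "Salted timestamp" sketch: pack the system time (ms) with a
// per-millisecond counter so every value is unique but still sorts
// chronologically. 20 salt bits allow ~1M inserts per millisecond.
public class SaltedTimestamp {
    private static final int SALT_BITS = 20;
    private static final long SALT_MASK = (1L << SALT_BITS) - 1;

    private long lastMillis = -1;
    private long counter = 0;

    /** Returns a unique, monotonically increasing BIGINT-sized stamp. */
    public synchronized long next() {
        long now = System.currentTimeMillis();
        if (now != lastMillis) {
            lastMillis = now;
            counter = 0;            // counter is reset each millisecond
        }
        // A real implementation should handle counter overflow here.
        return (now << SALT_BITS) | (counter++ & SALT_MASK);
    }

    /** Recovers the plain millisecond timestamp from a salted stamp. */
    public static long millisOf(long stamp) {
        return stamp >>> SALT_BITS;
    }
}
```

Because the millisecond part sits in the high bits, comparing two stamps as plain longs gives the same order as comparing (time, counter) pairs.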
>>
>> Using the mtime of the last result as the offset of the query for the
>> next page, we only need a LIMIT 1000 in the query (and not an OFFSET).
>> So the query looks something like:
>>
>>   SELECT * FROM records WHERE mtime>$last_mtime ORDER BY mtime LIMIT 1000
>>
>> This query is really, _really_ snappy, assuming you have a unique
>> index on mtime.
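
The full client-side iteration then just repeats that query, feeding the last mtime of each page back in as the cursor, until a page comes back empty. A self-contained sketch (a TreeMap stands in for the real table and its unique index; names are illustrative):

```java
import java.util.*;

// Client-side cursor paging: ask for the next page of keys strictly
// greater than the last mtime seen, until no rows come back.
public class CursorIteration {
    /** Iterates the whole "table" in pages and returns the row count. */
    public static long iterateAll(NavigableMap<Long, String> table, int pageSize) {
        long seen = 0;
        long lastMtime = Long.MIN_VALUE;
        while (true) {
            // Roughly: SELECT * FROM records WHERE mtime>? ORDER BY mtime LIMIT ?
            List<Long> page = new ArrayList<>();
            for (Long mtime : table.tailMap(lastMtime, false).keySet()) {
                page.add(mtime);
                if (page.size() == pageSize) break;
            }
            if (page.isEmpty()) break;
            seen += page.size();
            lastMtime = page.get(page.size() - 1);  // cursor for the next query
        }
        return seen;
    }
}
```

No OFFSET is ever used, so the database never has to skip rows; each page starts directly at the index position of the cursor.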
>>
>> Cheers,
>> Mikkel
>>
>> [1]: 
>> Seehttps://gforge.statsbiblioteket.dk/plugins/scmsvn/viewcvs.php/trunk/S...
>> for an implementation... You might need to register on the site to
>> view the source - I know it sucks. We are moving to sourceforge, but
>> it is taking some time...
>>
>>
>>
>>
>>
>> > On Jun 11, 8:28 am, Mikkel Kamstrup Erlandsen
>> > <[email protected]> wrote:
>> >> 2009/6/8 Lucky Luke <[email protected]>:
>>
>> >> > Hello,
>>
>> >> > I just found out about H2, and I was wondering how well it handles
>> >> > large databases (e.g. 50Gb - 100Gb in size). We have a database that
>> >> > has 2 dozen lookup tables that are small, and about half a dozen
>> >> > tables that are potentially very large with several million rows.
>>
>> >> > Some just contain integer data, some contain varchar values for log
>> >> > data. Data is mostly just inserted, and every now and then queried
>> >> > with result sets of up to 2500 records.
>>
>> >> > Is this something that H2 would handle well, or is it not designed to
>> >> > deal with large databases?
>>
>> >> We are running a setup with ~8M records in a single table and H2
>> >> doesn't complain at all. We iterate over the entire table with client
>> >> side paging, i.e. with LIMIT 500 and an offset on a unique BIGINT
>> >> column, and could read about 5000 records/s this way last I checked.
>> >> At one point we had some left outer joins in our paging query, and
>> >> those gave quite a performance hit (so I'd say avoid them) - I don't
>> >> recall the exact numbers though.
>>
>> >> I am a happy camper at least :-)
>>
>> >> --
>> >> Cheers,
>> >> Mikkel
>>
>> --
>> Cheers,
>> Mikkel
> >
>



-- 
Cheers,
Mikkel

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en
-~----------~----~----~----~------~----~------~--~---
