Thanks Mikkel. I am trying to understand the difference between using a regular ID (or even a simple counter assigned during the insert) and your salted timestamp. We do have such an ID and we run a similar query. In addition, when we run a complex query that includes a column that is not indexed, we get the low performance.

Regards,
Dror
On Jun 15, 10:10 am, Mikkel Kamstrup Erlandsen <[email protected]> wrote:
> 2009/6/12 Dror <[email protected]>:
> > Hi Mikkel,
> >
> > I wonder about the query performance. We have a similar configuration
> > with only ~1M records in a single table, and the client works with
> > LIMIT 1000, yet some of the queries can take up to 2 minutes.
> > What do you mean by "an offset on a unique BIGINT"?
>
> That may be something about the db file size then? My base here is ~9M
> rows and a ~13G db file size.
>
> Here follows the pattern we use for sorting: we basically store a
> bunch of records that we want to sort by modification time (mtime).
> One way to cursor through the db would be to issue queries with
> LIMIT/OFFSET - this however doesn't perform very well, because the db
> has to discard the first OFFSET items - it would be better if we could
> use a UNIQUE index as an offset into the db. The problem is that mtime
> (in ms) cannot be assumed to be unique (we might insert >1 record per
> millisecond), so it cannot be used as a reliable offset.
>
> So what we do is that we've made the mtime column a BIGINT and then
> store a "salted time stamp", packing the system time together with a
> counter variable that is reset for each millisecond. This way we can
> put a UNIQUE index on the mtime column. We need a 64-bit BIGINT rather
> than a 32-bit INT to make room for the salt bits. These salted
> timestamps are of course required to still sort correctly, like normal
> timestamps; it requires some bit-fiddling, but is not a hard task[1].
>
> Using the mtime of the last result as the offset of the query for the
> next page, we only need a LIMIT 1000 in the query (and not an OFFSET).
> So the query looks something like:
>
> SELECT * FROM records WHERE mtime>$last_mtime ORDER BY mtime LIMIT 1000
>
> This query is really, _really_ snappy, assuming you have a unique
> index on mtime.
>
> Cheers,
> Mikkel
>
> [1]: See
> https://gforge.statsbiblioteket.dk/plugins/scmsvn/viewcvs.php/trunk/S...
> for an implementation... You might need to register on the site to
> view the source - I know it sucks. We are moving to sourceforge, but
> it is taking some time...
>
> > On Jun 11, 8:28 am, Mikkel Kamstrup Erlandsen
> > <[email protected]> wrote:
> >> 2009/6/8 Lucky Luke <[email protected]>:
> >>
> >> > Hello,
> >> >
> >> > I just found out about H2, and I was wondering how well it handles
> >> > large databases (e.g. 50Gb - 100Gb in size). We have a database
> >> > that has 2 dozen lookup tables that are small, and about half a
> >> > dozen tables that are potentially very large, with several million
> >> > rows.
> >> >
> >> > Some just contain integer data, some contain varchar values for
> >> > log data. Data is mostly just inserted, and every now and then
> >> > queried with result sets of up to 2500 records.
> >> >
> >> > Is this something that H2 would handle well, or is it not designed
> >> > to deal with large databases?
> >>
> >> We are running a setup with ~8M records in a single table and H2
> >> doesn't complain at all. We iterate over the entire table with
> >> client-side paging, i.e. with LIMIT 500 and with an offset on a
> >> unique BIGINT column, and can read about 5000 records/s this way,
> >> last I checked. At one point we had some left outer joins in our
> >> paging query (so I'd say avoid those), but that gave quite a
> >> performance hit - I don't recall the exact numbers though.
> >>
> >> I am a happy camper at least :-)
> >>
> >> --
> >> Cheers,
> >> Mikkel
>
> --
> Cheers,
> Mikkel

You received this message because you are subscribed to the Google Groups "H2 Database" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/h2-database?hl=en
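[Editor's note: the "salted timestamp" packing Mikkel describes (millisecond clock in the high bits of a 64-bit value, a per-millisecond counter in the low bits, so values are unique but still sort like plain timestamps) could look roughly like the sketch below. This is an illustration, not the actual implementation linked in [1]; the class name, the choice of 20 salt bits, and the method names are all made up here.]

```java
// Sketch of a "salted timestamp" generator: packs System.currentTimeMillis()
// into the high bits of a long and a per-millisecond counter into the low
// bits, producing unique 64-bit values that sort like plain timestamps.
// The 20-bit salt width is an arbitrary choice for this example (~1M
// inserts per millisecond before overflow).
public class SaltedTimestamp {
    private static final int SALT_BITS = 20;
    private static final long SALT_MASK = (1L << SALT_BITS) - 1;

    private long lastMillis = -1;
    private long counter = 0;

    /** Returns a unique, strictly increasing 64-bit salted timestamp. */
    public synchronized long next() {
        long now = System.currentTimeMillis();
        if (now != lastMillis) {
            // New millisecond: reset the salt counter.
            lastMillis = now;
            counter = 0;
        }
        // Millisecond clock in the high bits, counter salt in the low bits.
        return (now << SALT_BITS) | (counter++ & SALT_MASK);
    }

    /** Recovers the original millisecond timestamp from a salted value. */
    public static long millisOf(long salted) {
        return salted >>> SALT_BITS;
    }
}
```

Because the clock occupies the high bits, ordinary numeric comparison on the BIGINT column gives the same order as sorting by wall-clock time, which is what makes the UNIQUE index usable as a paging cursor.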
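[Editor's note: the paging loop Mikkel describes (feed the last mtime back as the lower bound of the next query instead of using OFFSET) can be sketched against an in-memory sorted set; a real client would run `SELECT * FROM records WHERE mtime > ? ORDER BY mtime LIMIT ?` over JDBC, but the control flow is the same. Class and method names are illustrative.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class KeysetPager {
    // Stand-in for "SELECT mtime FROM records WHERE mtime > ? ORDER BY mtime LIMIT ?".
    // tailSet(lastMtime, false) gives elements strictly greater than lastMtime,
    // mirroring the "mtime > $last_mtime" predicate.
    static List<Long> page(TreeSet<Long> records, long lastMtime, int limit) {
        List<Long> out = new ArrayList<>();
        for (long mtime : records.tailSet(lastMtime, false)) {
            if (out.size() == limit) break;
            out.add(mtime);
        }
        return out;
    }

    // Reads the whole set page by page: the last key of each page becomes
    // the lower bound of the next query, so no OFFSET is ever needed.
    static List<Long> readAll(TreeSet<Long> records, int pageSize) {
        List<Long> all = new ArrayList<>();
        long last = Long.MIN_VALUE;
        List<Long> page;
        while (!(page = page(records, last, pageSize)).isEmpty()) {
            all.addAll(page);
            last = page.get(page.size() - 1);
        }
        return all;
    }
}
```

The point of the pattern is that each page is an index range scan starting at the cursor, so the cost per page stays constant, whereas LIMIT/OFFSET forces the database to walk and discard OFFSET rows on every page.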
