2009/6/15 Dror <[email protected]>:
>
> Thanks Mikkel,
>
> I am trying to understand what the difference is between using a
> regular ID (or even a simple counter that you increment during the
> insert) and your timestamp. We do have such an ID and we run a
> similar query. In addition, when we run a complex query that
> includes a column that is not indexed, we get low performance.
Just for reference, I blogged about our approach here:
http://sbdevel.wordpress.com/2009/06/15/efficient-sorting-and-iteration-on-large-databases/

We also have a case where we need to extract records where the 'base'
column (which is a VARCHAR) has some specific value. To make this
efficient we added a composite index on (mtime,base) like so:

CREATE UNIQUE INDEX mb ON records(mtime,base)

And then we can query efficiently like this:

SELECT * FROM records WHERE mtime>$last_mtime AND base='mybase' ORDER BY mtime LIMIT 1000

Best of luck!
Mikkel

> On Jun 15, 10:10 am, Mikkel Kamstrup Erlandsen
> <[email protected]> wrote:
>> 2009/6/12 Dror <[email protected]>:
>>
>> > Hi Mikkel,
>>
>> > I wonder about the query performance. We have a similar
>> > configuration with only ~1M records in a single table, and the
>> > client works with LIMIT 1000, yet some of the queries can take
>> > up to 2 minutes.
>> > What do you mean by "an offset on a unique BIGINT"?
>>
>> That may be something about the db file size then? My base here is
>> ~9M rows and a ~13G db file.
>>
>> Here follows the pattern we use for sorting: We basically store a
>> bunch of records that we want to sort by modification time (mtime).
>> One way to cursor through the db would be to issue queries with
>> LIMIT/OFFSET - this however doesn't perform very well, because the
>> db has to discard the first OFFSET items - it would be better if we
>> could use a UNIQUE index as the offset into the db. The problem is
>> that mtime (in ms) can not be assumed to be unique (we might insert
>> more than one record per ms), so it can not be used as a reliable
>> offset.
>>
>> So what we do is that we've made the mtime column a BIGINT and then
>> store a "salted time stamp", packing the system time together with
>> a counter variable that is reset each millisecond. This way we can
>> put a UNIQUE index on the mtime column. We need a 64-bit BIGINT
>> rather than a 32-bit INT to make room for the salt bits.
>> These salted timestamps are of course required to still sort
>> correctly, like normal timestamps. That requires some bit-fiddling,
>> but it is not a hard task[1].
>>
>> Using the mtime of the last result as the offset of the query for
>> the next page, we only need a LIMIT 1000 in the query (and no
>> OFFSET). So the query looks something like:
>>
>> SELECT * FROM records WHERE mtime>$last_mtime ORDER BY mtime LIMIT 1000
>>
>> This query is really, _really_ snappy, assuming you have a unique
>> index on mtime.
>>
>> Cheers,
>> Mikkel
>>
>> [1]:
>> See https://gforge.statsbiblioteket.dk/plugins/scmsvn/viewcvs.php/trunk/S...
>> for an implementation. You might need to register on the site to
>> view the source - I know it sucks. We are moving to sourceforge,
>> but it is taking some time...
>>
>> > On Jun 11, 8:28 am, Mikkel Kamstrup Erlandsen
>> > <[email protected]> wrote:
>> >> 2009/6/8 Lucky Luke <[email protected]>:
>> >>
>> >> > Hello,
>> >>
>> >> > I just found out about H2, and I was wondering how well it
>> >> > handles large databases (e.g. 50GB - 100GB in size). We have
>> >> > a database that has two dozen lookup tables that are small,
>> >> > and about half a dozen tables that are potentially very
>> >> > large, with several million rows.
>> >>
>> >> > Some just contain integer data, some contain varchar values
>> >> > for log data. Data is mostly just inserted, and every now and
>> >> > then queried with result sets of up to 2500 records.
>> >>
>> >> > Is this something that H2 would handle well, or is it not
>> >> > designed to deal with large databases?
>> >>
>> >> We are running a setup with ~8M records in a single table and
>> >> H2 doesn't complain at all. We iterate over the entire table
>> >> with client-side paging, i.e. with LIMIT 500 and an offset on a
>> >> unique BIGINT column, and could read about 5000 records/s this
>> >> way last I checked.
>> >> At one point we had some left outer joins in our paging query
>> >> (so I'd say avoid those), but that gave quite a performance hit
>> >> - I don't recall the exact numbers though.
>> >>
>> >> I am a happy camper at least :-)
>> >>
>> >> --
>> >> Cheers,
>> >> Mikkel
>>
>> --
>> Cheers,
>> Mikkel
>
>

--
Cheers,
Mikkel

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en
-~----------~----~----~----~------~----~------~--~---
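The salted-timestamp packing and keyset paging described in the thread can be
sketched like this. This is a hypothetical illustration, not the
implementation referenced in [1]: the 20-bit salt width is an assumption, and
SQLite stands in for H2 (the SQL has the same shape as the queries above).

```python
import sqlite3
import threading

# The high bits hold the millisecond timestamp, the low SALT_BITS bits
# hold a counter that is reset each millisecond, so values created
# within the same millisecond stay unique and still sort in insertion
# order. The 20-bit width is an assumption (~1M inserts per ms).
SALT_BITS = 20

_lock = threading.Lock()
_last_ms = -1
_counter = 0

def salted_mtime(now_ms):
    """Pack (now_ms, per-millisecond counter) into one sort-preserving int."""
    global _last_ms, _counter
    with _lock:
        if now_ms == _last_ms:
            _counter += 1                       # same ms: bump the salt
        else:
            _last_ms, _counter = now_ms, 0      # new ms: reset the salt
        return (now_ms << SALT_BITS) | _counter

# Keyset-paging demo against an in-memory SQLite db standing in for H2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (mtime BIGINT, payload TEXT)")
conn.execute("CREATE UNIQUE INDEX mt ON records(mtime)")
# i // 3 simulates several inserts landing in the same millisecond.
rows = [(salted_mtime(1000 + i // 3), "row%d" % i) for i in range(10)]
conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

def page(last_mtime, limit):
    """Fetch the next page after last_mtime - no OFFSET needed."""
    return conn.execute(
        "SELECT mtime, payload FROM records "
        "WHERE mtime > ? ORDER BY mtime LIMIT ?",
        (last_mtime, limit)).fetchall()

last = -1
seen = []
while True:
    batch = page(last, 4)
    if not batch:
        break
    seen.extend(batch)
    last = batch[-1][0]   # the last mtime becomes the next page's offset
```

Because each page's last mtime is the cursor for the next page, the database
never has to skip and discard rows the way a LIMIT/OFFSET scan does.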
