Re: [h2] Re: The OFFSET should be optimized in H2

Brett Ryan Tue, 24 May 2016 21:16:06 -0700

> On 25 May 2016, at 07:58, Silvio <[email protected]> wrote:
> 
>> On Friday, May 20, 2016 at 7:24:16 PM UTC+2, Brett Ryan wrote:
>> I have a real world use for needing to optimise either paginated or full 
>> select queries.
>> 
>> I'm presently using H2 as a transport database. What I'm doing is exporting 
>> data from database to an h2 database, copying the exported data to another 
>> machine then using that database to load the data back in on that machine. 
>> This method has been found to be very effective for smaller tables, though 
>> I'm presently facing a problem with a table that's just shy of 40 million 
>> rows. I've got the db built fine, though doing a select without a where 
>> clause just borks, as does any paginating with limit/offset.
>> 
>> 40 milion is actually conservative as I've only taken a sample size, if I 
>> were to process everything this would end up being several billion rows.
>> 
>> If you're wondering what the data is it's retail barcode scans.
> 
> Don't use OFFSET for that. Use an ordering on an indexed column (pk will do 
> fine) in combination with LIMIT.
> 
> In fact, NEVER use OFFSET, it is a non-feature.


Doesn't `select * from table' order on the primary key by default?

If I do `select * from table order by id' where id is the primary key, it never 
starts streaming the resultset. I tried leaving this running for 12 hours 
before killing it.

Using offset, at least it starts but then slows down the further it gets.

Pattern being used involves JdbcTemplate where srcJT and destJT are instances 
of JdbcTemplate src being the h2, dest being a SQL Server instance.

    List<Object[]> args = new ArrayList<>();
    LOG.debug("Inserting batched mode...");
    srcJT.query(select, rs -> {
            args.add(argsProducer.apply(rs));
            if (args.size() == batchSize) {
                LOG.debug("Committing batch {} -> {}", counts[1], counts[0]);
                destJT.batchUpdate(insert, args);
                counts[0] += args.size();
                counts[1]++;
                args.clear();
            }
        });
    if (args.size() > 0) {
        LOG.debug("Committing batch {} -> {}", counts[1], counts[0]);
        destJT.batchUpdate(insert, args);
        counts[0] += args.size();
        counts[1]++;
        args.clear();
    }

argsProducer is a function that takes a ResultSet and produces an array of 
arguments used for the batched insert.

The pattern works very well, and; the exact code is used to create the h2 
database in the first place from an OpenEdge database which does start 
streaming immediately from the select *.


-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

Re: [h2] Re: The OFFSET should be optimized in H2

Reply via email to