Re: update by join

vips Sat, 30 Jan 2010 11:44:36 -0800

Hi,

I am sorry, I must have clicked the Reply to author link by mistake.

The merge statement as suggest should work functionally, but is an
extremely slow operation.
It took 30 seconds for about 30K rows.
So about 1K rows/sec.
Thats really slow for typical use cases.
For the same rows, only inserts or only updates process approx 8K rows/
sec.

Anything that can make it much more faster?
Will be needing to process at least 8K rows/sec.

If we are not able to process merge 8K rows/sec, then we cannot use H2
for the current requirements.

I have been trying other options as well.

1. create the Totals table pre-populated with all possible values of
ID and with Total column as 0.
   then instead of merge, always use update to update the Totals table
with values from the Increments table.

  But this will bloat the Totals table size since the Totals table
will contain all the rows all the time, irrespective of whether an ID
value will be present in the Increments table or not.

2. just keep inserting always by just grouping whatever is available
in the Increments table at one time. That means that the Totals table
does not have any primary key at all.
So,
a. during the first run if increments table contained -
drop table totals;
drop table increments;
create table totals(id int, total decimal);

create table increments(id int, inc decimal);

insert into increments values (1,10), (1, 1), (1, -2), (2, 10), (2,
-3);

b. then insert these into the totals table using -
  insert into totals (select id, sum(inc) from increments group by
id) ;
  select * from totals ;

output -
ID      TOTAL
1       9
2       7

c. once a run is over, then -
truncate table increments ;

d. Then, during second run, if increments table contained -
insert into increments values(1, 5), (1, -3), (1, 7), (2, 9), (2, -4),
(3,10);

e. then do another insert into the totals table using -
  insert into totals (select id, sum(inc) from increments group by
id) ;
  select * from totals ;

output -
ID      TOTAL
1       9
2       7
1       9
2       5
3       10

f. Continue truncating increments table and continue inserting into
totals table.
The totals table now contains multiple rows for the same ID and hence
is not exactly summarized data.

g. then after a predetermined period, when it is time to dump the
totals statistics into a csvfile, then -
call csvwrite('totals.csv', 'select id, sum(total) from totals group
by id') ;

This shoud write -
ID      SUM(TOTAL)
1       18
2       12
3       10

h. Thus, the sum happens multiple times, during each insert into
totals table and then while selecting and writing from totals table.

The problem with this approach is also that the Totals table size will
be much larger since there will be a lot of rows with the same id.

The Merge option would have been best if it could really merge and not
just replace. In the merge operation while doing the internal updates,
if it could perform some mathematical functions as well then it could
make things a lot more faster and easier to use.

Any ideas/comments on the approaches?
Seems kind of stuck right now since these approaches do not seem to be
taking us any closer to 8K rows/sec.

On Jan 24, 4:46 pm, Thomas Mueller <[email protected]>
wrote:
> Hi,
>
> Vipul by mistake send the mail below to me instead of to the group.
> See my reply below.
>
>
>
> On Thu, Jan 21, 2010 at 5:32 PM, vips <[email protected]> wrote:
> > Hi,
>
> > I am writing on this thread since I am trying to write something
> > similar to what Brian mentioned on Dec 5.
>
> > Using the same example,
> > Lets say there are 2 tables - Totals and Increments.
> > Totals always stored the cumulative data and Increments is
> > continuously uploaded with incremental data.
> > After the data from Increments is merged into Totals, data from the
> > Increments table is removed/truncated.
>
> > Lets say Totals has a column called total which is sum of all the
> > increments till date for a particular id.
> > Hence, while merging the data the already existing values in the total
> > column of the Totals table need to be summed up / added to the new
> > values obtained from the Increments table's inc column.
>
> > Hence, what is actually required is something like -
> > update Totals set total = total + (select inc from Increments where
> > Increments.id = Totals.id)
>
> > If this update does not succeed or returns 0 rows updated, then
> > Insert into Totals (id, total) (select id, inc from Increments)
>
> > Notice that the update operation is using some existing values of the
> > Totals table to do some mathematical operation with the newer values
> > received from Increments table.
>
> > Can the above be done using theMERGEstatement of H2?
> > I am not sure, since theMERGEstatement does not provide a way to
> > provide an expression using the existing values of the merged table.
>
> > Could something like the following work ? -
> >MERGEinto totals (id, total) key (id) values (select id, totals.total
> > + Increments.inc from Increments)
>
> > Such cases would come in more practical use.
> > Mostly updated columns will need to be added, subtracted, multiplied,
> > averaged, etc while doing an update.
>
> > As Thomas rightly pointed out the currentMERGEcommand is more like a
> > MySQL REPLACE command,
> > rather than really a practical way of "Update if-exists else Insert".
>
> > If it is not directly available as of now - can this be made as a
> > feature request?
>
> > Regards,
> > Vipul.
>
> Hi,
>
> > Can the above be done using theMERGEstatement of H2?
>
> What about:
>
> drop all objects;
> create table totals(id int primary key, total decimal);
> create table increments(id int, inc decimal);
> insert into totals values(1, 10);
> insert into increments values(1, 1), (1, -2), (2, 10), (2, -3);mergeinto 
> totals(id, total)
>  select i.id, ifnull(t.total, 0) + sum(i.inc)
>  from increments i
>  left outerjointotals t on i.id = t.id
>  group by t.id;
> select * from totals;
>
> Regards,
> Thomas

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en.

Re: update by join

Reply via email to