On 1/18/06, Robert Hicks <[EMAIL PROTECTED]> wrote:
> mark d wrote:
> > On 1/18/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >> ---- mark d <[EMAIL PROTECTED]> wrote:
> >>> This optimization might have made some sense decades ago over a serial
> >>> line, but not today.  It is saving 320 single-packet round trips, which
> >>> is an infinitesimal amount of traffic.
> >> Well, traffic wise, you're right.  But, it also saves parsing all those 
> >> rows out of the
> >> DB and through the DBI; this way the only process that has to work with 
> >> the data
> >> is the oracle engine.  Use DBI where appropriate...  Right tool for the 
> >> job, and all that...
> >
> > I have no idea what "parsing all those rows out of the DB and through the 
> > DBI"
> > means, though it sounds like you think row data is copied from the DB to
> > the DBI in the course of a sql DELETE statement (that would be a fundamental
> > misunderstanding of SQL and DBI.)
> >
> > In any case, you're mistaken that the PL/SQL saves anything but a few 
> > packets.
> > To be explicit, the proposed PL/SQL was:
> >
> >> declare
> >>     rows_processed number;
> >>     batch_size number := 50000;
> >> begin
> >>     rows_processed := batch_size;
> >>     while rows_processed = batch_size loop
> >>         delete from table where rownum < batch_size + 1;
> >>         rows_processed := SQL%ROWCOUNT;
> >>         commit;
> >>     end loop;
> >> end;
> >
> > The equivalent DBI code is:
> >
> > $sth = $dbh->prepare("delete from TABLE where rownum < 50001");
> > while ( ($deleted = $sth->execute) > 0) {
> >     print "deleted $deleted\n";
> > }
> >
> > The only difference between the two is a few network packets.
>
> Is that the only difference? I read it, and I could be wrong, that your
> SQL will stop when rownum hits 50001, while his will keep going in
> batches of 50000 until there isn't any more to delete given the delete
> statement.

No. Look more closely. The execute is inside a while loop.

The execute() method returns the number of affected rows; for a DELETE,
that's the number of rows deleted.  The test in the while condition checks
whether more than zero rows were deleted, so the delete repeats until no
rows are left.

Here's a real-life sample:

code:

use DBI;
use Time::HiRes qw(time);   # sub-second timings, as in the output below

# ($dbh is an already-connected DBI handle; the connect call is omitted)

$batch = 50001;

$sth = $dbh->prepare("delete from SNAFU where rownum < $batch");

printf "Rows in table = %d\n",
    $dbh->selectall_arrayref("select count(*) from SNAFU")->[0]->[0];

$t = time();
while (($deleted = $sth->execute) > 0) {

    printf "batch %d: deleted %d rows in %f secs\n",
            ++$i, $deleted, time()-$t;

    $t = time();
}
print "Done\n";


Output:


Rows in table = 700000
batch 1: deleted 50000 rows in 0.850589 secs
batch 2: deleted 50000 rows in 0.840419 secs
batch 3: deleted 50000 rows in 0.931219 secs
batch 4: deleted 50000 rows in 0.835611 secs
batch 5: deleted 50000 rows in 0.841162 secs
batch 6: deleted 50000 rows in 0.834864 secs
batch 7: deleted 50000 rows in 3.877935 secs
batch 8: deleted 50000 rows in 0.871310 secs
batch 9: deleted 50000 rows in 0.826238 secs
batch 10: deleted 50000 rows in 0.825013 secs
batch 11: deleted 50000 rows in 0.829521 secs
batch 12: deleted 50000 rows in 0.838740 secs
batch 13: deleted 50000 rows in 0.830472 secs
batch 14: deleted 50000 rows in 0.881580 secs
Done
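For anyone who doesn't read Perl, here's the same delete-until-zero-rows
loop sketched in Python with the stdlib sqlite3 module.  This is only an
illustration: the table name `snafu` and the row counts are made up and
scaled down, and sqlite has no Oracle-style rownum, so a LIMIT subquery on
rowid stands in for it.

```python
import sqlite3

# Toy setup: 700 rows in a throwaway in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snafu (n INTEGER)")
conn.executemany("INSERT INTO snafu VALUES (?)", [(i,) for i in range(700)])
conn.commit()

BATCH = 50
batch_no = 0
while True:
    # Delete at most BATCH rows per statement, committing each batch.
    cur = conn.execute(
        "DELETE FROM snafu WHERE rowid IN "
        "(SELECT rowid FROM snafu LIMIT ?)", (BATCH,))
    conn.commit()
    if cur.rowcount == 0:       # no rows affected -> table is empty
        break
    batch_no += 1
    print(f"batch {batch_no}: deleted {cur.rowcount} rows")

print("Done")
```

Same shape as the DBI loop: keep re-executing the bounded DELETE and stop
when the affected-row count hits zero.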



> Yours is a simple delete. I need to break up my deletes into manageable
> sizes, do the delete, do the next batch, etc.

Note that unless you have a meaningful basis for, and definition of,
"manageable", this is all a complete waste of time anyway.  (I originally
only chimed in to note the pointless "optimization", but I guess I'll
comment on the original question too :)

"Large" is highly relative.  16 million rows is "small" to me.  It might be
large for an MS Access rolodex, but it's certainly not large by any
objective measure.

> On the delete side I have about 16 million records that will be deleted.
> I have two of those to do. I have one update that changes a few values
> on about 3.1 million records (not the whole table).

Unless you have good reason to believe it would be a problem, just issue
a single SQL statement and don't waste your time fooling around with
these hacks.

Mark
