i just did some performance tests using DBD::Pg - here are quick
results.

i timed inserts of 100k records using the following three techniques:

1. row-at-a-time insert using a prepared insert statement.
2. "copy <table> from stdin", followed by repeated calls to putline.
3. "copy <table> from <tempfile>"

results:

1. 1053 records per second
2. 3225 records per second
3. 3448 records per second


general conclusion:

if you're in a situation where you're confident in the integrity of
your data, there's roughly a factor of 3 to be gained by bypassing the
overhead of executing repeated insert statements via DBI and instead
using the "backdoor approach" of shipping data directly to the
database with "copy <table> from ...".
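
for reference, here's roughly what techniques 1 and 2 look like at the
DBD::Pg level.  this is a sketch, not the actual benchmark code - the
table, column names, and @rows test data below are invented, each
technique is shown independently, and it assumes DBD::Pg's
putline/endcopy func() interface:

    use strict;
    use warnings;
    use DBI;

    # autocommit off, as in the tests; one explicit commit per run
    my $dbh = DBI->connect('dbi:Pg:dbname=test', 'user', 'password',
                           { AutoCommit => 0, RaiseError => 1 });

    # invented test data: three short strings, two ints, one float per row
    my @rows = map { [ "aa$_", "bb$_", "cc$_", $_, 2 * $_, $_ / 3 ] } 1 .. 1000;

    # technique 1: row-at-a-time insert using a prepared statement
    my $sth = $dbh->prepare(
        'insert into mytable (s1, s2, s3, n1, n2, f1) values (?, ?, ?, ?, ?, ?)');
    $sth->execute(@$_) for @rows;
    $dbh->commit;

    # technique 2: copy from stdin, feeding rows with putline
    # (copy's default text format: tab-separated fields, one row per line)
    $dbh->do('copy mytable from stdin');
    for my $row (@rows) {
        $dbh->func(join("\t", @$row) . "\n", 'putline');
    }
    $dbh->func("\\.\n", 'putline');    # end-of-data marker
    $dbh->func('endcopy');
    $dbh->commit;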



details:

this was postgres running on solaris.

table had six fields - three varchar(32)'s, two integers, and a float.
string values inserted were short - ~5-8 bytes.

there was one unique index defined on the table.
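
for concreteness, the table was shaped something like this.  the
column names and the columns chosen for the unique index are invented;
only the types and the fact that there's exactly one unique index
match the real table:

    # hypothetical DDL matching the description above
    $dbh->do(q{
        create table mytable (
            s1  varchar(32),
            s2  varchar(32),
            s3  varchar(32),
            n1  integer,
            n2  integer,
            f1  float
        )
    });
    $dbh->do('create unique index mytable_s1_n1 on mytable (s1, n1)');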

the database server was running local to the machine executing the
script.

the temp file used for technique 3 was on a local filesystem (local to
both client and database server).

autocommit was off.  a single $dbh->commit was done at the end of each
test, and its time was included in that test's total.

the time for perl to write out the text records to the temp file was
included in the total time for technique 3.
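
the technique 3 sketch, with the same invented names as above - note
that with "copy <table> from '<file>'" the backend reads the file
itself, so the path has to be visible and readable on the server side
(hence the note above about the temp file being local to both client
and server):

    # technique 3: write text records to a temp file, then copy from the file
    open my $fh, '>', '/tmp/bulkload.txt' or die "open: $!";
    for my $row (@rows) {
        print {$fh} join("\t", @$row), "\n";    # same tab-separated format copy expects
    }
    close $fh or die "close: $!";

    # the server reads /tmp/bulkload.txt directly
    $dbh->do("copy mytable from '/tmp/bulkload.txt'");
    $dbh->commit;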



discussion:

your mileage will undoubtedly vary due to, among other things:

 - number and types of fields and the sizes of the data being stored.

 - number of different indices in effect.  i suspect that the two
'copy <table> from ...' methods may do a single index update at the
end, whereas repeated inserts probably update the indices each time.

i wonder if it might be possible to speed things up even further by
using binary format for copy...  that would result in fewer bytes
being exchanged between client and server and less work being done
generating and parsing strings.  perhaps i'll try this later.
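
if i do, i'd expect the statement itself to be a one-line change along
these lines (unverified - and the real work would be generating the
file in postgres' binary copy format in the first place):

    # untested sketch: binary copy from a file
    $dbh->do("copy binary mytable from '/tmp/bulkload.bin'");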


David Duff
Director of Research
Sockeye Networks
[EMAIL PROTECTED]



> -----Original Message-----
> From: Jason E. Stewart [mailto:[EMAIL PROTECTED]]
> Sent: Friday, November 22, 2002 16:08
> To: [EMAIL PROTECTED]
> Subject: Re: DBD::Pg timings
>
>
> "Paul Boutros" <[EMAIL PROTECTED]> writes:
>
> > I'm having similar problems with DBD::ODBC and I wanted to clarify my
> > understanding of a couple of things you're mentioning here!
> >
> > > > On Thu, Nov 21, 2002 at 09:25:13AM -0700, Jason E. Stewart wrote:
> > > >> I'd be grateful if someone could give me a reality check. I have
> > > >> 250k rows I want to insert into Postgres using a simple Perl script
> > > >> and it's taking *forever*. According to my simple timings, It seems
> > > >> to be only capable of handling about 5,000 rows/hr!!! This seems
> > > >> ridiculous. This is running on a pretty speedy dual processor P4,
> > > >> and it doesn't seem to have any trouble at all with big selects.
> > >
> > > Important questions:
> > > - Is AutoCommit on or off?
> >
> > Is it generally superior to keep AutoCommit *off* for large-scale
> > inserts?
>
> I just had a brief exchange with Tom Lane, one of the main Postgres
> developers, about this, and he insisted that one should have autocommit
> off for better performance. And that the write-ahead logging should
> not have any trouble with 250k rows.
>
> > > - Can you use the 'copy' command instead of 'insert'? Copy is good for
> > > large-scale batch inserts.
> >
> > Is this a DBD::Pg specific command, or is it some new addition to SQL? I
> > didn't see it in the DBI documentation, that's why I ask.
>
> This is a Postgres-specific SQL command.
>
> jas.
>
