On Sunday 07 January 2007 19:22, Norberto Meijome wrote:

> I could be wrong and this may only apply to generating indexes?
That's what we're doing now. By dropping the table and recreating it, all the index maintenance gets deferred until one batch at the end (which is vastly faster in practice). We also wrap the whole thing in a transaction, so SELECT queries still have access to the old data until the instant the commit finishes, assuming the import doesn't fail for some reason and get rolled back.

> Something also to consider is, do you *have* to import all that data every
> time? I.e., can you create data partitions, assign a new table to each
> day, create the appropriate rules, and then only dump from FoxPro the last
> day's info?

I wish. First, we run the import hourly (more or less). I save the output of the last hour's FoxPro->PostgreSQL conversion and run a hand-written, optimized diff against it. If nothing has changed, that table gets skipped. If a reasonably small percentage of rows has changed (which is almost always the case), then I rewrite it as a series of deletes followed by a bulk import. Basically, I patch the table. It's nice to see a twenty-million-row table update reduced to a patch file 100 lines long.

Oh, for the record, my diff algorithm is written in Python and is still I/O limited, even when several copies are running in parallel.

> If CPU is hitting too hard, you can always run nice pgsql while importing
> this - it will still take 100% CPU, but yield as soon as something else
> needs to do some work. Of course, this won't help if you are IO bound (I
> think?). *Are* you CPU bound?

{Un,}fortunately, no. The CPUs are still idle enough to get a lot of other processing done without slowing the import process.

> Let us know what solution you come up with, and good luck :)

Will do, and thanks!
-- 
Kirk Strauser
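P.S. For the curious, the keyed diff boils down to something like the sketch below. This is a simplified illustration, not the actual script: the real thing works against FoxPro exports and handles COPY formatting, and every table, column, and function name here is made up.

```python
# Illustrative sketch of a keyed row diff: compare the previous hour's
# export with the new one, then emit a "patch" of deletes plus rows to
# bulk-import, instead of reloading the whole table.

def diff_rows(old_rows, new_rows, key=lambda row: row[0]):
    """Return (deleted_keys, changed_or_new_rows) for a keyed row diff."""
    old = {key(r): r for r in old_rows}
    new = {key(r): r for r in new_rows}
    # Rows whose key vanished entirely get deleted.
    deleted = [k for k in old if k not in new]
    # New or modified rows get deleted (if present) and re-imported.
    changed = [r for k, r in new.items() if old.get(k) != r]
    return deleted, changed

def patch_table(table, key_col, deleted, changed):
    """Render the patch: DELETEs first, then data for a bulk COPY."""
    keys_to_delete = deleted + [row[0] for row in changed]
    deletes = "\n".join(
        f"DELETE FROM {table} WHERE {key_col} = {k!r};" for k in keys_to_delete
    )
    # Tab-separated rows, suitable for feeding to COPY ... FROM STDIN.
    copy_data = "\n".join("\t".join(map(str, row)) for row in changed)
    return deletes, copy_data
```

In practice the whole patch (deletes plus the bulk import) runs inside one transaction, so readers see either the old table or the new one, never a half-applied patch.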