Bergquist, Brett wrote:
Thanks for the response, Mike. Next week I will put together a test program that I can attach to a Jira issue on this. The info on the PURGE_ROWS option is good; it should probably go into the manual to explain why one would want to use it. Now I understand better.
In my case, the table in question has mostly integer data types with no 
clobs/blobs.  Rows are being added to the table in increasing time order and 
older rows (older than 30 days for example) are being deleted.
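
Roughly, the purge is a time-based delete along these lines (the table and
column names here are just placeholders, not our real schema):

DELETE FROM APP.EVENTS
  WHERE CREATED_TS < {fn TIMESTAMPADD(SQL_TSI_DAY, -30, CURRENT_TIMESTAMP)};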

Just for a little more background on the test that I did: only this table is 
updated, the database and logging are on the same file system, and multiple 
threads are inserting rows into the table simultaneously.  After inserting a 
few million rows and observing the insert rate, the insertion process was 
suspended, 1.5 million rows were removed by a single thread, and the insertion 
process was restarted.  No shutdown of the database occurred.  Resuming the 
insertion process saw a dramatic decrease in insertion speed.

The database was then shut down, a compact was performed using the embedded 
driver to access the database, and the above test was repeated.  After the 
compact, the speed returned to normal.
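(The "compact" here was the full compress system procedure, run over an
embedded connection, i.e. something like the following, with placeholder
schema and table names:)

CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'EVENTS', 1);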

I agree with your approach of using separate tables.  More research led me to find that this 
"partitioning" is supported natively in some databases.  I prefer not to move to another database; 
so far, over the last 4 or 5 years, Derby has worked well for us in a "zero administration" kind of 
way, and I would hate to give that up.  I will probably look into doing this "partitioning" via 
procedure and convention.

Thanks for your help.

Brett
Great, a repro is always best.  I would like to see Derby handle this kind
of application.  I am pretty sure the default space reclamation
is built to handle this kind of load well, at least from the space reuse
side (it has more problems when apps
pick and choose rows to delete and there are no subsequent
inserts after the delete).  The problem may just be that deletes cause
work for the db (cpu, log I/O, and database I/O) and this work is affecting
the immediately subsequent inserts.

Once you have the repro, the first thing to look at will be whether the
async background work from the deletes is affecting the immediately
following inserts.  So you should see whether the insert rate goes back up
after waiting some amount of time after the delete.

I understand that this is not going to be useful for your application, but it
may shed more light on what is going on.  Could you run your existing
test past the point of doing the delete and then wait some amount of
time before restarting the inserts?  I am not sure exactly how long, but I
would guess an hour would be way more than enough.  I just want to
understand whether the issue is the background handling of the deletes
affecting the inserts, or whether the empty space in the db really causes
the issue.

You could also just actively monitor the process and see when the
cpu and I/O on the system go dormant after the delete.  It is definitely the
case that after the massive delete there will be work
that has been queued to the background, to be completed after the user
executes the commit for the transaction that does the deletes.  In general
this work can take as much time as the actual delete (maybe even more,
as the initial delete just sets a bit to mark the rows deleted, so there is not
much logging, whereas the space reclamation can do a lot of I/O and cpu
to move rows around and mark the pages free).

-----Original Message-----
From: Mike Matrigali [mailto:[email protected]]
Sent: Friday, August 06, 2010 1:00 PM
To: [email protected]
Subject: Re: Question/observations on purged rows and insert rates

Bergquist, Brett wrote:
I posted the following to the users list but saw no response, so I thought I would post here as it may be more relevant to developers.

After some testing, it seems that Derby is reusing the space of deleted records before allocating more space from the file system. If this is the case then what use does the call:

call syscs_util.syscs_inplace_compress_table(SCHEMA,TABLE, 1, 0, 0);

have? Basically what does the PURGE_ROWS option do that is above and beyond what is being done by Derby already to reuse the space of deleted records?

If you have no clobs/blobs in your table, then Derby by default will only
reclaim space once it thinks it can get a whole page.  So it waits
until the LAST row on a page has been deleted before attempting to
reclaim the space and make a full page available for subsequent use.
The purge option of compress instead tries to reclaim the space of every
single deleted row in the table.
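
For reference, the three trailing arguments in that call are PURGE_ROWS,
DEFRAGMENT_ROWS, and TRUNCATE_END, so (1, 0, 0) purges deleted rows but does
not defragment pages or return free space at the end of the file to the OS.
Something like this (placeholder names) would do all three in place:

CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE('APP', 'EVENTS', 1, 1, 1);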

From what you have written it seems like most of your rows are likely
to be in order in the file and you tend to delete them in chunks that
are in order, so I would assume that the default space reclamation would
work well in your case.

Also after testing we are seeing the following. With a database with no deleted rows, my test application is inserting about 150 records/second into a table. I let this run for about 2 million records and the insert rate is consistent. Now I purge out 1.5 million records and run the test again.

Does the above sentence mean you delete the first 1.5 million records with a SQL DELETE?

The insert rate is now about 25 records/second.  Running
the above compress with the PURGE_ROWS option and rerunning the test still gives about 25 records/second.  Running the full SYSCS_UTIL.SYSCS_COMPRESS_TABLE and rerunning the test brings the insert rate back to 150 records/second.

I am not sure what is going on.  The best thing you could do if you want
more help is to post a repro with as much detail as possible; this just
answers a lot of questions about what you are doing rather than having
us guess (where shutdowns happen, page size, blobs/clobs?, log on a separate disk or not, where commits happen, inserts single threaded, deletes single threaded, ...).  We don't need your real data, but table and index ddl that match your app would be best, along with a program that generates dummy data for inserts and reproduces the issue.  Go ahead and log a performance JIRA issue
if you can reproduce this in a simple Java program.
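
For example, even something with just the rough shape of your table (this is
obviously not your real schema, only an illustration) would be enough,
together with the multi-threaded insert loop and the single-threaded delete:

CREATE TABLE APP.EVENTS
(
    ID         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    CREATED_TS TIMESTAMP NOT NULL,
    METRIC1    INTEGER,
    METRIC2    INTEGER,
    METRIC3    INTEGER
);
CREATE INDEX EVENTS_TS_IDX ON APP.EVENTS (CREATED_TS);
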
The reduced insert rate because of deleted records is very much a problem.  We have a table that gets about 700K records inserted per day and purges out 30-day-old data at about 700K records per day.  This has a great effect on our insert rate.  Why the big hit because of deleted records, and can anything other than a compress help?  This process has no downtime, so running a compress can only be done maybe once a month.

I will just note that space reclamation of deleted records does generate
"background" work.  And this background work can affect concurrent
insert performance.

If your application can handle the complications of using multiple
tables, I would suggest you go that way.  That is the approach I have
seen most applications like this use across a wide range of databases.
Inserts then are going to be best-case performance, and reclaiming the
space can be done "online" without affecting the data you might need
to access.  Dropping an entire table is always going to be less overhead
on the system than deleting the same number of rows from a table, as in
the latter case each row will be logged, and a number of cpu actions are
taken for manipulating the bytes on the pages for each delete.  In the
case of a drop we simply log the drop table and delete the file from
the file system.

You could use table functions or views to make it easier to query the "current" set of data.
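
Just to sketch what I mean by procedure and convention (the names below are
made up, and the real DDL would carry your full column list): with one table
per month and a view over the current window, retiring a month becomes a
cheap DROP TABLE instead of millions of individually logged row deletes.

-- one table per month, all with identical DDL
CREATE TABLE APP.EVENTS_2010_07 (CREATED_TS TIMESTAMP NOT NULL, METRIC1 INTEGER);
CREATE TABLE APP.EVENTS_2010_08 (CREATED_TS TIMESTAMP NOT NULL, METRIC1 INTEGER);

-- a view presenting the "current" window of data
CREATE VIEW APP.EVENTS_CURRENT AS
    SELECT * FROM APP.EVENTS_2010_07
    UNION ALL
    SELECT * FROM APP.EVENTS_2010_08;

-- retiring the oldest month: Derby will not drop a table that a view still
-- depends on, so recreate the view without it first, then drop the table
DROP VIEW APP.EVENTS_CURRENT;
CREATE VIEW APP.EVENTS_CURRENT AS SELECT * FROM APP.EVENTS_2010_08;
DROP TABLE APP.EVENTS_2010_07;

Inserts then always target the newest table, and the drop just logs the drop
and removes the file from the file system.
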
Thanks for any feedback.

Brett





