Bergquist, Brett wrote:
Thanks for the response, Mike. Next week I will put together a test program that I can attach to a Jira issue on this. The info on the PURGE_ROWS option is good; it should probably go into the manual to explain why one would want to use it. Now I understand better.
In my case, the table in question has mostly integer data types with no 
clobs/blobs.  Rows are being added to the table in increasing time order and 
older rows (older than 30 days for example) are being deleted.
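
Roughly, the purge is a time-based delete along these lines (the table and
column names here are just placeholders, not our real schema):

DELETE FROM APP.EVENTS
  WHERE CREATED_TS < {fn TIMESTAMPADD(SQL_TSI_DAY, -30, CURRENT_TIMESTAMP)};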

Just for a little more background on the test that I did: only this table is 
updated, the database and logging are on the same file system, and multiple 
threads are inserting rows into the table simultaneously.  After inserting a 
few million rows and observing the insert rate, the insertion process was 
suspended, 1.5 million rows were removed by a single thread, and the insertion 
process was restarted.  No shutdown of the database occurred.  Resuming the 
insertion process saw a dramatic decrease in insertion speed.

The database was then shut down, a compact was performed using the embedded 
driver to access the database, and the above test was repeated.  After the 
compact, the speed returned to normal.
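(The "compact" here was the full compress system procedure, run over an
embedded connection, i.e. something like the following, with placeholder
schema and table names:)

CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'EVENTS', 1);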

I agree with your approach of using separate tables.  More research led me to find that this 
"partitioning" is supported natively in some databases.  I prefer not to move to another database; 
so far, over the last 4 or 5 years, Derby has worked well for us in a "zero administration" kind of 
way, and I would hate to give that up.  I will probably look into doing this "partitioning" via 
procedure and convention.

Thanks for your help.

Brett
Great, a repro is always best.  I would like to see Derby handle this kind
of application.  I am pretty sure the default space reclamation
is built to handle this kind of load well, at least from the space reuse
side (it has more problems when apps
pick and choose rows to delete and there are no subsequent
inserts after the delete).  The problem may just be that deletes cause
work for the db (cpu, log I/O, and database I/O) and this work is affecting
the immediately subsequent inserts.

Once you have the repro, the first thing to look at will be whether the
async background work from the deletes is affecting the immediately
following inserts.  So you should see whether the insert rate goes back up
after waiting some amount of time after the delete.

I understand that this is not going to be useful for your application, but it
may shed more light on what is going on.  Could you run your existing
test past the point of doing the delete and then wait some amount of
time before restarting the inserts?  I am not sure exactly how long, but I
would guess an hour would be way more than enough.  I just want to
understand whether the issue is the background handling of the deletes
affecting the inserts, or whether the empty space in the db really causes
the issue.

You could also just actively monitor the process and see when the
cpu and I/O on the system go dormant after the delete.  It is definitely the
case that after the massive delete there will be work
that has been queued to the background, to be completed after the user
executes the commit for the transaction that does the deletes.  In general
this work can take as much time as the actual delete (maybe even more,
as the initial delete just sets a bit to mark the rows deleted, so there is not
much logging, whereas the space reclamation can do a lot of I/O and cpu
to move rows around and mark the pages free).

-----Original Message-----
From: Mike Matrigali [mailto:[email protected]]
Sent: Friday, August 06, 2010 1:00 PM
To: [email protected]
Subject: Re: Question/observations on purged rows and insert rates

Bergquist, Brett wrote:
I posted the following to the users list but saw no response, so I thought I would post here as it may be more relevant to developers.

After some testing, it seems that Derby is reusing the space of deleted records before allocating more space from the file system. If this is the case then what use does the call:

call syscs_util.syscs_inplace_compress_table(SCHEMA,TABLE, 1, 0, 0);

have? Basically what does the PURGE_ROWS option do that is above and beyond what is being done by Derby already to reuse the space of deleted records?

If you have no clobs/blobs in your table, then Derby by default will only
reclaim space once it thinks it can get a whole page.  So it waits
until the LAST row on a page has been deleted before attempting to
reclaim the space and make a full page available for subsequent use.
The purge option of compress instead tries to reclaim the space of every
single deleted row in the table.
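
For reference, the three trailing arguments in that call are PURGE_ROWS,
DEFRAGMENT_ROWS, and TRUNCATE_END, so (1, 0, 0) purges deleted rows but does
not defragment pages or return free space at the end of the file to the OS.
Something like this (placeholder names) would do all three in place:

CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE('APP', 'EVENTS', 1, 1, 1);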

From what you have written it seems like most of your rows are likely
to be in order in the file and you tend to delete them in chunks that
are in order, so I would assume that the default space reclamation would
work well in your case.

Also after testing we are seeing the following. With a database with no deleted rows, my test application is inserting about 150 records/second into a table. I let this run for about 2 million records and the insert rate is consistent. Now I purge out 1.5 million records and run the test again.

Does the above sentence mean you delete the first 1.5 million records with a SQL DELETE?

The insert rate is now about 25 records/second.  Running
the above compress with the PURGE_ROWS option and rerunning the test still gives about 25 records/second.  Running the full SYSCS_UTIL.SYSCS_COMPRESS_TABLE and rerunning the test brings the insert rate back to 150 records/second.

I am not sure what is going on.  The best thing you could do if you want
more help is to post a repro with as much detail as possible; this just
answers a lot of questions about what you are doing rather than having
us guess (where shutdowns happen, page size, blobs/clobs?, log on a separate disk or not, where commits happen, inserts single threaded, deletes single threaded, ...).  We don't need your real data, but table and index ddl that match your app would be best, along with a program that generates dummy data for inserts and reproduces the issue.  Go ahead and log a performance JIRA issue
if you can reproduce this in a simple Java program.
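
For example, even something with just the rough shape of your table (this is
obviously not your real schema, only an illustration) would be enough,
together with the multi-threaded insert loop and the single-threaded delete:

CREATE TABLE APP.EVENTS
(
    ID         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    CREATED_TS TIMESTAMP NOT NULL,
    METRIC1    INTEGER,
    METRIC2    INTEGER,
    METRIC3    INTEGER
);
CREATE INDEX EVENTS_TS_IDX ON APP.EVENTS (CREATED_TS);
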
The reduced insert rate because of deleted records is very much a problem.  We have a table that gets about 700K records inserted per day and purges out 30-day-old data at about 700K records per day.  This has a great effect on our insert rate.  Why the big hit because of deleted records, and can anything other than a compress help?  This process has no downtime, so running a compress can only be done maybe once a month.

I will just note that space reclamation of deleted records does generate
"background" work.  And this background work can affect concurrent
insert performance.

If your application can handle the complications of using multiple
tables, I would suggest you go that way.  That is the approach I have
seen most applications like this use across a wide range of databases.
Inserts then are going to be best-case performance, and reclaiming the
space can be done "online" without affecting the data you might need
to access.  Dropping an entire table is always going to be less overhead
on the system than deleting the same number of rows from a table, as in
the latter case each row will be logged, and a number of cpu actions are
taken for manipulating the bytes on the pages for each delete.  In the
case of a drop we simply log the drop table and delete the file from
the file system.

You could use table functions or views to make it easier to query the "current" set of data.
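
Just to sketch what I mean by procedure and convention (the names below are
made up, and the real DDL would carry your full column list): with one table
per month and a view over the current window, retiring a month becomes a
cheap DROP TABLE instead of millions of individually logged row deletes.

-- one table per month, all with identical DDL
CREATE TABLE APP.EVENTS_2010_07 (CREATED_TS TIMESTAMP NOT NULL, METRIC1 INTEGER);
CREATE TABLE APP.EVENTS_2010_08 (CREATED_TS TIMESTAMP NOT NULL, METRIC1 INTEGER);

-- a view presenting the "current" window of data
CREATE VIEW APP.EVENTS_CURRENT AS
    SELECT * FROM APP.EVENTS_2010_07
    UNION ALL
    SELECT * FROM APP.EVENTS_2010_08;

-- retiring the oldest month: Derby will not drop a table that a view still
-- depends on, so recreate the view without it first, then drop the table
DROP VIEW APP.EVENTS_CURRENT;
CREATE VIEW APP.EVENTS_CURRENT AS SELECT * FROM APP.EVENTS_2010_08;
DROP TABLE APP.EVENTS_2010_07;

Inserts then always target the newest table, and the drop just logs the drop
and removes the file from the file system.
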
Thanks for any feedback.

Brett





