On 23.10.08 02:26, Jim Newsham wrote:
There hasn't been a lot of response to this thread. I have a 23 GB database
holding only 500 MB of data, all created with just inserts (no deletes). For
our app, this is a serious problem.
Hi Jim,
I would say this is a serious problem for any app...
Do you have a way to reproduce this behavior?
If so, it would be very helpful if you could attach the repro to a
Jira[1] issue.
Someone suggested the problem is caused by multi-threaded inserts, but each of
the tables which exhibit the problem was only ever inserted into by a single
thread.
Any suggestions?
Is there a way to tell, before compacting, how much space would be saved by
compacting a table? With this information, I would at least be able to
periodically compact just those tables which merit it, as a workaround for
the real problem.
I don't remember the details, but if you have a look at the source for
"DerbyDiskSpaceDiag.java" attached to DERBY-2549[2] you should be able
to figure out the SQL required. I'm sure it's in the manual somewhere too.
You can also just compile and run the program (preferably on a copy /
non-production db) and see what it says. With 23 GB of data, I think it
will chew for a while before you see any output.
--
Kristian
[1] https://issues.apache.org/jira/browse/DERBY
[2] https://issues.apache.org/jira/browse/DERBY-2549
Thanks,
Jim
-----Original Message-----
From: Jim Newsham [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 21, 2008 11:21 AM
To: 'Derby Discussion'
Subject: RE: excessive disk space allocation
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 20, 2008 9:27 PM
To: Derby Discussion
Subject: Re: excessive disk space allocation
Jim Newsham <[EMAIL PROTECTED]> writes:
Hi,
I'm doing some benchmarking of our application, which stores data in Derby.
The parts of the application which I am exercising only perform inserts, not
deletes. The results suggest that Derby disk space allocation is excessive,
particularly because compressing the tables reduces the size of the database
*substantially*. For example, here are the results for several databases,
both before and after compression.

Application running time: original -> compressed
0.5 days: 178.2 MB -> 63.1 MB
1 day: 559.3 MB -> 82.8 MB
2 days: 1,879.1 MB -> 120.8 MB
4 days: 5,154.4 MB -> 190.5 MB
8 days: 11,443.7 MB -> 291.6 MB
16 days: 23,706.7 MB -> 519.3 MB

Plotting the data, I observe that both the uncompressed and the compressed
sizes appear to grow linearly, but the growth factor (the slope of the linear
equation) is 53 times as large for the uncompressed database. Needless to
say, this is huge.

I expected that with only inserts and no deletes, there should be little or
no wasted space (and no need for table compression). Is this assumption
incorrect?
Hi Jim,
You may have come across a known issue with multi-threaded inserts to
the same table:
http://thread.gmane.org/gmane.comp.apache.db.derby.devel/36430
https://issues.apache.org/jira/browse/DERBY-2337
https://issues.apache.org/jira/browse/DERBY-2338
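In case it helps, the access pattern those issues describe is several threads
inserting into the same table through their own connections, along the lines
of the made-up sketch below (the table name, row counts and commit interval
are all my assumptions, not a repro of your case):

import java.sql.*;

// Illustrative sketch of concurrent inserts into one table from several
// connections, the pattern discussed in DERBY-2337/2338. All names and
// counts here are invented for illustration.
public class ConcurrentInsertSketch {
    public static void main(String[] args) throws Exception {
        final String url = "jdbc:derby:testDB;create=true";
        try (Connection setup = DriverManager.getConnection(url);
             Statement st = setup.createStatement()) {
            st.executeUpdate("CREATE TABLE EVENTS(ID INT, PAYLOAD VARCHAR(100))");
        }
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try (Connection c = DriverManager.getConnection(url);
                     PreparedStatement ps =
                         c.prepareStatement("INSERT INTO EVENTS VALUES (?, ?)")) {
                    c.setAutoCommit(false);
                    for (int i = 0; i < 100000; i++) {
                        ps.setInt(1, id);
                        ps.setString(2, "row " + i);
                        ps.executeUpdate();
                        if (i % 1000 == 0) c.commit();   // commit in batches
                    }
                    c.commit();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
    }
}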
Thanks for those links. I used the diagnostic dump program from the
discussion thread mentioned above to see how much each individual table in my
database compresses.
The "multi-threaded inserts to the same table" theory doesn't quite jibe
here. In my case, I have multiple threads inserting into the database, but
most of the data goes into tables which are only inserted into by a single
thread for the duration of the application.

There are only two tables inserted into by more than one thread, and the
data they contain is relatively small (a few percent). For a test database
I'm looking at right now, these two tables compress to 50% and 90% of their
original size, respectively... not much at all.

By contrast, I am seeing most of the other tables (which aren't inserted
into by more than one thread) compress to between 0.5% and 3.8% of their
original size. For example, I see one table go from 783 pages to 4 pages.
Jim