Continuing to maunder, let me fine-tune this a bit:
1) Imagine that, on an ongoing basis, we maintain some CompressionMetric
which measures whether a given table needs compression/reoptimization.
Dead space might or might not be part of this metric. Time since last
compression could be part of it. The metric could be as crude or as
fancy as we like.
2) At some point, based on its CompressionMetric, a table Qualifies for
compression/reoptimization.
3) At some fairly fine-grained interval, a low-priority thread wakes up,
looks for Qualifying tables, and compresses/reoptimizes them. By
default, this thread runs in a zero-administration database, but we
expose a knob for scheduling/disabling the thread.
Your original proposal is a degenerate case of this approach, and maybe
it's the first solution we implement. However, we can get fancier as we
need to support bigger datasets. A rough sketch of the scheme follows.
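To make that concrete, here is a minimal sketch of the metric and the
daemon. All the names (CompressionMetric, TableStats, snapshotAllTables,
compressAndReoptimize) are invented for illustration, not existing Derby
classes, and the numbers (20% dead space, one-minute wakeup) are
placeholders:

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CompressionDaemon {

    /** Step 1: the metric. This crude version qualifies a table when
     *  estimated dead space exceeds 20% of its allocated space. */
    interface CompressionMetric {
        boolean qualifies(TableStats stats);
    }

    /** Hypothetical per-table counters the engine would maintain. */
    static final class TableStats {
        final String name;
        final long liveBytes;
        final long deadBytes;
        TableStats(String name, long liveBytes, long deadBytes) {
            this.name = name;
            this.liveBytes = liveBytes;
            this.deadBytes = deadBytes;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CompressionMetric metric = stats ->
            stats.deadBytes > 0.2 * (stats.liveBytes + stats.deadBytes);

        // Step 3: a low-priority daemon thread that wakes up at a
        // fairly fine-grained interval. The interval (and a disable
        // switch) would be the exposed knob.
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "compression-daemon");
                t.setPriority(Thread.MIN_PRIORITY);
                t.setDaemon(true);
                return t;
            });

        scheduler.scheduleWithFixedDelay(() -> {
            for (TableStats stats : snapshotAllTables()) {
                if (metric.qualifies(stats)) { // Step 2: the table Qualifies
                    compressAndReoptimize(stats.name);
                }
            }
        }, 0, 1, TimeUnit.MINUTES);

        Thread.sleep(2000); // let the demo fire once
        scheduler.shutdown();
    }

    static List<TableStats> snapshotAllTables() {
        // A real engine would read these counters from the catalog.
        return List.of(new TableStats("RCLVALUES", 800_000_000L, 420_000_000L));
    }

    static void compressAndReoptimize(String table) {
        System.out.println("compressing/reoptimizing " + table);
    }
}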
Cheers,
-Rick
Rick Hillegas wrote:
Hi Mike,
I like your suggestions that a low-priority thread should perform the
compressions and that we should expose a knob for disabling this
thread. Here are some further suggestions:
Compressing all the tables and recalculating all the statistics once a
month could cause quite a hiccup for a large database. Maybe we could
do something finer-grained. For instance, we could try to make it easy
to ask a question like "Is more than 20% of this table's space
dead?" No doubt there are some tricky issues in maintaining a
per-table dead-space counter and in keeping that counter from being a
sync point during writes. However, if we could answer a question like
that, then we could pay the compression/reoptimization penalty as we
go rather than incurring a heavy, monthly lump-sum tax. One possible
shape for such a counter is sketched below.
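For what it's worth, a striped counter is one way to keep the write
path from contending on a single lock. This is just a sketch, not Derby
code: DeadSpaceTracker is an invented name, and
java.util.concurrent.atomic.LongAdder comes from much later JDKs than
the ones in the bug report below, but it illustrates the idea:

import java.util.concurrent.atomic.LongAdder;

public class DeadSpaceTracker {
    // LongAdder stripes updates across per-thread cells, so concurrent
    // deleters almost never contend; the counter is not a sync point.
    private final LongAdder deadBytes = new LongAdder();
    private final long allocatedBytes;

    public DeadSpaceTracker(long allocatedBytes) {
        this.allocatedBytes = allocatedBytes;
    }

    /** Called from the delete/update path; cheap and contention-free. */
    public void recordDelete(long rowBytes) {
        deadBytes.add(rowBytes);
    }

    /** "Is more than 20% of this table's space dead?" sum() is only
     *  weakly consistent, which is fine for a scheduling hint. */
    public boolean qualifiesForCompression() {
        return deadBytes.sum() > 0.2 * allocatedBytes;
    }
}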
Cheers,
-Rick
Mike Matrigali wrote:
Full compression of Derby tables is not done automatically, and I
am looking for input on how to schedule such an operation. An
operation like this is going to have a large CPU, I/O, and
possibly temporary disk space impact on the rest of the server.
Since Derby is a zero-admin database, I think we should figure out
some way to do this automatically, but I think there are a number
of applications which would not be happy with such a performance
impact outside their control.
My initial thought is to pick a default time frame, say
once every 30 days, on which to check for table-level events like
compression and statistics generation, and then execute the operations
at low priority. We could also add some sort of parameter so that
applications can disable the automatic background jobs.
Note that Derby does automatically reclaim space from deletes
for subsequent inserts, but the granularity is currently at the
page level, so deleting every 3rd or 5th row is the worst-case
behavior. The page-level decision was a tradeoff: reclaiming the
space is time consuming, so we did not want to schedule the work
on a row-by-row basis. Currently we schedule the work when all
the rows on a page are marked deleted.
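A toy simulation shows why those delete patterns are the worst case
(the rows-per-page and page counts here are invented for illustration,
not Derby's actual layout):

public class PageReclaimDemo {
    public static void main(String[] args) {
        final int rowsPerPage = 60; // assumed, for illustration only
        final int pages = 1000;
        for (int n : new int[] {3, 5}) {
            int reclaimablePages = 0;
            for (int p = 0; p < pages; p++) {
                boolean allRowsDeleted = true;
                for (int r = 0; r < rowsPerPage; r++) {
                    long rowId = (long) p * rowsPerPage + r;
                    if (rowId % n != 0) { // this row survives the delete
                        allRowsDeleted = false;
                        break;
                    }
                }
                if (allRowsDeleted) reclaimablePages++;
            }
            System.out.printf(
                "delete every %dth row -> %d of %d pages reclaimable%n",
                n, reclaimablePages, pages);
        }
    }
}

With every 3rd or every 5th row deleted, no page ever has all of its
rows marked deleted, so the printout reports zero reclaimable pages.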
Volker Edelmann (JIRA) wrote:
DERBY-132 resolved ? Table not automatically compressed
--------------------------------------------------------
Key: DERBY-510
URL: http://issues.apache.org/jira/browse/DERBY-510
Project: Derby
Type: Bug
Versions: 10.1.1.0
Environment: JDK 1.4.2, JDK 1.5.0, Windows XP
Reporter: Volker Edelmann
I tried a test program that repeatedly inserts a bunch of data into
one table and repeatedly deletes a bunch of it.
// table is not empty when the test program starts
derby.executeSelect("select count(*) c from rclvalues");
TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 2000000); // insert 2,000,000 rows
derby.executeDelete("delete from rclvalues where MOD(id, 3) = 0");
TestQueries.executeBulkInsertAnalyst(derby.getConnection(), 1000000);
derby.executeDelete("delete from rclvalues where MOD(id, 5) = 0");
derby.executeSelect("select count(*) c from rclvalues");
At the end of the run, the table contains approximately the
same number of rows as before, but the size of the database has grown
from 581 MB to 1.22 GB. From the description of DERBY-132, I had hoped
that Derby would do the compression now (version 10.1.x).
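For reference, until something automatic exists, the manual workaround
is Derby's SYSCS_UTIL.SYSCS_COMPRESS_TABLE system procedure. A minimal
JDBC sketch (the jdbc:derby:testdb URL and the APP schema are
assumptions about this test setup, not part of the report above):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class CompressRclvalues {
    public static void main(String[] args) throws Exception {
        // Connect to the (assumed) embedded test database.
        try (Connection conn = DriverManager.getConnection("jdbc:derby:testdb")) {
            try (CallableStatement cs = conn.prepareCall(
                     "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)")) {
                cs.setString(1, "APP");       // schema name (assumed)
                cs.setString(2, "RCLVALUES"); // table from the test above
                cs.setShort(3, (short) 1);    // 1 = sequential: slower, less temp space
                cs.execute();                 // rebuilds the table, returning dead space
            }
        }
    }
}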