I think you are seeing the effect of two problems with derby space
management (maybe one or both). I've logged DERBY-2337 and DERBY-2338
to start a discussion on these. I am just guessing what is going
on.:
1) hot single point of contention, with limited attention to multi-cpu
optimization.
o Derby keeps track of the single "last" page it inserted to, and
then tends to insert the next row on that page if possible.
2) limited tracking of non-full pages.
o Derby currently only allocates a single bit of information about
how full a page is. Basically it sets the bit if it is "unfilled".
If this bit is turned off then we won't consider the page for
inserts in the future.
Derby at insert time trys to keep rows on one page rather that split
across pages, and tries to fill up pages one at a time rather than
multi-thread inserts across multiple pages. So at insert time into
a base table the store is presented with a stream of bytes of unknown
length, it tries the following:
1) First try inserting into last page inserted (single page nubmer
tracked for #1 above).
2) If that fails then try ONE unfilled page, if that succeeds it becomes
the newest "last page".
3) if all else fails then allocate a brand new page and do the insert
to it, and if it has to overflow just go ahead. This then becomes
the "last page".
One guess may be that in multi-user many threads may be failing at step
1 and then each go on to get a different page in step 2 and 3 and in the
process leave really unfilled pages behind.
In your db it would be interesting to see if the mostly empty pages are
marked "unfilled" or not. Depending on the answer the problem may be
that we aren't marking unfilled correctly, or we could do a better job
of finding unfilled pages.
Kristian Waagan wrote:
Suresh Thalamati wrote:
Kristian Waagan wrote:
Hello,
For a database population program I run, I have observed that the
disk space allocation is larger when populating the database with
multiple concurrent connections.
For a specific configuration, the database ends up at 642 MB with a
single connection, whereas it ends up at 1.3 GB when using multiple
connections. The raw data volume is at about 215 MB, there are 20
tables and between 20 and 30 indexes (didn't take the time to figure
out which indexes are composite, 'show indexes' doesn't give this
information but returns 31 rows).
Is this to be expected?
No. The Difference in size seems to be too high with just inserts. Are
you doing inserts into the same table in parallel threads ?. Is it one
particular tables/index that is becoming too big or all the
tables/indexes ?
Thanks Suresh and Bryan for your replies.
You both ask similar questions, so here's a common reply.
The "big" database is the original one, created with 25 insert threads.
The "small" database is the original one after running compress on it.
I also tried inserting with only one thread.
Database size on disk, allocated space:
25 threads 1282 MB
25 threads after compress 642 MB 1 thread 643 MB
As can be seen, running compress on the original database and inserting
with only one thread gave almost identical results.
The database in question has a total of 103 conglomerates, including
system tables. Out of these, 34 has changed size.
The increase/decrease from the compressed database to the original one
varied between -50% to 196%. If we ignore conglomerates smaller then 10
MB, we have variation from -8%/+2 to 196%, but there is only one
conglomerate where the size decreased.
I have attached the sizes below.
So the answer to your question is that some tables become to big, some
do not. I am not able to see a pattern based on the current data, but
maybe you are?
I have also attached the program I used to output space diag
information. It is far from perfect, but it does get the information out.
thanks,
------------------------------------------------------------------------
import java.io.File;
import java.sql.*;
import java.util.Arrays;
/**
* Extract disk space diagnostics for a Derby database.
* This program *WILL NOT WORK* for other databases!
*/
public class DerbyDiskSpaceDiag {
/** Newline character(s). */
private static final String SEP = System.getProperty("line.separator");
/** Name of the diagnostics table. */
private static final String DIAGTABLE =
"new org.apache.derby.diag.SpaceTable";
/**
* Pads string with spaces on the left side.
*/
private static String padLeft(String text, int length) {
StringBuffer str = new StringBuffer(text);
str.ensureCapacity(length);
while (str.length() < length) {
str.insert(0, ' ');
}
return str.toString();
}
/**
* Pads string with spaces on the right side.
*/
private static String padRight(String text, int length) {
StringBuffer str = new StringBuffer(text);
str.ensureCapacity(length);
while (str.length() < length) {
str.append(' ');
}
return str.toString();
}
/**
* Load Derby driver.
* Will load the client driver if the database name starts with '//',
* the embedded driver otherwise.
*/
private static void loadDriver(String dbName) {
String driver = "org.apache.derby.jdbc.EmbeddedDriver";
if (dbName.startsWith("//")) {
driver = "org.apache.derby.jdbc.ClientDriver";
}
try {
Class.forName(driver);
} catch (ClassNotFoundException cnfe) {
cnfe.printStackTrace();
System.err.println("!! Unable to load driver class for Derby: " +
driver + SEP + "!! Please verify your classpath.");
System.exit(1);
}
}
/**
* Print usage message/help.
*/
private static void usage() {
System.out.println("Print disk usage statistics for a Derby database.");
System.out.println(SEP + "Usage:");
System.out.println("\tDerbyDiskSpaceDiag DBNAME [SCHEMA]");
System.out.println();
}
/** SQL used to extract conglomerate number from conglomerate name. */
private static final String CONGLOM_NUMBER =
"SELECT CONGLOMERATENUMBER FROM SYS.SYSCONGLOMERATES c " +
"WHERE c.CONGLOMERATENAME = ?";
/** SQL used to extract table id from table name. */
private static final String TABLEID =
"SELECT TABLEID FROM SYS.SYSTABLES t " +
"WHERE t.TABLENAME = ?";
private static final String HORIZONTAL_LINE =
"- - - - - - - - - - - - - - - - - - - - - - - " +
"- - - - - - - - - - - - - - - -";
/**
* Name of the database to connect to.
* Note that the JDBC protocol prefix is not supposed to be included.
*/
private final String dbName;
/** Tell if we are connect through the embedded driver. */
private final boolean embedded;
/** SQL to extract space diagnostcs. */
private final String diagQuery;
/**
* Total counts for various space diagnostics.
* Currently: allocated, free, save, filesizes.
*/
private long[] totalCounts = new long[4];
/**
* Get space diagnostics for the specified database.
*/
public DerbyDiskSpaceDiag(String dbName) {
this.dbName = dbName;
this.embedded = !dbName.startsWith("//");
this.diagQuery =
"SELECT spaceinfo.* FROM " +
"SYS.SYSSCHEMAS s, SYS.SYSTABLES t, " +
DIAGTABLE + "(SCHEMANAME,TABLENAME) spaceinfo " +
"WHERE s.SCHEMAID = t.SCHEMAID";
}
/**
* Get space diagnostics for the specified schema in the given database.
*/
public DerbyDiskSpaceDiag(String dbName, String schema) {
this.dbName = dbName;
this.embedded = !dbName.startsWith("//");
this.diagQuery =
"SELECT spaceinfo.* FROM " +
"SYS.SYSSCHEMAS s, SYS.SYSTABLES t, " +
DIAGTABLE + "('" + schema + "',TABLENAME) spaceinfo " +
"WHERE s.SCHEMAID = t.SCHEMAID AND " +
"s.SCHEMANAME = '" + schema + "'";
}
/**
* Print disk space statistics.
*/
public void printInfo()
throws SQLException {
Arrays.fill(this.totalCounts, 0L);
Connection con = DriverManager.getConnection(
"jdbc:derby:" + this.dbName);
PreparedStatement conglomNrStmt = con.prepareStatement(CONGLOM_NUMBER);
PreparedStatement tableIdStmt = con.prepareStatement(TABLEID);
Statement diagStmt = con.createStatement();
ResultSet diagRs = diagStmt.executeQuery(diagQuery);
ResultSet fileRs = null;
System.out.println(
padRight("Tablename", 25) + " " +
padRight("Type", 5) + " " +
padLeft("ALLOC", 6) + " " +
padLeft("FREE", 6) + " " +
padLeft("PSIZE", 5) + " " +
padLeft("SAVE", 6) + " " +
padRight("File", 8) + " " +
padLeft("Size KB", 8) + " ");
System.out.println(HORIZONTAL_LINE);
String conglomName = null;
boolean index = false;
Long tmpLong;
// Let's hope the order in the resultset doesn't change.
while (diagRs.next()) {
conglomName = diagRs.getString(1);
index = diagRs.getInt(2) > 0;
if (index) {
System.out.print(" " + padRight(conglomName, 23) + " ");
} else {
System.out.print(padRight(conglomName, 25) + " ");
}
System.out.print((index == true ? "INDEX" : "TABLE") + " ");
tmpLong = diagRs.getLong(3);
this.totalCounts[0] += tmpLong.longValue();
// Allocated
System.out.print(padLeft(tmpLong.toString(), 6) + " ");
tmpLong = diagRs.getLong(4);
this.totalCounts[1] += tmpLong.longValue();
// Free
System.out.print(padLeft(tmpLong.toString(), 6) + " ");
tmpLong = diagRs.getLong(6);
// Page size
System.out.print(padLeft(tmpLong.toString(), 5) + " ");
tmpLong = diagRs.getLong(7);
this.totalCounts[2] += tmpLong.longValue();
// Save
System.out.print(padLeft(tmpLong.toString(), 6) + " ");
if (embedded) {
if (index) {
conglomNrStmt.setString(1, conglomName);
fileRs = conglomNrStmt.executeQuery();
} else {
// Must get table id fidiagRst.
tableIdStmt.setString(1, conglomName);
fileRs = tableIdStmt.executeQuery();
fileRs.next();
String tableId = fileRs.getString(1);
fileRs.close();
conglomNrStmt.setString(1, tableId);
fileRs = conglomNrStmt.executeQuery();
}
if (fileRs.next()) {
printFileInfo(dbName, fileRs.getInt(1));
fileRs.close();
} else {
// Assume system table, which have '_HEAP' as postfix.
conglomNrStmt.setString(1, conglomName + "_HEAP");
fileRs = conglomNrStmt.executeQuery();
if (fileRs.next()) {
printFileInfo(dbName, fileRs.getInt(1));
fileRs.close();
} else {
System.out.print("unable to retrieve");
}
}
} else {
System.out.print("use embedded driver!");
}
// End line
System.out.println();
}
System.out.println(HORIZONTAL_LINE);
System.out.println(
padRight("Total", 25) + " " +
padRight("", 5) + " " +
padLeft(new Long(totalCounts[0]).toString(), 6) + " " +
padLeft(new Long(totalCounts[1]).toString(), 6) + " " +
padLeft("n/a", 5) + " " +
padLeft(new Long(totalCounts[2]).toString(), 6) + " " +
padRight("KB n/a", 8) + " " +
padLeft(new Long(totalCounts[3]).toString(), 8) + " ");
System.out.println(HORIZONTAL_LINE);
con.close();
}
/**
* Print file information.
* This will only work if connection is embedded.
*/
private void printFileInfo(String dbName, int conglomNr)
throws SQLException {
File f = new File(
new File(dbName, "seg0"),
"c" + Integer.toHexString(conglomNr) + ".dat");
System.out.print(padRight(f.getName(), 8) + " ");
System.out.print(padLeft(new Long(f.length() / 1024).toString(), 8));
this.totalCounts[3] += (f.length() / 1024);
}
/**
* Get disk space diagnostics table for a database.
*
* Arguments:
* DBNAME : name/path of the database /required/
* SCHEMA : name of the schema /optional/
*/
public static void main(String[] args)
throws SQLException {
// If no arguments are given, print usage and exit.
if (args.length < 1) {
usage();
System.exit(1);
}
final String dbName = args[0];
loadDriver(dbName);
DerbyDiskSpaceDiag diag = null;
if (args.length > 1) {
// Limit extraction to the specified schema.
diag = new DerbyDiskSpaceDiag(dbName, args[1]);
} else {
diag = new DerbyDiskSpaceDiag(dbName);
}
diag.printInfo();
}
} // End class DerbyDIskSpaceDiag