Hi,
I'm currently evaluating H2 for using it mainly as a key - value storage,
which involves storing LOBs in the range in between 1KB and 10MB. The main
features why I am looking at H2 are transactions, transparent storage of
external LOBs in the case they are too large and transparent file
encryption. All of this works quite fine for me. However, I ran into some
'problems', which someone can hopefully clarify for me.
0. My Setup:
H2 Version 1.3.166
System.setProperty("h2.lobInDatabase", "false");
DB setting: SET MAX_LENGTH_INPLACE_LOB=512
connection: jdbc:h2:split:28:fooDB;CIPHER=AES
DROP TABLES IF EXISTS documents;
CREATE TABLE documents(id INT PRIMARY KEY, name VARCHAR, data BLOB);
CREATE INDEX docNameIDX ON documents(name);
Finally, I insert 2 million documents, which results in an id range from 1
to 2,000,000.
1. Searches are usually done against name. Hence, I thought about speeding
up those searches with a temporary memory table, which fails:
CREATE MEMORY TEMP TABLE IF NOT EXISTS memTempDocs AS SELECT * FROM
documents;
Errormessage:
Eingabe/Ausgabe: "java.io.FileNotFoundException:
X:\tmpDB\fooDB.lobs.db\1.lobs.db\1.lobs.db\1.lobs.db\16843009.t24.lob.db
Indeed, this file does not exist on the HDD. However, if I restrict the
above select statement to AS SELECT * FROM documents WHERE id=x
and insert for x different values like 1 or 10,000 or 1,000,000 or whatever
id exists, I get exactly the same error message.
If I omit the BLOBs it works:
CREATE MEMORY TEMP TABLE IF NOT EXISTS memTempDocs AS SELECT id,name FROM
documents;
My question is whether it is possible to create a temporary memory table
that still contains the BLOBs.
2. For the 2 Million documents, query time highly differs depending on the
column queried. Is there an implementation reason for it?
a) about 60ms for SELECT * FROM documents WHERE id=someid;
--> returns at most 1 row
b) about 550ms for SELECT * FROM documents WHERE name=SELECT name FROM
documents where id=someid;
--> possibly returns many elements because of similar names (on average 40
rows). I know the SELECT name statement takes time equal to a). However, in
the index, the returned rows should be near each other in the search tree
because of equality. Right now, I cannot use the column name as primary key
as the names are not unique.
3. If I query the documents table with a prepared statement, is the binary
data directly fetched from HDD or at the time when I access the binary
stream from the returned resultset?
Example, which should use streams instead of a simple byte array (but for
my tests ok):
String query = "SELECT * FROM documents WHERE id=" + index + ";";
ps = conn.prepareStatement(query);
rs = ps.executeQuery();
if(rs.first()) { //at most one element because of unique id!
String name = rs.getString("name");
Blob data = rs.getBlob("data");
if(retrieveData) {
byteData = data.getBytes(0,(int)data.length());
this.fetchedDataSize += byteData.length;
}
}
4. H2 manages the storage of external BLOBs transparently. I wonder whether
I can get the explicit information of the BLOB's name. The use case here is
twofold:
a) get statistical information about the occupied HDD-Space vs.
Memory-Space (e.g. compressed files or encrypted vs. uncompressed files)
b) check whether some hero deleted BLOB-files to save some space on HDD ;-)
--> I guess I can surround the fetch statement with a try/catch block and
mark rows with a missing file as invalid. However, I have the impression
this is a workaround and a misuse of the java.io.FileNotFoundException.
Many thanks in advance,
Sascha
--
You received this message because you are subscribed to the Google Groups "H2
Database" group.
To view this discussion on the web visit
https://groups.google.com/d/msg/h2-database/-/foyKqmGVSlMJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/h2-database?hl=en.