Hi Kristian,
The entire code is available at http://code.google.com/p/jamun/.
It is written in scala, which might not be familiar to all. I am posting below
some snippets that might help.
Kristian Waagan wrote:
> I think you'll have a better chance of getting any answers if you
> provide more information, for instance;
> o data volume; number of rows in table(s)
Number of rows in the concerned table = ~ 50k
> o the SQL query, and possibly the Java code executing it (a little
> later, the query plan might come in handy too)
This is the prepared statement:
conn.prepareStatement (
"select name,version,release,time from PKG where PKG.id in (select id from
PROVIDES where name = ?)",
ResultSet.FETCH_FORWARD,ResultSet.TYPE_FORWARD_ONLY )
This is the collector function (it is in scala, but I hope it's logic would be
apparent):
def extractPkgResult(repo:Repo, resultSet:ResultSet) = {
if (resultSet.next) {
var pkgList:List[MngdPackage] = Nil
var keepGoing = true
while (keepGoing) {
pkgList ::= MngdPackage(resultSet.getString(1),
resultSet.getString(2), resultSet.getString(3), "s umm", "", Nil, Nil,
Some(repo), resultSet.getLong(4))
keepGoing = resultSet.next
}
pkgList
} else Nil
}
> o database schema (including indexes)
create table PKG (
id integer primary key,
name character varying (60),
version character varying (40),
release character varying (40),
time bigint)
create table PROVIDES (
id integer, foreign key (id) references PKG(id),
name character varying (256),
version character varying (40))
create index PROVIDES_NAME_IDX on PROVIDES (name)
>
> Since you're searching for a string inside a VARCHAR, Derby has to check
> all rows in the table. Are you using LIKE in the query?
No, it's a "where x = y"
> Is there a way that you can reduce the number of rows that qualify?
No. It is already down to 1 (99% of the time)
> Are you seeing the long wait on the first rs.next() ?
Yes! I tried more detailed profiling now and this is indeed true.
> What about the subsequent rs.next() calls (if more than one row in the
> result)?
In case of empty result, first rs.next consumes big time.
In case of non-empty result, the rs.next at the end of the result (when it
returns false) consumes little time; less than 1 ms. All other rs.next calls
consume big time (150 to 200 ms)
>
>
> There are also some tips here:
> http://wiki.apache.org/db-derby/PerformanceDiagnosisTips
I am attaching the query plan.
thanks,
Harshad
2008-12-15 18:39:13.375 GMT Thread[main,5,main] (XID = 4663), (SESSIONID = 0),
select name,version,release,time from PKG where PKG.id in (select id from
PROVIDES where name = ?) ******* Project-Restrict ResultSet (6):
Number of opens = 1
Rows seen = 1
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 576.85
optimizer estimated cost: 40545.65
Source result set:
Nested Loop Exists Join ResultSet:
Number of opens = 1
Rows seen from the left = 5586
Rows seen from the right = 1
Rows filtered = 0
Rows returned = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 576.85
optimizer estimated cost: 40545.65
Left result set:
Table Scan ResultSet for PKG at read committed isolation level
using instantaneous share row locking chosen by the optimizer (Actual locking
used: table level locking.)
Number of opens = 1
Rows seen = 5586
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched=All
Number of columns fetched=5
Number of pages visited=105
Number of rows qualified=5586
Number of rows visited=5586
Scan type=heap
start position:
null stop position:
null qualifiers:
None
optimizer estimated row count: 5589.00
optimizer estimated cost: 2536.99
Right result set:
Project-Restrict ResultSet (5):
Number of opens = 5586
Rows seen = 5586
Rows filtered = 5585
restriction = true
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 576.85
optimizer estimated cost: 38008.66
Source result set:
Index Row to Base Row ResultSet for PROVIDES:
Number of opens = 5586
Rows seen = 5586
Columns accessed from heap = {0}
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 576.85
optimizer estimated cost: 38008.66
Index Scan ResultSet for PROVIDES using index
PROVIDES_NAME_IDX at read committed isolation level using share row locking
chosen by the optimizer (Actual locking used: table level locking.)
Number of opens = 5586
Rows seen = 5586
Rows filtered = 0
Fetch Size = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched=All
Number of columns fetched=2
Number of deleted rows visited=0
Number of pages visited=16758
Number of rows qualified=5586
Number of rows visited=11171
Scan type=btree
Tree height=3
start position:
>= on first 1 column(s).
Ordered null semantics on the following columns:
stop position:
> on first 1 column(s).
Ordered null semantics on the following columns:
qualifiers:
None
optimizer estimated row count:
576.85
optimizer estimated cost:
38008.66