Excellent. Looks like we are on the same page now.
Re: HashSet (HashMap?) vs. []. A map definitely performs better
when you look up a value by its key. This may be the case when
we are assembling column descriptors inside the builder. This
operation is done once per query (OK, maybe it is done N times, where
N is the number of columns).
However processing the ResultSet is a different story. There's no
lookup by key. For each ResultSet row we need to apply ALL column
descriptors one by one to get the values out. So with a HashSet/
HashMap we'd have this:
    Iterator<ColumnDescriptor> it = map.values().iterator();
    while (it.hasNext()) {
        ColumnDescriptor column = it.next();
        ....
    }
With a ColumnDescriptor[] we have this:
    for (int i = 0; i < length; i++) {
        ColumnDescriptor column = columns[i];
        ....
    }
Both loops are done M times, where M is the number of rows in the
ResultSet. In the worst case scenario, M is much larger than N. In the
first case, we call three extra methods (iterator() once per row,
hasNext() and next() once per column) and create at least one extra
object (the Iterator) per row. So the second case is marginally
faster. Now if you multiply that nanosecond-or-whatever difference by
a few million, it can become more significant.
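To make the comparison concrete, here is a rough sketch of both loops
run over M rows. ColumnDescriptor below is a hypothetical stand-in with
just a name, not the real class, and the row count is arbitrary. It only
shows where the extra calls and the per-row Iterator come from; it is
not a benchmark, and the JIT may narrow the gap in practice.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class ColumnLoopSketch {

    // Hypothetical stand-in for the real descriptor class; only a name is modeled.
    static final class ColumnDescriptor {
        final String name;

        ColumnDescriptor(String name) {
            this.name = name;
        }
    }

    // Map variant: one Iterator requested per row, plus hasNext()/next() per column.
    static long visitWithMap(Map<String, ColumnDescriptor> map, int rows) {
        long visited = 0;
        for (int r = 0; r < rows; r++) {
            Iterator<ColumnDescriptor> it = map.values().iterator();
            while (it.hasNext()) {
                ColumnDescriptor column = it.next();
                if (column.name != null) {
                    visited++;
                }
            }
        }
        return visited;
    }

    // Array variant: plain indexed access, no Iterator involved.
    static long visitWithArray(ColumnDescriptor[] columns, int rows) {
        long visited = 0;
        int length = columns.length;
        for (int r = 0; r < rows; r++) {
            for (int i = 0; i < length; i++) {
                ColumnDescriptor column = columns[i];
                if (column.name != null) {
                    visited++;
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, ColumnDescriptor> map = new LinkedHashMap<String, ColumnDescriptor>();
        map.put("ID", new ColumnDescriptor("ID"));
        map.put("NAME", new ColumnDescriptor("NAME"));
        ColumnDescriptor[] columns = map.values().toArray(new ColumnDescriptor[0]);

        int rows = 1000000; // arbitrary M for illustration
        System.out.println(visitWithMap(map, rows));
        System.out.println(visitWithArray(columns, rows));
    }
}
```

Both methods visit the same N*M descriptors; the only difference is how
each one gets to the next column.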
So essentially when talking about this refactoring we need to separate
the first step of preparing the columns, and the second step of using
them.
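One way the two steps could be kept apart, as a sketch only: the
builder collects descriptors in a map keyed by name (so duplicates are
dropped) and hands out a plain array once; row processing then only
touches the array. Builder, readRows and the Object[][] standing in for
a ResultSet are all made up for illustration and are not existing
Cayenne code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnPrepSketch {

    // Hypothetical stand-in for the real descriptor class.
    static final class ColumnDescriptor {
        final String name;

        ColumnDescriptor(String name) {
            this.name = name;
        }
    }

    // Step 1, once per query: keyed lookup is useful here, speed matters little.
    static final class Builder {
        private final Map<String, ColumnDescriptor> byName =
                new LinkedHashMap<String, ColumnDescriptor>();

        Builder add(String name) {
            // A repeated name is ignored, so the result has no duplicates.
            if (!byName.containsKey(name)) {
                byName.put(name, new ColumnDescriptor(name));
            }
            return this;
        }

        ColumnDescriptor[] build() {
            // Insertion order is preserved by LinkedHashMap.
            return byName.values().toArray(new ColumnDescriptor[0]);
        }
    }

    // Step 2, once per row: indexed loop over the prepared array.
    // rawRows is an assumed stand-in for the data coming out of a ResultSet.
    static List<Map<String, Object>> readRows(ColumnDescriptor[] columns, Object[][] rawRows) {
        int length = columns.length;
        List<Map<String, Object>> result = new ArrayList<Map<String, Object>>(rawRows.length);
        for (Object[] raw : rawRows) {
            Map<String, Object> row = new HashMap<String, Object>(length * 2);
            for (int i = 0; i < length; i++) {
                row.put(columns[i].name, raw[i]);
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        ColumnDescriptor[] columns = new Builder().add("ID").add("NAME").add("ID").build();
        Object[][] rawRows = { { 1, "a" }, { 2, "b" } };
        System.out.println(readRows(columns, rawRows).size());
    }
}
```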
Andrus
On Oct 12, 2009, at 12:38 PM, Evgeny Ryabitskiy wrote:
It doesn't matter how this is represented *inside* the builder class,
as the builder is used only once per query. On the other hand, coming
out of the builder it must be optimized, as access to the column
descriptors array is performed N*M times during each result set
processing, where N is the width of the result set, and M is its
length. I.e. it can be a very large number (up to tens or hundreds of
millions of calls). Every small optimization matters here.
So... I was talking exactly about optimization. A HashSet can be
faster because we perform several scans over the whole array of
ColumnDescriptors. It is also safer, because we don't get duplicate
columns. And... I didn't get your position on this idea.
This is something I don't know. We need to check about a dozen
drivers from different vendors that we support to verify that. This is
just a getter in the interface. Implementors could've made it anything.
I have looked through the JTDS driver (not a dozen, but at least one).
ResultSet has all the information about columns (just a private final
ColInfo[] columns).
When getMetaData is called, it constructs a new object that holds a
reference to the array of columns from the ResultSet.
Looks like there is no problem with JTDS.
The problem is that if we don't set ResultSetMetadata, as in the
current (trunk) version, then without ResultSetMetadata we don't know
all the columns..
Not true. We don't know all the columns for SQLTemplate queries. For
all other types of queries we DO know all the columns, as Cayenne
generates SQL from scratch for those queries. I think this one place
is where we have the biggest mismatch in our views of the
implementation.
Ah... now I see. You are right, that was a mismatch in our views. I
will work on it in the evening.
Another thing to check here is actually reading column data from the
returned ResultSetMetadata, as lazy resolving of it can be postponed a
step further.
Again, in JTDS it's just an array of ColInfo (like our
ColumnDescriptor); it's passed to RowSet through the constructor from
the protocol implementation.
Evgeny Ryabitskiy.