On 8/16/13 8:34 AM, Vathsala Weerasinghe wrote:
Hi,
I'm Vathsala Weerasinghe, a final year undergraduate of the Department
of Computer Science and Engineering, University of Moratuwa, Sri
Lanka.
Currently we are looking into the Derby source to improve hash join
performance as a group project.
We are trying to identify the implementation of the hash join in
Derby. Now we are looking into the HashJoinStrategy class in the
org.apache.derby.impl.sql.compile package.
Can someone guide us towards the correct path or point us to some good
resources that will help us tackle this problem?
Thanks in advance.
Hi Vathsala,
JoinStrategy is an interface which represents the approaches which the
optimizer may pick for joining tables. There are two implementations
of JoinStrategy: NestedLoopJoinStrategy and HashJoinStrategy. The
JoinStrategies are purely compile-time structures and they disappear at
execution time.
You may be interested in changing the Derby cost model so that the
optimizer picks a HashJoinStrategy more or less often. However, I
suspect that you are really interested in improving the execution-time
performance of hash joins. If that is the case, then you will want to
look at the following class:
HashScanResultSet - This is the right child of the join when performing
a hash join. At initialization time, this node reads the right table of
the join and builds a hash map, mapping keys to full rows. A join node
sits above this HashScanResultSet. The join node reads rows from its
driving, left child. For each left row, the join node asks the
HashScanResultSet for all rows in the right table whose key matches that
row's key. The HashScanResultSet uses the key to probe for matches in
the hash map which it built at initialization time.
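
To make the build-and-probe pattern concrete, here is a minimal, generic
sketch in Java. It is not Derby's code: HashJoinSketch, Row,
buildHashTable, and join are hypothetical names chosen for illustration,
the hash table is a plain in-memory java.util.HashMap, and a single int
column is assumed to be the join key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a generic in-memory hash join, not Derby's implementation. */
class HashJoinSketch {

    /** Hypothetical row type: a join key plus a payload standing in for the other columns. */
    static final class Row {
        final int key;
        final String payload;

        Row(int key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    /** Build phase: read every right-table row once and group the full rows by key. */
    static Map<Integer, List<Row>> buildHashTable(List<Row> rightRows) {
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row r : rightRows) {
            table.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }
        return table;
    }

    /** Probe phase: for each left (driving) row, look up the matching right rows by key. */
    static List<String> join(List<Row> leftRows, List<Row> rightRows) {
        Map<Integer, List<Row>> table = buildHashTable(rightRows);
        List<String> joined = new ArrayList<>();
        for (Row left : leftRows) {
            List<Row> matches = table.getOrDefault(left.key, Collections.<Row>emptyList());
            for (Row right : matches) {
                joined.add(left.payload + "|" + right.payload);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Row> left = Arrays.asList(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"));
        List<Row> right = Arrays.asList(new Row(1, "x"), new Row(1, "y"), new Row(3, "z"));
        System.out.println(join(left, right)); // prints [a|x, a|y, c|z]
    }
}
```

In this sketch the right side is read exactly once, and each left row
costs one hash lookup instead of a scan of the right table. The cost is
the memory needed to hold the right side in the map.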
Hope this helps,
-Rick