On 8/16/13 8:34 AM, Vathsala Weerasinghe wrote:
Hi,

I'm Vathsala Weerasinghe, a final-year undergraduate in the Department
of Computer Science and Engineering, University of Moratuwa, Sri
Lanka.

As a group project, we are studying the Derby source code with the aim
of improving hash join performance.

We are trying to locate the hash join implementation in Derby. So far
we have been looking at the HashJoinStrategy class in the
org.apache.derby.impl.sql.compile package.

Can someone point us in the right direction, or suggest some good
resources that would help us tackle this problem?

Thanks in advance.

Hi Vathsala,

JoinStrategy is an interface representing the two approaches the optimizer can choose between when joining tables. It has two implementations: NestedLoopJoinStrategy and HashJoinStrategy. JoinStrategy objects are purely compile-time structures; they do not exist at execution time.
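To make the compile-time role concrete, here is a toy sketch of an optimizer choosing between two join strategies by estimated cost. Everything in it is hypothetical: the class, record, and method names are made up, and the cost formulas (outer * inner for a nested loop; inner + outer for a hash join, infeasible when the inner table exceeds a memory limit) are a simplification for illustration, not Derby's cost model. The strategy objects only decide which plan to build; no rows are touched.

```java
import java.util.Comparator;
import java.util.List;

// Toy illustration only: names and cost formulas are hypothetical, NOT Derby's cost model.
public class JoinStrategyChoice {

    // Estimates the cost of joining an outer (driving) input with an inner input.
    interface CostEstimator {
        double cost(double outerRows, double innerRows, double maxHashRows);
    }

    record Strategy(String name, CostEstimator estimator) {}

    // Nested loop: the inner table is rescanned once per outer row.
    static final Strategy NESTED_LOOP =
        new Strategy("NestedLoop", (outer, inner, maxHash) -> outer * inner);

    // Hash join: one scan of the inner table to build the hash table, then one probe per
    // outer row. Treated as infeasible here if the inner table does not fit in memory.
    static final Strategy HASH =
        new Strategy("Hash", (outer, inner, maxHash) ->
            inner > maxHash ? Double.POSITIVE_INFINITY : inner + outer);

    // The "optimizer" keeps whichever strategy has the lowest estimated cost.
    static Strategy pick(double outerRows, double innerRows, double maxHashRows) {
        return List.of(NESTED_LOOP, HASH).stream()
            .min(Comparator.comparingDouble(
                s -> s.estimator().cost(outerRows, innerRows, maxHashRows)))
            .orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(pick(1_000, 10_000, 100_000).name());
        System.out.println(pick(1, 10_000, 100_000).name());
        System.out.println(pick(1_000, 1_000_000, 100_000).name());
    }
}
```

Under these made-up formulas the first call picks Hash, the second picks NestedLoop (a single outer row makes rescanning cheap), and the third picks NestedLoop because the inner table is too large to hash. Changing a cost model shifts exactly these decisions.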

You may be interested in changing Derby's cost model so that the optimizer picks HashJoinStrategy more or less often. However, I suspect that you are really interested in improving the execution-time performance of hash joins. If so, you will want to look at the following class:

HashScanResultSet - This is the right child of the join when a hash join is performed. At initialization time, this node reads the right table of the join and builds a hash map from join keys to full rows. A join node sits above the HashScanResultSet and reads rows from its left (driving) child. For each left row, the join node asks the HashScanResultSet for all right-table rows whose key matches that row's key. The HashScanResultSet answers by probing the hash map it built at initialization time.
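The build/probe pattern described above can be sketched in plain Java. This is a minimal, hypothetical illustration, assuming rows are Object[] arrays, the join is an inner equijoin on a single column, and the whole right table fits in memory. The class and method names (HashJoinSketch, build, join) are made up and are not Derby's HashScanResultSet API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an in-memory hash join; not Derby's HashScanResultSet API.
public class HashJoinSketch {

    // Build phase: read the right table once and map each join key to its full rows.
    static Map<Object, List<Object[]>> build(List<Object[]> rightRows, int rightKey) {
        Map<Object, List<Object[]>> table = new HashMap<>();
        for (Object[] row : rightRows) {
            // SQL equality never matches NULL keys, so such rows can never join.
            if (row[rightKey] == null) continue;
            // Several right rows may share a key, so each key maps to a list of rows.
            table.computeIfAbsent(row[rightKey], k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Probe phase: for each left (driving) row, fetch all right rows with the same key.
    static List<Object[]> join(List<Object[]> leftRows, int leftKey,
                               List<Object[]> rightRows, int rightKey) {
        Map<Object, List<Object[]>> table = build(rightRows, rightKey);
        List<Object[]> result = new ArrayList<>();
        for (Object[] left : leftRows) {
            List<Object[]> matches = table.getOrDefault(left[leftKey], List.of());
            for (Object[] right : matches) {
                // Concatenate the left and right columns into one output row.
                Object[] joined = new Object[left.length + right.length];
                System.arraycopy(left, 0, joined, 0, left.length);
                System.arraycopy(right, 0, joined, left.length, right.length);
                result.add(joined);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> orders = List.of(
            new Object[] {1, "order-a"},
            new Object[] {2, "order-b"},
            new Object[] {3, "order-c"});
        List<Object[]> customers = List.of(
            new Object[] {1, "alice"},
            new Object[] {2, "bob"},
            new Object[] {2, "bob-duplicate"});
        for (Object[] row : join(orders, 0, customers, 0)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running main should print three joined rows: order 1 paired with alice, then order 2 paired with bob and with bob-duplicate. Order 3 has no matching right row and, as in an inner join, produces nothing. Note that the right table is scanned only once, no matter how many left rows are probed.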

Hope this helps,
-Rick

