Re: Derby hash join performance improvement

Rick Hillegas Fri, 16 Aug 2013 10:51:56 -0700

On 8/16/13 8:34 AM, Vathsala Weerasinghe wrote:

Hi,


I'm Vathsala Weerasinghe, a final year undergraduate of the Department
of Computer Science and Engineering, University of Moratuwa, Sri
Lanka.

Currently we are looking into the Derby source to improve the hash
join performance as a group project.

We are trying to identify the implementation of the hash join in
Derby. Now we are looking into HashJoinStrategy class in the
org.apache.derby.impl.sql.compile package.

Can someone guide us towards the correct path or provide us some good
resources which will help us in tackling this problem?

Thanks in advance.

Hi Vathsala,

JoinStrategy is an interface which represents the two approaches which the optimizer may pick for joining tables. There are 2 implementations of JoinStrategy: NestedLoopJoinStrategy and HashJoinStrategy. The JoinStrategies are purely compile-time structures and they disappear at execution time.

You may be interested in changing the Derby cost model so that the optimizer picks a HashJoinStrategy more or less often. However, I suspect that you are really interested in improving the execution-time performance of hash joins. If that is the case, then you will want to look at the following class:

HashScanResultSet - This is the right child of the join when performing a hash join. At initialization time, this node reads the right table of the join and builds a hash map, mapping keys to full rows. A join node sits above this HashScanResultSet. The join node reads rows from its driving, left child. For each left row, the join node asks the HashScanResultSet to match that row's key to all matching rows in the right table. The HashScanResultSet uses the key to probe for matches in the hash map which it built at initialization time.
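
To make the build-and-probe pattern above concrete, here is a minimal in-memory sketch in Java. It is not Derby's code: the class HashJoinSketch, its method names, and the representation of rows as Object[] are all made up for illustration, and it ignores everything the real engine has to deal with (spilling to disk, qualifiers, multi-column keys, and so on). It assumes a single-column equijoin and, following SQL semantics, never matches a null key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Derby.
class HashJoinSketch {

    // Build phase: read the right table once and map each key to its full rows.
    static Map<Object, List<Object[]>> buildHashTable(List<Object[]> rightRows,
                                                      int rightKeyCol) {
        Map<Object, List<Object[]>> table = new HashMap<>();
        for (Object[] row : rightRows) {
            Object key = row[rightKeyCol];
            if (key == null) {
                continue; // a null key can never satisfy an equijoin
            }
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Probe phase: for each row of the driving (left) input, look up the
    // matching right rows by key and emit one joined row per match.
    static List<Object[]> join(List<Object[]> leftRows, int leftKeyCol,
                               List<Object[]> rightRows, int rightKeyCol) {
        Map<Object, List<Object[]>> table = buildHashTable(rightRows, rightKeyCol);
        List<Object[]> result = new ArrayList<>();
        for (Object[] left : leftRows) {
            Object key = left[leftKeyCol];
            if (key == null) {
                continue;
            }
            List<Object[]> matches = table.get(key);
            if (matches == null) {
                continue; // no right row has this key
            }
            for (Object[] right : matches) {
                Object[] joined = new Object[left.length + right.length];
                System.arraycopy(left, 0, joined, 0, left.length);
                System.arraycopy(right, 0, joined, left.length, right.length);
                result.add(joined);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> left = new ArrayList<>();
        left.add(new Object[] {1, "left-1"});
        left.add(new Object[] {2, "left-2"});

        List<Object[]> right = new ArrayList<>();
        right.add(new Object[] {2, "right-2a"});
        right.add(new Object[] {2, "right-2b"});
        right.add(new Object[] {3, "right-3"});

        for (Object[] row : join(left, 0, right, 0)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

In this sketch the cost of the build phase is paid once, and each probe is an expected constant-time map lookup, which is why the size of the right table and the quality of the hashing tend to dominate execution-time performance.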


Hope this helps,
-Rick
