Re: Derby hash join performance improvement

Vathsala Weerasinghe Fri, 16 Aug 2013 19:26:12 -0700

Hi Rick,

Thank you very much for the quick response. Yes, we are interested in
improving the execution-time performance of hash joins. We will look into
the class you suggested; this will really help us get started.


Thank you very much again for the valuable information.

Vathsala.

On 16 August 2013 23:20, Rick Hillegas <[email protected]> wrote:
> On 8/16/13 8:34 AM, Vathsala Weerasinghe wrote:
>>
>> Hi,
>>
>> I'm Vathsala Weerasinghe, a final-year undergraduate in the Department
>> of Computer Science and Engineering, University of Moratuwa, Sri
>> Lanka.
>>
>> Currently we are looking into the Derby source to improve hash
>> join performance as a group project.
>>
>> We are trying to identify the implementation of the hash join in
>> Derby. Now we are looking into the HashJoinStrategy class in the
>> org.apache.derby.impl.sql.compile package.
>>
>> Can someone guide us towards the correct path or provide us some good
>> resources which will help us in tackling this problem?
>>
>> Thanks in advance.
>
> Hi Vathsala,
>
> JoinStrategy is an interface which represents the approaches that the
> optimizer may pick for joining tables. There are two implementations of
> JoinStrategy: NestedLoopJoinStrategy and HashJoinStrategy. The
> JoinStrategies are purely compile-time structures and they disappear at
> execution time.
>
> You may be interested in changing the Derby cost model so that the optimizer
> picks a HashJoinStrategy more or less often. However, I suspect that you are
> really interested in improving the execution-time performance of hash joins.
> If that is the case, then you will want to look at the following class:
>
> HashScanResultSet - This is the right child of the join when performing a
> hash join. At initialization time, this node reads the right table of the
> join and builds a hash map, mapping keys to full rows. A join node sits
> above this HashScanResultSet. The join node reads rows from its driving,
> left child. For each left row, the join node asks the HashScanResultSet to
> match that row's key to all matching rows in the right table. The
> HashScanResultSet uses the key to probe for matches in the hash map which it
> built at initialization time.
>
> Hope this helps,
> -Rick
>
>
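
For anyone following the thread, the build-then-probe flow Rick describes
above can be sketched roughly as below. This is only an illustration, not
Derby code: the class HashJoinSketch, its build and join methods, and the
use of Object[] for rows are all assumptions made for the example. It also
keeps everything in memory and skips NULL keys, which is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only. None of these names come from the Derby source.
class HashJoinSketch {

    // Build phase: read the right table once and map each join key
    // to the full rows that carry it.
    static Map<Object, List<Object[]>> build(List<Object[]> rightRows, int rightKeyCol) {
        Map<Object, List<Object[]>> table = new HashMap<>();
        for (Object[] row : rightRows) {
            Object key = row[rightKeyCol];
            if (key == null) {
                continue; // assumption: NULL keys never match in an equijoin
            }
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Probe phase: for each row of the driving (left) input, look up its
    // key in the hash map and emit one joined row per match.
    static List<Object[]> join(List<Object[]> leftRows, int leftKeyCol,
                               List<Object[]> rightRows, int rightKeyCol) {
        Map<Object, List<Object[]>> table = build(rightRows, rightKeyCol);
        List<Object[]> out = new ArrayList<>();
        for (Object[] left : leftRows) {
            Object key = left[leftKeyCol];
            if (key == null) {
                continue;
            }
            List<Object[]> matches = table.get(key);
            if (matches == null) {
                continue; // no right row has this key
            }
            for (Object[] right : matches) {
                // Joined row = left columns followed by right columns.
                Object[] joined = new Object[left.length + right.length];
                System.arraycopy(left, 0, joined, 0, left.length);
                System.arraycopy(right, 0, joined, left.length, right.length);
                out.add(joined);
            }
        }
        return out;
    }
}
```

In this sketch the build cost is paid once per join, and each probe is a
single hash lookup, so the map construction and the lookup path are the
obvious places to measure when looking at execution-time performance.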



-- 
Regards,
Vathsala Weerasinghe
tech-surge.blogspot.com
