Hello,
I have a question about the hash function used in HashJoinExec.
From reading the source code, I concluded that in TAJO the hash function
used in the build phase of the algorithm is the identity function:
h(x) = x
Is that correct?
I will give two examples below; please correct me if I have misunderstood
anything about TAJO's approach.
I use the notation ( , , ... ) for a Tuple and [ , , ... ] for a list of
elements.
Example 1)
Given the input set of tuples
{(1,aaa), (1,bbb), (1,ccc), (2,ddd), (5,eee)}
and a join key consisting of the first (numeric) column, the build table
(tupleSlots) contains:
keyTuple | Value (an ArrayList of Tuples)
----------------------------------------------------
(1) | [ (1,aaa), (1,bbb) , (1,ccc) ]
(2) | [ (2,ddd) ]
(5) | [ (5,eee) ]
Example 2)
Given the input set of tuples
{(10,A,aaa), (10,A,bbb), (10,A,ccc), (20,B,ddd), (50,C,eee)}
and a join key consisting of the first two columns (a numeric and a
string), the build table (tupleSlots) contains:
keyTuple | Value (an ArrayList of Tuples)
--------------------------------------------------------
(10, A) | [ (10, A, aaa), (10, A, bbb), (10, A, ccc) ]
(20, B) | [ (20, B, ddd) ]
(50, C) | [ (50, C, eee) ]
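To make the question concrete, here is a minimal Java sketch of how I understand the build phase in the two examples above. It is not TAJO's actual code: the class BuildTableSketch and the helpers tuple() and build() are hypothetical names, and a plain List<Object> stands in for a Tuple. Note that in this sketch the key tuple is used directly as the map key, while java.util.HashMap still calls hashCode() on that key internally, so "identity" may describe only how the key is formed and not how it is bucketed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BuildTableSketch {

    // Hypothetical stand-in for a Tuple: a fixed list of column values.
    static List<Object> tuple(Object... values) {
        return Arrays.asList(values);
    }

    // Groups the input tuples by the values of their join-key columns.
    // The key tuple itself is the map key; no extra hash function is applied
    // by this code (the map hashes the key internally via hashCode()).
    static Map<List<Object>, List<List<Object>>> build(
            List<List<Object>> input, int[] keyColumns) {
        // LinkedHashMap is used only to keep the printed order stable.
        Map<List<Object>, List<List<Object>>> tupleSlots = new LinkedHashMap<>();
        for (List<Object> t : input) {
            List<Object> keyTuple = new ArrayList<>();
            for (int col : keyColumns) {
                keyTuple.add(t.get(col));
            }
            tupleSlots.computeIfAbsent(keyTuple, k -> new ArrayList<>()).add(t);
        }
        return tupleSlots;
    }

    public static void main(String[] args) {
        // Example 1: join key is the first column.
        List<List<Object>> input1 = Arrays.asList(
                tuple(1, "aaa"), tuple(1, "bbb"), tuple(1, "ccc"),
                tuple(2, "ddd"), tuple(5, "eee"));
        System.out.println(build(input1, new int[] {0}));

        // Example 2: join key is the first two columns.
        List<List<Object>> input2 = Arrays.asList(
                tuple(10, "A", "aaa"), tuple(10, "A", "bbb"), tuple(10, "A", "ccc"),
                tuple(20, "B", "ddd"), tuple(50, "C", "eee"));
        System.out.println(build(input2, new int[] {0, 1}));
    }
}
```

If this matches what HashJoinExec does, each example should yield three entries, with the first entry holding three tuples.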
Thank you all in advance.
Yours sincerely,
Camelia