Hi Camelia,

Yes, your understanding is correct. Tajo uses such an approach for building hash tables.
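
To make the build phase in your examples concrete, here is a minimal sketch. It is not Tajo's actual code: rows are modelled as plain java.util.List objects instead of Tajo's Tuple class, and the names BuildPhaseSketch, keyOf, and build are hypothetical. Only the tupleSlots name comes from your message. Note that in this sketch java.util.HashMap calls hashCode() on the key internally to choose a bucket, so the key tuple is what the table is keyed on rather than literally the bucket index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BuildPhaseSketch {
    // Hypothetical helper: projects the join-key columns out of a row.
    static List<Object> keyOf(List<Object> row, int[] keyColumns) {
        List<Object> key = new ArrayList<>();
        for (int c : keyColumns) {
            key.add(row.get(c));
        }
        return key;
    }

    // Build phase: map each key tuple to the list of rows that share it.
    static Map<List<Object>, List<List<Object>>> build(
            List<List<Object>> rows, int[] keyColumns) {
        Map<List<Object>, List<List<Object>>> tupleSlots = new HashMap<>();
        for (List<Object> row : rows) {
            List<Object> key = keyOf(row, keyColumns);
            // Create the slot on first sight of a key, then append the row.
            tupleSlots.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return tupleSlots;
    }

    public static void main(String[] args) {
        // Input set from Example 1; the join key is the first column.
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(1, "aaa"),
            Arrays.<Object>asList(1, "bbb"),
            Arrays.<Object>asList(1, "ccc"),
            Arrays.<Object>asList(2, "ddd"),
            Arrays.<Object>asList(5, "eee"));

        Map<List<Object>, List<List<Object>>> tupleSlots =
            build(rows, new int[] {0});

        // Look up the slot for key tuple (1).
        System.out.println(tupleSlots.get(Arrays.<Object>asList(1)));
        // prints [[1, aaa], [1, bbb], [1, ccc]]
    }
}
```

For Example 2, the same build call with key columns {0, 1} would group the rows under (10, A), (20, B), and (50, C).
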

Best regards,
Hyunsik

On Fri, Sep 6, 2013 at 7:31 PM, camelia c <[email protected]> wrote:
> Hello,
>
> I have a question regarding the hash function used in HashJoinExec, please.
> From the source code, I reached the conclusion that in TAJO the hash function
> used in the build phase of the algorithm is the identity function:
> h(x) = x
>
> Am I correct?
>
> I shall give some examples and please correct me if I misunderstood something
> about TAJO's approach.
> I shall use the notation ( , , .. ,) for a Tuple and [ , , , ] for a list
> of elements
>
> For example
>
> Example 1)
>
> Given input set of tuples
> {(1,aaa), (1,bbb), (1,ccc), (2,ddd), (5,eee)}
>
> and if the join key consists of the first numeric column, then we have in the
> build table (tupleSlots):
>
> keyTuple | Value which is ArrayList of Tuple-s
> ----------------------------------------------------
> (1)      | [ (1,aaa), (1,bbb), (1,ccc) ]
> (2)      | [ (2,ddd) ]
> (5)      | [ (5,eee) ]
>
> Example 2)
>
> Given input set of tuples
> {(10,A,aaa), (10,A,bbb), (10,A,ccc), (20,B,ddd), (50,C,eee)}
>
> and if the join key consists of the first two columns (a numeric and a
> string), then we have in the build table (tupleSlots):
>
> keyTuple | Value which is ArrayList of Tuple-s
> --------------------------------------------------------
> (10, A)  | [ (10, A, aaa), (10, A, bbb), (10, A, ccc) ]
> (20, B)  | [ (20, B, ddd) ]
> (50, C)  | [ (50, C, eee) ]
>
> Thank You all in advance.
>
> Yours sincerely,
> Camelia
