A small addition to the previous message:

The value obtained with 

   innerTuple = rightChild.next();  


is in the join operator.


Camelia


----- Forwarded Message -----
From: camelia c <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Monday, September 9, 2013 1:25 AM
Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
 


Hello,

Thank You very much for Your helpful answer of yesterday!

While testing, I encountered the following issue: the null values which are 
read from files are sometimes randomly replaced by numbers such as 24 or 29 or 
30. This is a serious problem for the algorithms! Can You please tell me why 
You think this happens and how it can be corrected?


Let me give You an example:

create external table emp1 (emp_id int, first_name text, last_name text, dep_id 
int, salary float, job_id int) using csv with ('csvfile.delimiter'=',') 
location 'file:/home/camelia/testdata/EMP1';



I specify null values in file like this:

1000,Tom,Smith,10,333,100
1001,Mary,Thompson,10,555,
1002,Aron,Weber,,777,100
1003,Susan,Carlson,,999,

Both the internal nulls and the trailing nulls (those at the end of a line) are 
sometimes randomly substituted with a small number; for example (last_name, 
salary, emp_id, dep_id) was read from file with 

innerTuple = rightChild.next();

obtaining values innerTuple.toString() as :


(0=>Weber, 1=>777.0, 2=>1002, 3=>29)


Sometimes, in other queries the null value is correctly read as NULL.
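
To make the expected behavior explicit, here is a small stand-alone sketch that 
does not use any Tajo class. The parseLine helper and the CsvNullSketch class 
are hypothetical names I made up for illustration; this is only what I would 
expect a reader of the file above to produce, not how the Tajo CSV scanner 
works. Empty fields, both internal and trailing, become null:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Tajo.
class CsvNullSketch {

    // Splits one CSV line on ',' and maps every empty field to null.
    // The limit -1 keeps trailing empty fields, which split(",") alone would drop.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        for (String field : line.split(",", -1)) {
            fields.add(field.isEmpty() ? null : field);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Internal null in dep_id.
        System.out.println(parseLine("1002,Aron,Weber,,777,100"));
        // Internal null in dep_id and trailing null in job_id.
        System.out.println(parseLine("1003,Susan,Carlson,,999,"));
    }
}
```

With this reading, the row for Weber always has dep_id = null and never a 
number such as 29.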



Thank You in advance!

Yours sincerely,
Camelia




________________________________
 From: Hyunsik Choi <[email protected]>
To: tajo-dev <[email protected]>; camelia c 
<[email protected]> 
Sent: Saturday, September 7, 2013 6:00 PM
Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
 

Hi camelia,

I'm sorry for the late response. I've just come back home from a family
meeting. I leave in-line comments on your questions.

Best regards,
Hyunsik


On Sep 7, 2013, at 8:42 PM, camelia c <[email protected]> wrote:

> Hello,
>
> I resend You an updated list of questions that I have. For some of the 
> ancient ones, I found the answer already.
>
> 1) In MergeJoinExec, what is the purpose of the innerTupleSlots and 
> outerTupleSlots and can You please give me an example of how they are filled, 
> based on a dummy data set ?

Merge join advances through each relation in sorted order to find the
tuples with the same join key. Each of the two slot lists keeps the
tuples whose join keys are the same. Consider the example below, where
there are two relations to be joined and the first column of each
relation is the join key.

-----------------------------------
Two relations to be joined
-----------------------------------
Left                Right
(1, A)              (1, B)
(1, C)              (1, C)
(3, D)              (1, D)
                    (2, E)


MergeJoin first finds all the tuples with the same key in each relation.
So, the tuple slots contain the following:

outerTupleSlots : (1, A), (1, C)
innerTupleSlots : (1, B), (1, C), (1, D)

Then, MergeJoin produces the joined tuples by combining every tuple in
outerTupleSlots with every tuple in innerTupleSlots. In the above
example, MergeJoin results in 6 tuples (2 x 3).
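
As a rough, self-contained sketch of the above (this is not the actual
MergeJoinExec code; the Row class and the fillSlots and mergeJoin
helpers are made-up names for illustration, and both inputs are assumed
to be already sorted on the join key), the slots could be filled like
this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Tajo's MergeJoinExec.
class MergeJoinSlotsSketch {

    static final class Row {
        final int key;       // join key (first column)
        final String value;  // remaining column
        Row(int key, String value) { this.key = key; this.value = value; }
        @Override public String toString() { return "(" + key + ", " + value + ")"; }
    }

    // Collects the run of rows starting at index 'from' that share the key of rows.get(from).
    static List<Row> fillSlots(List<Row> rows, int from) {
        List<Row> slots = new ArrayList<>();
        int key = rows.get(from).key;
        for (int i = from; i < rows.size() && rows.get(i).key == key; i++) {
            slots.add(rows.get(i));
        }
        return slots;
    }

    // Inner merge join over two inputs sorted by key.
    static List<String> mergeJoin(List<Row> outer, List<Row> inner) {
        List<String> result = new ArrayList<>();
        int o = 0, i = 0;
        while (o < outer.size() && i < inner.size()) {
            int outerKey = outer.get(o).key;
            int innerKey = inner.get(i).key;
            if (outerKey < innerKey) {
                o++;                       // outer side is behind, advance it
            } else if (outerKey > innerKey) {
                i++;                       // inner side is behind, advance it
            } else {
                List<Row> outerTupleSlots = fillSlots(outer, o);
                List<Row> innerTupleSlots = fillSlots(inner, i);
                // Cross product of the two slot lists.
                for (Row l : outerTupleSlots) {
                    for (Row r : innerTupleSlots) {
                        result.add(l + " + " + r);
                    }
                }
                o += outerTupleSlots.size();
                i += innerTupleSlots.size();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> left = Arrays.asList(new Row(1, "A"), new Row(1, "C"), new Row(3, "D"));
        List<Row> right = Arrays.asList(
                new Row(1, "B"), new Row(1, "C"), new Row(1, "D"), new Row(2, "E"));
        for (String joined : mergeJoin(left, right)) {
            System.out.println(joined);
        }
    }
}
```

For the two relations above, the slots are filled once (for key 1), and
the keys 2 and 3 produce nothing because they appear on one side only.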

>
> 2) I understood from a talk that the MergeJoinExec has some issues and that 
> Mr Jihoon is trying to fix them. Can I rely on the current version of 
> MergeJoinExec to extend it for FullOuter_MergeJoinExec and 
> RightOuter_MergeJoinExec?

MergeJoinExec does not have any problem. It is correct. There was a
misunderstanding.

>
> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain the 
> block name containing it?
> Even for a single-block query, how do we find for a JoinNode that it belongs 
> to @ROOT, for example?
>
> More precisely, in class OuterJoinRewriteRule, in method
>    public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode, 
>Stack<LogicalNode> stack, Integer depth)
>
> I tried to do
>     plan.getBlock(joinNode).getName()
> but I receive a Null Pointer Exception.
>

The current API cannot do what you want. The API needs to be improved
to support that. Probably, that can be achieved by modifying
BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
methods with some object that includes the current block name. I'll
create a jira issue for this improvement.
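
Roughly, the idea would look like the sketch below. All class and
method names in it (Node, Context, visitChild as written here) are
hypothetical placeholders and not the existing Tajo API; it only shows
a context object carrying the current block name down the traversal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone types; none of these are Tajo classes.
class VisitorContextSketch {

    static final class Node {
        final String type;       // e.g. "ROOT", "JOIN", "SCAN"
        final String blockName;  // non-null only when this node starts a new query block
        final List<Node> children = new ArrayList<>();
        Node(String type, String blockName) { this.type = type; this.blockName = blockName; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Context handed to every visit call; it carries the current block name.
    static final class Context {
        final String currentBlock;
        Context(String currentBlock) { this.currentBlock = currentBlock; }
    }

    // Visits a node and its children, recording the block of every JOIN node.
    static void visitChild(Node node, Context ctx, List<String> joins) {
        // Entering a node that starts a block switches the context for its subtree.
        Context next = (node.blockName != null) ? new Context(node.blockName) : ctx;
        if (node.type.equals("JOIN")) {
            joins.add("JOIN in " + next.currentBlock);
        }
        for (Node child : node.children) {
            visitChild(child, next, joins);
        }
    }

    public static void main(String[] args) {
        Node join = new Node("JOIN", null)
                .add(new Node("SCAN", null))
                .add(new Node("SCAN", null));
        Node root = new Node("ROOT", "@ROOT").add(join);

        List<String> joins = new ArrayList<>();
        visitChild(root, new Context(null), joins);
        System.out.println(joins);
    }
}
```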


>
>
> I look forward to receiving Your answer!
>
> Yours sincerely,
> Camelia
