Hi Camelia,

Could you let me know the following? That would make it easier to investigate the problem.

* your submitted SQL query
* which physical operator was used (NLJoin or MergeJoin?)
* (if possible) a data sample that reproduces the problem

Best regards,
Hyunsik

On Mon, Sep 9, 2013 at 7:30 AM, camelia c <[email protected]> wrote:
> A small addition to the previous message:
>
> The value obtained with
>
> innerTuple = rightChild.next();
>
> is in the join operator.
>
> Camelia
>
> ----- Forwarded Message -----
> From: camelia c <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Monday, September 9, 2013 1:25 AM
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
> Hello,
>
> Thank You very much for Your helpful answer of yesterday!
>
> While testing, I encountered the following issue: the null values which are
> read from files are sometimes randomly replaced by numbers such as 24 or 29
> or 30. This is a serious problem for the algorithms! Can You please tell
> me why You think this happens and how it can be corrected?
>
> Let me give You an example:
>
> create external table emp1 (emp_id int, first_name text, last_name text,
> dep_id int, salary float, job_id int) using csv with
> ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/EMP1';
>
> I specify null values in the file like this:
>
> 1000,Tom,Smith,10,333,100
> 1001,Mary,Thompson,10,555,
> 1002,Aron,Weber,,777,100
> 1003,Susan,Carlson,,999,
>
> Both the internal nulls and the trailing nulls (those at the end of a line) are
> sometimes randomly substituted with a small number; for example, (last_name,
> salary, emp_id, dep_id) was read from the file with
>
> innerTuple = rightChild.next();
>
> obtaining values innerTuple.toString() as:
>
> (0=>Weber, 1=>777.0, 2=>1002, 3=>29)
>
> Sometimes, in other queries, the null value is correctly read as NULL.
>
> Thank You in advance!
>
> Yours sincerely,
> Camelia
>
> ________________________________
> From: Hyunsik Choi <[email protected]>
> To: tajo-dev <[email protected]>; camelia c <[email protected]>
> Sent: Saturday, September 7, 2013 6:00 PM
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
> Hi camelia,
>
> I'm sorry for the late response. I've just come back home from a family
> meeting. I have left in-line comments on your questions.
>
> Best regards,
> Hyunsik
>
> On Sep 7, 2013, at 8:42 PM, camelia c <[email protected]> wrote:
>
>> Hello,
>>
>> I resend You an updated list of questions that I have. For some of the
>> older ones, I found the answer already.
>>
>> 1) In MergeJoinExec, what is the purpose of the innerTupleSlots and
>> outerTupleSlots and can You please give me an example of how they are
>> filled, based on a dummy data set?
>
> Merge join advances through each relation in order to find the tuples
> that have the same join key. Each slot list keeps the tuples whose join
> keys are the same. Consider the example below, where there are two
> relations to be joined and the first column of each relation is the
> join key.
>
> -----------------------------------
> Two relations to be joined
> -----------------------------------
> Left      Right
> (1, A)    (1, B)
> (1, C)    (1, C)
> (3, D)    (1, D)
>           (2, E)
>
> MergeJoin first finds all the same-key tuples for each relation. So,
> the tuple slots contain the following:
>
> outerTupleSlots : (1, A), (1, C)
> innerTupleSlots : (1, B), (1, C), (1, D)
>
> Then, MergeJoin produces the joined tuples. In the above example,
> MergeJoin results in 6 tuples (2 x 3).
>
>> 2) I understood from a talk that the MergeJoinExec has some issues and that
>> Mr Jihoon is trying to fix them. Can I rely on the current version of
>> MergeJoinExec to extend it for FullOuter_MergeJoinExec and
>> RightOuter_MergeJoinExec?
>
> MergeJoinExec does not have any problem. It is correct. There was a
> misunderstanding.
>
>> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain
>> the block name containing it?
>> Even for a single-block query, how do we find for a JoinNode that it belongs
>> to @ROOT, for example?
>>
>> More precisely, in class OuterJoinRewriteRule, in method
>> public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode,
>> Stack<LogicalNode> stack, Integer depth)
>>
>> I tried to do
>> plan.getBlock(joinNode).getName()
>> but I receive a Null Pointer Exception.
>
> The current API cannot do what you want. The API needs to be improved to
> support that. Probably, that can be achieved by modifying
> BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
> method with some object that includes the current block name. I'll create a
> jira issue for this improvement.
>
>> I look forward to receiving Your answer!
>>
>> Yours sincerely,
>> Camelia
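
Regarding question 1 in the quoted thread: the way the two slot lists are filled can be sketched in plain Java. This is only an illustrative sketch and not Tajo's MergeJoinExec. The class MergeJoinSlotsSketch, the Row type, and the mergeJoin method are hypothetical names, and both inputs are assumed to be already sorted on the join key. Only the names outerTupleSlots and innerTupleSlots are taken from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeJoinSlotsSketch {

    /** Hypothetical two-column tuple: an integer join key plus a payload. */
    static final class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
        @Override public String toString() { return "(" + key + ", " + value + ")"; }
    }

    /** Inner merge join; ASSUMES both lists are already sorted by key. */
    static List<String> mergeJoin(List<Row> outer, List<Row> inner) {
        List<String> result = new ArrayList<>();
        int o = 0;
        int i = 0;
        while (o < outer.size() && i < inner.size()) {
            int outerKey = outer.get(o).key;
            int innerKey = inner.get(i).key;
            if (outerKey < innerKey) { o++; continue; }  // advance the side with the smaller key
            if (outerKey > innerKey) { i++; continue; }

            // Keys match: fill each slot list with every tuple sharing this key.
            List<Row> outerTupleSlots = new ArrayList<>();
            List<Row> innerTupleSlots = new ArrayList<>();
            while (o < outer.size() && outer.get(o).key == outerKey) {
                outerTupleSlots.add(outer.get(o++));
            }
            while (i < inner.size() && inner.get(i).key == outerKey) {
                innerTupleSlots.add(inner.get(i++));
            }

            // Emit the cross product of the two slot lists.
            for (Row left : outerTupleSlots) {
                for (Row right : innerTupleSlots) {
                    result.add(left + " x " + right);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two relations from the example in the thread.
        List<Row> left = Arrays.asList(new Row(1, "A"), new Row(1, "C"), new Row(3, "D"));
        List<Row> right = Arrays.asList(
                new Row(1, "B"), new Row(1, "C"), new Row(1, "D"), new Row(2, "E"));
        for (String joined : mergeJoin(left, right)) {
            System.out.println(joined);
        }
    }
}
```

With the two relations from the thread, the slot lists are filled once (for key 1) with 2 and 3 tuples, so the sketch emits 6 joined tuples. The tuples with keys 3 and 2 never meet a matching key and are skipped, which is exactly the part an outer-join variant would have to treat differently.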

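
Regarding question 3: the idea of passing "some object including a current block name" to each visit method might look roughly like the sketch below. Everything here is hypothetical (the Node and Context classes, the "BLOCK"/"JOIN" type tags, and the block name "@SUB"); none of it is Tajo's actual LogicalPlan or BasicLogicalNodeVisitor API. It only illustrates how a context object can track the enclosing block while the tree is walked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class BlockAwareVisitorSketch {

    /** Hypothetical plan node: a type tag, an optional block name, and children. */
    static final class Node {
        final String type;       // e.g. "BLOCK", "JOIN", "SCAN" (made-up tags)
        final String blockName;  // non-null only on nodes that open a query block
        final List<Node> children;
        Node(String type, String blockName, Node... children) {
            this.type = type;
            this.blockName = blockName;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    /** Hypothetical context object handed to every visit call. */
    static final class Context {
        final Deque<String> blocks = new ArrayDeque<>();     // stack of enclosing blocks
        final List<String> joinBlocks = new ArrayList<>();   // block name seen at each join
        String currentBlock() { return blocks.peek(); }      // null when outside any block
    }

    /** Generic dispatch: maintains the block stack, then calls the specific visit method. */
    static void visitChild(Context ctx, Node node) {
        boolean opensBlock = node.blockName != null;
        if (opensBlock) {
            ctx.blocks.push(node.blockName);
        }
        if ("JOIN".equals(node.type)) {
            visitJoin(ctx, node);
        }
        for (Node child : node.children) {
            visitChild(ctx, child);
        }
        if (opensBlock) {
            ctx.blocks.pop();
        }
    }

    /** The join visitor can now ask the context which block it is in. */
    static void visitJoin(Context ctx, Node join) {
        ctx.joinBlocks.add(ctx.currentBlock());
    }

    public static void main(String[] args) {
        Node innerJoin = new Node("JOIN", null, new Node("SCAN", null), new Node("SCAN", null));
        Node subBlock = new Node("BLOCK", "@SUB", innerJoin);
        Node outerJoin = new Node("JOIN", null, new Node("SCAN", null), subBlock);
        Node root = new Node("BLOCK", "@ROOT", outerJoin);

        Context ctx = new Context();
        visitChild(ctx, root);
        System.out.println(ctx.joinBlocks);
    }
}
```

The design choice in this sketch is that the visit methods never look the block up from the node; the traversal itself records which block it is inside, so a join anywhere in the tree can read the name from the context.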