Thank you for the more detailed information. Are these problems caused by your working source?
If so, how can I access your recent working source? Is it on your github?

Actually, the recommended way of sharing your problem is as follows:

* create a Jira issue
* submit your patch or your github revision url
* describe your problem (your attached file already covers this)

Best regards,
Hyunsik Choi

On Mon, Sep 9, 2013 at 10:04 PM, camelia c <[email protected]> wrote:
> Hello,
>
> I am sending you an archive with the 3 problems encountered so far with the
> tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/RightOuter_MergeJoinExec.java
>
> Please be so kind as to help me solve them.
>
> For each problem there is a separate folder in the archive, containing the
> query, the problem, the TAJO output, the logical plan of MasterLOG and the
> worker's log.
>
> To summarize:
> Problem 1) partial output and
>
> java.lang.NullPointerException
>     at org.apache.tajo.cli.TajoCli.getQueryResult(TajoCli.java:383)
>     at org.apache.tajo.cli.TajoCli.executeStatements(TajoCli.java:294)
>     at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:223)
>     at org.apache.tajo.cli.TajoCli.main(TajoCli.java:643)
>
> , even if the physical operator's next method returns correct and complete
> results.
>
> Problem 2) incorrect values in tuples received from child nodes
>
> Problem 3) values unexpectedly stop being received and
> ERROR querymaster.QueryUnitAttempt: FROM mmm2 >> Java heap space
>
> The dataset is also concatenated in a separate data file in the archive.
>
>
> Thank you very much!
> Camelia
>
>
> ________________________________
> From: Hyunsik Choi <[email protected]>
> To: tajo-dev <[email protected]>; camelia c
> <[email protected]>
> Sent: Monday, September 9, 2013 3:52 AM
>
> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>
> Hi Camelia,
>
> Could you let me know the following? With that, it will be easier to
> investigate the problem.
>
> * your submitted SQL query
> * which physical operator (NLJoin or MergeJoin?)
> * (if possible) data sample that reproduces the problem
>
> Best regards,
> Hyunsik
>
>
> On Mon, Sep 9, 2013 at 7:30 AM, camelia c <[email protected]> wrote:
>> A small addition to the previous message:
>>
>> The value obtained with
>>
>> innerTuple = rightChild.next();
>>
>> is in the join operator.
>>
>>
>> Camelia
>>
>>
>> ----- Forwarded Message -----
>> From: camelia c <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Monday, September 9, 2013 1:25 AM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>> Hello,
>>
>> Thank you very much for your helpful answer of yesterday!
>>
>> While testing, I encountered the following issue: the null values which
>> are read from files are sometimes randomly replaced by numbers such as 24 or
>> 29 or 30. This is a serious problem for the algorithms! Can you please
>> tell me why you think this happens and how it can be corrected?
>>
>>
>> Let me give you an example:
>>
>> create external table emp1 (emp_id int, first_name text, last_name text,
>> dep_id int, salary float, job_id int) using csv with
>> ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/EMP1';
>>
>>
>> I specify null values in the file like this:
>>
>> 1000,Tom,Smith,10,333,100
>> 1001,Mary,Thompson,10,555,
>> 1002,Aron,Weber,,777,100
>> 1003,Susan,Carlson,,999,
>>
>> Both the internal nulls and the trailing nulls (those at the end of a line)
>> are sometimes randomly substituted with a small number; for example,
>> (last_name, salary, emp_id, dep_id) was read from the file with
>>
>> innerTuple = rightChild.next();
>>
>> obtaining these values from innerTuple.toString():
>>
>> (0=>Weber, 1=>777.0, 2=>1002, 3=>29)
>>
>>
>> Sometimes, in other queries, the null value is correctly read as NULL.
>>
>>
>> Thank you in advance!
>>
>> Yours sincerely,
>> Camelia
>>
>>
>> ________________________________
>> From: Hyunsik Choi <[email protected]>
>> To: tajo-dev <[email protected]>; camelia c
>> <[email protected]>
>> Sent: Saturday, September 7, 2013 6:00 PM
>> Subject: Re: [GSoc2013] - Outer Join - a question about MergeJoinExec
>>
>>
>> Hi camelia,
>>
>> I'm sorry for the late response. I've just come back home from a family
>> meeting. I've left in-line comments on your questions.
>>
>> Best regards,
>> Hyunsik
>>
>>
>> On Sep 7, 2013, at 8:42 PM, camelia c <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I am resending you an updated list of the questions that I have. For some
>>> of the older ones, I have already found the answer.
>>>
>>> 1) In MergeJoinExec, what is the purpose of the innerTupleSlots and
>>> outerTupleSlots and can you please give me an example of how they are
>>> filled, based on a dummy data set?
>>
>> Merge join advances through each relation in order to find the tuples
>> that share the same join key. Each tuple slot list keeps the tuples whose
>> join keys are the same. Consider the example below, where there are two
>> relations to be joined and the first column of each relation is the join
>> key.
>>
>> -----------------------------------
>> Two relations to be joined
>> -----------------------------------
>> Left      Right
>> (1, A)    (1, B)
>> (1, C)    (1, C)
>> (3, D)    (1, D)
>>           (2, E)
>>
>>
>> MergeJoin first finds all the tuples with the same key in each relation.
>> So, the tuple slots contain the following:
>>
>> outerTupleSlots : (1, A), (1, C)
>> innerTupleSlots : (1, B), (1, C), (1, D)
>>
>> Then, MergeJoin produces the joined tuples from these slots. In the above
>> example, MergeJoin results in 6 tuples (2 x 3).
>>
>>>
>>> 2) I understood from a talk that the MergeJoinExec has some issues and
>>> that Mr Jihoon is trying to fix them. Can I rely on the current version of
>>> MergeJoinExec to extend it for FullOuter_MergeJoinExec and
>>> RightOuter_MergeJoinExec?
>>
>> MergeJoinExec does not have any problem. It is correct.
>> There was a misunderstanding.
>>
>>>
>>> 3) Given a JoinNode anywhere in the logical query plan, how can we obtain
>>> the name of the block containing it?
>>> Even for a single-block query, how do we find out for a JoinNode that it
>>> belongs to @ROOT, for example?
>>>
>>> More precisely, in class OuterJoinRewriteRule, in method
>>> public LogicalNode visitJoin(LogicalPlan plan, JoinNode joinNode,
>>> Stack<LogicalNode> stack, Integer depth)
>>>
>>> I tried to do
>>> plan.getBlock(joinNode).getName()
>>> but I receive a NullPointerException.
>>>
>>
>> The current API cannot do what you want. The API needs to be improved to
>> support that. Probably, that can be achieved by modifying
>> BasicLogicalNodeVisitor's visitChild method to call the visitXXXNode
>> method with some object that includes the current block name. I'll create
>> a jira issue for this improvement.
>>
>>
>>>
>>>
>>> I look forward to receiving your answer!
>>>
>>> Yours sincerely,
>>> Camelia
>
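
For readers following the tuple-slot explanation quoted above, here is a minimal, self-contained sketch of how the two slot lists could be filled and combined for an inner merge join over sorted inputs. It is only an illustration under stated assumptions: the class `TupleSlotSketch`, the `Row` type and the `fillSlots` helper are hypothetical names made up for this example, and none of this is taken from Tajo's actual MergeJoinExec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; NOT Tajo's MergeJoinExec.
final class TupleSlotSketch {

    // Simplified stand-in for a tuple: an int join key plus one value.
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + key + ", " + value + ")";
        }
    }

    // Collects the run of consecutive rows, starting at index 'from',
    // that share the join key of the row at 'from'.
    static List<Row> fillSlots(List<Row> sorted, int from) {
        List<Row> slots = new ArrayList<>();
        if (from >= sorted.size()) {
            return slots;
        }
        int key = sorted.get(from).key;
        for (int i = from; i < sorted.size() && sorted.get(i).key == key; i++) {
            slots.add(sorted.get(i));
        }
        return slots;
    }

    // Inner merge join over two inputs that are already sorted by key.
    static List<String> mergeJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int l = 0;
        int r = 0;
        while (l < left.size() && r < right.size()) {
            int leftKey = left.get(l).key;
            int rightKey = right.get(r).key;
            if (leftKey < rightKey) {
                l++; // no partner on the right side for this key
            } else if (leftKey > rightKey) {
                r++; // no partner on the left side for this key
            } else {
                // Same key on both sides: fill both slot lists, then emit
                // every combination of an outer and an inner tuple.
                List<Row> outerTupleSlots = fillSlots(left, l);
                List<Row> innerTupleSlots = fillSlots(right, r);
                for (Row outer : outerTupleSlots) {
                    for (Row inner : innerTupleSlots) {
                        out.add(outer + " x " + inner);
                    }
                }
                l += outerTupleSlots.size();
                r += innerTupleSlots.size();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The two relations from the example in the thread.
        List<Row> left = Arrays.asList(
            new Row(1, "A"), new Row(1, "C"), new Row(3, "D"));
        List<Row> right = Arrays.asList(
            new Row(1, "B"), new Row(1, "C"), new Row(1, "D"), new Row(2, "E"));
        for (String joined : mergeJoin(left, right)) {
            System.out.println(joined);
        }
    }
}
```

With the relations from the thread, the only matching key is 1, so the sketch emits the 2 x 3 = 6 combinations. An outer-join variant would additionally have to emit null-padded tuples in the branches that skip an unmatched row here.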
