camelia_c created TAJO-175:
------------------------------
Summary: MergeJoinExec incorrect processing
Key: TAJO-175
URL: https://issues.apache.org/jira/browse/TAJO-175
Project: Tajo
Issue Type: Bug
Components: physical operator
Environment: DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.4 LTS"
Hadoop 0.20.2-cdh3u3
Reporter: camelia_c
For the query:
select dep1.dep_id, emp1.dep_id, emp1.salary from dep1 join emp1 on
dep1.dep_id=emp1.dep_id;
and the data:
---------------dep1
10,Purchasing,1
20,Shipping,1
30,Manufacturing,3
40,QA,6
50,Accounting,
create external table dep1 (dep_id int, dep_name text, loc_id int) using csv
with ('csvfile.delimiter'=',') location 'file:/home/camelia/testdata/DEP1';
----------------- emp1
1000,Tom,Smith,10,333,100
1001,Mary,Thompson,10,555,
1002,Aron,Weber,,777,100
1003,Susan,Carlson,,999,
create external table emp1 (emp_id int, first_name text, last_name text, dep_id
int, salary float, job_id int) using csv with ('csvfile.delimiter'=',')
location 'file:/home/camelia/testdata/EMP1';
-------------------------------------------------
With the original MergeJoinExec, and with INFO log messages inserted along the
processing steps, the query returns no rows, and the operator reads a wrong
dep_id for an emp1 row whose dep_id field is empty (12 instead of NULL):
13/09/09 20:46:01 INFO physical.MergeJoinExec: ********rightChild.next() =(0=>555.0, 1=>10)
13/09/09 20:46:01 INFO physical.MergeJoinExec: ********rightChild.next() =(0=>777.0, 1=>12)
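For comparison, the result an inner join should give on this data can be worked out by hand: only dep_id 10 appears on both sides, and the two emp1 rows with an empty dep_id must not match anything. The sketch below is only an illustration of that expectation, not Tajo's MergeJoinExec code. The class and method names (MergeJoinSketch, Emp, join) are hypothetical, the rows are reduced to the columns the query uses, and it assumes both inputs are already sorted on dep_id with empty CSV fields read as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names are hypothetical and this is not Tajo code.
class MergeJoinSketch {

    // An emp1 row reduced to the join key and salary; depId == null models an empty field.
    static final class Emp {
        final Integer depId;
        final double salary;

        Emp(Integer depId, double salary) {
            this.depId = depId;
            this.salary = salary;
        }
    }

    // Inner sort-merge join on dep_id. Both inputs must be sorted ascending on the key
    // (nulls may sit at either end). Each output row is "dep1.dep_id, emp1.dep_id, emp1.salary".
    static List<String> join(List<Integer> depIds, List<Emp> emps) {
        List<String> out = new ArrayList<>();
        int l = 0;
        int r = 0;
        while (l < depIds.size() && r < emps.size()) {
            Integer leftKey = depIds.get(l);
            Integer rightKey = emps.get(r).depId;
            // A NULL key never equals anything in an equi-join, so skip such rows.
            if (leftKey == null) { l++; continue; }
            if (rightKey == null) { r++; continue; }
            int cmp = leftKey.compareTo(rightKey);
            if (cmp < 0) {
                l++;
            } else if (cmp > 0) {
                r++;
            } else {
                // Emit the left row against the whole run of equal right keys.
                for (int i = r; i < emps.size() && leftKey.equals(emps.get(i).depId); i++) {
                    out.add(leftKey + ", " + emps.get(i).depId + ", " + emps.get(i).salary);
                }
                // Advance only the left side so a duplicate left key rescans the same run.
                l++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // dep_id column of DEP1, and (dep_id, salary) columns of EMP1 from the report.
        List<Integer> dep1 = Arrays.asList(10, 20, 30, 40, 50);
        List<Emp> emp1 = Arrays.asList(
                new Emp(10, 333.0), new Emp(10, 555.0),
                new Emp(null, 777.0), new Emp(null, 999.0));
        for (String row : join(dep1, emp1)) {
            System.out.println(row);
        }
        // prints:
        // 10, 10, 333.0
        // 10, 10, 555.0
    }
}
```

So the query should return two rows, both for dep_id 10, rather than an empty result.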
The Tajo output is:
tajo> select dep1.dep_id, emp1.dep_id, emp1.salary from dep1 join emp1 on
dep1.dep_id=emp1.dep_id;
2013-09-09 20:45:52,947 INFO client.TajoClient
(TajoClient.java:connectionToQueryMaster(190)) - Connected to Query Master
(qid=q_1378748585102_0001, addr=127.0.1.1:8091)
Progress: 0%, response time: 1.036 sec
Progress: 0%, response time: 2.04 sec
Progress: 0%, response time: 3.042 sec
Progress: 0%, response time: 4.045 sec
Progress: 0%, response time: 5.047 sec
Progress: 0%, response time: 6.049 sec
Progress: 0%, response time: 7.05 sec
Progress: 0%, response time: 8.052 sec
Progress: 100%, response time: 8.32 sec
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/camelia/tajo_git/incubator-tajo/tajo-dist/target/tajo-0.2.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/local/hadoop-2.0.3-alpha/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2013-09-09 20:46:02,513 WARN util.NativeCodeLoader
(NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for
your platform... using builtin-java classes where applicable
2013-09-09 20:46:02,782 INFO rpc.NettyClientBase
(NettyClientBase.java:close(87)) - Proxy is disconnected from 127.0.1.1:8091
2013-09-09 20:46:02,784 INFO client.TajoClient
(TajoClient.java:closeQuery(113)) - Closed a QueryMaster connection
(qid=q_1378748585102_0001, addr=mmm2/127.0.1.1:8091)
final state: QUERY_SUCCEEDED, init time: 1.61 sec, execution time: 0.0 sec,
total response time: 8.32 sec
result: file:/home/camelia/tajo/q_1378748585102_0001
dep_id, dep_id, salary
-------------------------------
tajo>
I will attach an archive with the log data.