[ 
https://issues.apache.org/jira/browse/TAJO-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinhang Choi updated TAJO-1075:
-------------------------------
    Description: 
Tajo produces unexpected join output when the input tables were created by 
consecutive queries and contain 0-size tuple segments. For example:

0.
external table t: 1.1 GB of data, about 24M rows
external table u: 776 KB of data, about 64K rows

1.
tajo query 1> create table t1 as select * from t where t.a=xxx;
tajo query 1a> create table t1_ext as select t1.*, u.b from t1 inner join u on 
t1.key=u.key;
tajo query 2> create table t2 as select * from t where t.a=yyy;
tajo query 2a> create table t2_ext as select t2.*, u.b from t2 inner join u on 
t2.key=u.key;

Tables t1_ext and t2_ext look correct (Tajo executed the queries successfully). 
The output of the last query in the series is about 3 MB of data, about 64K 
rows, in HDFS, as follows:

Permission  Owner   Group       Size     Replication  Block Size  Name
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000000-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000001-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000002-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000003-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000004-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000005-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000006-000
-rw-r--r--  hadoop  supergroup  1.03 MB  2            32 MB       part-03-000007-000
-rw-r--r--  hadoop  supergroup  1.92 MB  2            32 MB       part-03-000008-000

2.
tajo query 3> select * from t1_ext inner join t2_ext on t1_ext.key=t2_ext.key;

The join query produces abnormal output; for example, it returns no rows even 
though matching keys exist.

3.
After receiving some advice from [~jihoonson], I manually removed the 0-size 
tuple files from the HDFS directory and re-executed tajo query 3. The result 
indicates that TAJO WORKS PROPERLY.
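
The report does not say how the 0-size files were removed. One possible way to 
script the workaround is sketched below. It assumes the `hdfs` CLI is on the 
PATH and that the table directory is passed in by the caller; the function 
names and any directory path used with them are hypothetical, not part of Tajo.

```python
import subprocess
from typing import List


def zero_size_files(ls_output: str) -> List[str]:
    """Return paths of 0-byte regular files found in `hdfs dfs -ls` output."""
    paths = []
    for line in ls_output.splitlines():
        # A file entry has 8 columns:
        # permissions, replication, owner, group, size, date, time, path.
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue  # skip the "Found N items" header and directories
        if fields[4] == "0":
            paths.append(fields[7])
    return paths


def remove_zero_size_files(table_dir: str, dry_run: bool = True) -> List[str]:
    """List (and, unless dry_run, delete) 0-byte files under table_dir.

    table_dir is a hypothetical HDFS path of a table such as t1_ext.
    """
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", table_dir],
        check=True, capture_output=True, text=True,
    ).stdout
    empty = zero_size_files(listing)
    if empty and not dry_run:
        subprocess.run(["hdfs", "dfs", "-rm", *empty], check=True)
    return empty
```

Running `remove_zero_size_files` with `dry_run=True` first shows which part 
files would be deleted before anything is removed.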

Since I have not explored the Tajo source in depth, I cannot pinpoint where to 
modify it. This needs a committer's support.

  was:
Tajo produces unexpected join output when the input tables were created by 
consecutive queries and contain 0-size tuple segments. For example:

0.
external table t: 1.1 GB of data, about 24M rows

1.
tajo query 1> create table t1 as select * from t where t.a=xxx;
tajo query 1a> create table t1_ext as select t1.*, u.b from t1 inner join u on 
t1.key=u.key;
tajo query 2> create table t2 as select * from t where t.a=yyy;
tajo query 2a> create table t2_ext as select t2.*, u.b from t2 inner join u on 
t2.key=u.key;

Tables t1_ext and t2_ext look correct (Tajo executed the queries successfully). 
The output of the last query in the series is about 3 MB of data, about 64K 
rows, in HDFS, as follows:

Permission  Owner   Group       Size     Replication  Block Size  Name
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000000-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000001-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000002-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000003-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000004-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000005-000
-rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000006-000
-rw-r--r--  hadoop  supergroup  1.03 MB  2            32 MB       part-03-000007-000
-rw-r--r--  hadoop  supergroup  1.92 MB  2            32 MB       part-03-000008-000

2.
tajo query 3> select * from t1_ext inner join t2_ext on t1_ext.key=t2_ext.key;

The join query produces abnormal output; for example, it returns no rows even 
though matching keys exist.

3.
After receiving some advice from [~jihoonson], I manually removed the 0-size 
tuple files from the HDFS directory and re-executed tajo query 3. The result 
indicates that TAJO WORKS PROPERLY.

Since I have not explored the Tajo source in depth, I cannot pinpoint where to 
modify it. This needs a committer's support.


> unexpected join output from input of consecutive query results including 
> 0-size tuple segment
> ---------------------------------------------------------------------------------------------
>
>                 Key: TAJO-1075
>                 URL: https://issues.apache.org/jira/browse/TAJO-1075
>             Project: Tajo
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>         Environment: 9 HDFS Data Node (Tajo 1 Master + 8 Workers), OpenJDK 1.7
>            Reporter: Jinhang Choi
>
> Tajo produces unexpected join output when the input tables were created by 
> consecutive queries and contain 0-size tuple segments. For example:
> 0.
> external table t: 1.1 GB of data, about 24M rows
> external table u: 776 KB of data, about 64K rows
> 1.
> tajo query 1> create table t1 as select * from t where t.a=xxx;
> tajo query 1a> create table t1_ext as select t1.*, u.b from t1 inner join u 
> on t1.key=u.key;
> tajo query 2> create table t2 as select * from t where t.a=yyy;
> tajo query 2a> create table t2_ext as select t2.*, u.b from t2 inner join u 
> on t2.key=u.key;
> Tables t1_ext and t2_ext look correct (Tajo executed the queries 
> successfully). The output of the last query in the series is about 3 MB of 
> data, about 64K rows, in HDFS, as follows:
> Permission  Owner   Group       Size     Replication  Block Size  Name
> -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000000-000
> -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000001-000
> -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000002-000
> -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000003-000
> -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000004-000
> -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000005-000
> -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000006-000
> -rw-r--r--  hadoop  supergroup  1.03 MB  2            32 MB       part-03-000007-000
> -rw-r--r--  hadoop  supergroup  1.92 MB  2            32 MB       part-03-000008-000
> 2.
> tajo query 3> select * from t1_ext inner join t2_ext on t1_ext.key=t2_ext.key;
> The join query produces abnormal output; for example, it returns no rows 
> even though matching keys exist.
> 3.
> After receiving some advice from [~jihoonson], I manually removed the 0-size 
> tuple files from the HDFS directory and re-executed tajo query 3. The result 
> indicates that TAJO WORKS PROPERLY.
> Since I have not explored the Tajo source in depth, I cannot pinpoint where 
> to modify it. This needs a committer's support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
