[ 
https://issues.apache.org/jira/browse/TAJO-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi resolved TAJO-925.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9.0

committed.

> Child ExecutionBlock of JOIN node has different number of shuffle keys.
> -----------------------------------------------------------------------
>
>                 Key: TAJO-925
>                 URL: https://issues.apache.org/jira/browse/TAJO-925
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Hyoungjun Kim
>            Assignee: Hyoungjun Kim
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> If both sides of a join node are SUBQUERY nodes rather than SCAN nodes, the
> two child ExecutionBlocks can end up with a different number of shuffle keys.
> In that case the JOIN query returns a wrong result. I reproduced this with
> the test code below.
> {code}
> @Test
> public void testJoinWithDifferentShuffleKey() throws Exception {
>   KeyValueSet tableOptions = new KeyValueSet();
>   tableOptions.put(StorageConstants.CSVFILE_DELIMITER, StorageConstants.DEFAULT_FIELD_DELIMITER);
>   tableOptions.put(StorageConstants.CSVFILE_NULL, "\\\\N");
>   Schema schema = new Schema();
>   schema.addColumn("id", Type.INT4);
>   schema.addColumn("name", Type.TEXT);
>   // Generate rows until the table holds a little over 2 MB of data.
>   List<String> data = new ArrayList<String>();
>   int bytes = 0;
>   for (int i = 0; i < 1000000; i++) {
>     String row = i + "|" + i + "name012345678901234567890123456789012345678901234567890";
>     bytes += row.getBytes().length;
>     data.add(row);
>     if (bytes > 2 * 1024 * 1024) {
>       break;
>     }
>   }
>   TajoTestingCluster.createTable("large_table", schema, tableOptions, data.toArray(new String[]{}));
>   // Lower the join partition volume to 1 (restored in the finally block).
>   int originConfValue = conf.getIntVar(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME);
>   testingCluster.setAllTajoDaemonConfValue(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME.varname, "1");
>   // Both sides of the join are subqueries, not plain scans.
>   ResultSet res = executeString(
>      "select count(b.id) " +
>          "from (select id, count(*) as cnt from large_table group by id) a " +
>          "left outer join (select id, count(*) as cnt from large_table where id < 200 group by id) b " +
>          "on a.id = b.id"
>   );
>   try {
>     // Side b only contains ids below 200, so count(b.id) must be 200.
>     String expected =
>         "?count\n" +
>             "-------------------------------\n" +
>             "200\n";
>     assertEquals(expected, resultSetToString(res));
>   } finally {
>     testingCluster.setAllTajoDaemonConfValue(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME.varname, "" + originConfValue);
>     cleanupQuery(res);
>     executeString("DROP TABLE large_table PURGE").close();
>   }
> }
> {code}
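
To illustrate why a mismatch in shuffle keys can produce a wrong join result, here is a minimal, self-contained sketch. It is not Tajo code: `ShuffleKeyDemo`, `partitionOf`, and `countMisrouted` are hypothetical names, the hash partitioner is a toy, and the choice of `(id)` on one side versus `(id, cnt)` on the other is only an assumed example of "different number of shuffle keys". The point is that rows sharing the same join key `id` can be sent to different partitions, so the join in a given partition never sees the matching pair.

```java
import java.util.Arrays;

public class ShuffleKeyDemo {
    // Toy partitioner (assumption, not Tajo's): hash all shuffle-key values
    // together and map the hash into [0, numPartitions).
    static int partitionOf(int numPartitions, Object... shuffleKeyValues) {
        return Math.floorMod(Arrays.hashCode(shuffleKeyValues), numPartitions);
    }

    // Counts ids whose left-side row and right-side row land in different
    // partitions when the two sides shuffle on a different number of keys.
    static int countMisrouted(int rows, int numPartitions) {
        int misrouted = 0;
        for (int id = 0; id < rows; id++) {
            int left = partitionOf(numPartitions, id);       // hypothetical: shuffle on (id)
            int right = partitionOf(numPartitions, id, 1L);  // hypothetical: shuffle on (id, cnt)
            if (left != right) {
                misrouted++;
            }
        }
        return misrouted;
    }

    public static void main(String[] args) {
        System.out.println("ids routed to different partitions: " + countMisrouted(200, 4));
    }
}
```

When both sides shuffle on the same key list, `partitionOf` returns the same partition for equal `id` values, which is the property a partitioned join relies on.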



--
This message was sent by Atlassian JIRA
(v6.2#6252)
