[jira] [Commented] (DRILL-4684) Incorrect behavior when joining RDBMS table on itself

Zelaine Fong (JIRA) Wed, 18 May 2016 08:50:47 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289181#comment-15289181
 ]


Zelaine Fong commented on DRILL-4684:
-------------------------------------

Not sure if this is related to DRILL-4374.  With that Jira, an error is 
returned as opposed to wrong result.  Also, latest comment on the Jira 
indicates the problem wasn't reproducible.

> Incorrect behavior when joining RDBMS table on itself
> -----------------------------------------------------
>
>                 Key: DRILL-4684
>                 URL: https://issues.apache.org/jira/browse/DRILL-4684
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Egbert van der Veen
>         Attachments: actual result.png, expected result.png, result with file 
> query.png
>
>
> When joining a RDBMS table on itself (for instance; with a table that 
> contains both subitems and main items), Drill fails to correctly retrieve the 
> result columns; the columns belonging to the first time that the table is 
> referenced, are always NULL. The values that should be in these columns, 
> instead end up in the columns belonging to the second time the table is 
> referenced.
> This has been tested using both MySQL and PostgreSQL databases as storage 
> providers.
> See attached screenshots for expected result (which works when queried 
> directly against the database), and the actual result through Drill
> Code to reproduce:
> {code:sql}
> --create table
> CREATE TABLE `join_test` (
>   `id` int(11) NOT NULL,
>   `type` varchar(50) DEFAULT NULL,
>   `mainItem` int(11) DEFAULT NULL,
>   `description` varchar(255) DEFAULT NULL,
>   PRIMARY KEY (`id`),
>   UNIQUE KEY `id` (`id`)
> ) ;
> --insert testdata; one main item and two subitems. For subitems, column 
> 'mainItem' is a reference to the parent main item
> INSERT INTO join_test (id, type, mainItem, description) VALUES (1, 
> 'mainItem', null, 'First main item');
> INSERT INTO join_test (id, type, mainItem, description) VALUES (2, 'subItem', 
> 1, 'First subitem of first main item');
> INSERT INTO join_test (id, type, mainItem, description) VALUES (3, 'subItem', 
> 1, 'Second subitem of first main item');
> --perform query
> SELECT main.type firsttype, main.description firstdescription, sub.type 
> secondtype, sub.description seconddescription 
> FROM join_test sub
> JOIN join_test main ON sub.mainItem = main.id
> {code}
> Update:
> The same query *does* work when querying a file through a file storage 
> provider:
> {code:sql}
> CREATE TABLE dfs.tmp.`join_test.parquet` AS select * from 
> mysql.playground.join_test;
> SELECT main.type firsttype, main.description firstdescription, sub.type 
> secondtype, sub.description seconddescription 
> FROM dfs.tmp.`join_test.parquet` sub
> JOIN dfs.tmp.`join_test.parquet` main ON sub.mainItem = main.id;
> {code}
> (see attachment 'result with file query.png' for the result)
> So the issue appears to be in the way Drill handlers RDBMS data sources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-4684) Incorrect behavior when joining RDBMS table on itself

Reply via email to