Robert Hamilton-Smith created DRILL-4211:
--------------------------------------------

             Summary: Inconsistent results from a joined sql statement to 
postgres tables
                 Key: DRILL-4211
                 URL: https://issues.apache.org/jira/browse/DRILL-4211
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 1.3.0
         Environment: Postgres db stroage
            Reporter: Robert Hamilton-Smith


When making an sql statement that incorporates a join to a table and then a 
self join to that table to get a parent value , Drill brings back inconsistent 
results. 

Here is the sql in postgres with correct output:
{code:sql}
select trx.categoryguid,
cat.categoryname, w1.categoryname as parentcat
from transactions trx
join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
join categories w1 on (cat.categoryparentguid = w1.categoryguid)
where cat.categoryparentguid IS NOT NULL;
{code}

Output:
||categoryid||categoryname||parentcategory||
|id1|restaurants|food&Dining|
|id1|restaurants|food&Dining|
|id2|Coffee Shops|food&Dining|
|id2|Coffee Shops|food&Dining|

When run in Drill with correct storage prefix:
{code:sql}
select trx.categoryguid,
cat.categoryname, w1.categoryname as parentcat
from db.schema.transactions trx
join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
join db.schema.wpfm_categories w1 on (cat.categoryparentguid = w1.categoryguid)
where cat.categoryparentguid IS NOT NULL
{code}

Results are:
||categoryid||categoryname||parentcategory||
|id1|restaurants|null|
|id1|restaurants|null|
|id2|Coffee Shops|null|
|id2|Coffee Shops|null|

Physical plan is:
{code:sql}
00-00    Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
{110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
00-01      Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
00-02        Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) : 
rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
00-03          Jdbc(sql=[SELECT *
FROM "public"."transactions"
INNER JOIN (SELECT *
FROM "public"."categories"
WHERE "categoryparentguid" IS NOT NULL) AS "t" ON "transactions"."categoryguid" 
= "t"."categoryguid"
INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" = 
"categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, VARCHAR(1024) 
transactionrefobjectvalue, TIMESTAMP(6) transactiondate, VARCHAR(256) 
transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
VARCHAR(50) transactionorigpartyguid, VARCHAR(255) transactionorigaccountguid, 
VARCHAR(50) transactionrecpartyguid, VARCHAR(255) transactionrecaccountguid, 
VARCHAR(256) transactionstatementdesc, DECIMAL(1, 0) transactionsplit, 
DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) transactionrecategorized, 
TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) transactionupdatedat, 
VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) 
transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, 
VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, 
TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) transactioncategoryassignat, 
VARCHAR(50) transactionsystemcategoryguid, VARCHAR(50) 
transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) categoryguid0, 
VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, VARCHAR(50) 
categoryname, VARCHAR(50) categorydescription, VARCHAR(50) partyguid, 
VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, DECIMAL(3, 0) 
categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) categorydescription0, 
VARCHAR(50) partyguid0): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64259
{code}

I worked around it by creating a view on postgres but not ideal. Thanks in 
advance.
First Drill Jira Bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to