MidasLamb opened a new issue, #6744:
URL: https://github.com/apache/arrow-datafusion/issues/6744

   ### Describe the bug
   
   I have a query where I `JOIN` three tables, and then have a WHERE clause on 
the last table ("twintag.owner"). If I give a value that exists there, all the 
results are returned (regardless of whether or not the "twintag.owner" matches 
the actual value), and if I give a value that doesn't exist, it returns nothing.
   
   The query is as follows:
   ```sql
   SELECT "75620d04a8b7b34c70c68f7af68953af".*
   FROM "75620d04a8b7b34c70c68f7af68953af" 
   JOIN "f4be58656cd5e54b9bffa1706c3c49ff" AS "linkedtwintag"
       ON "75620d04a8b7b34c70c68f7af68953af"."82ad7656c6b6be56e9abfac772c89b4c" 
= "linkedtwintag"."776ff32c7d7bfb1c1a49d5c0d94e0db8" 
   JOIN "__bag" AS "linkedtwintag_twintag"
       ON "linkedtwintag"."dataScope" = "linkedtwintag_twintag"."storage_qid" 
   WHERE 1 = 1 
   AND "linkedtwintag_twintag"."owner" = '324ae4533db15dcb19e3473fea54177a' 
   ORDER BY "75620d04a8b7b34c70c68f7af68953af"."qid" ASC
   ```
   
   If I start from the intermediate table it works as expected and I get back 
the one result I'm looking for:
   ```sql
   SELECT "f4be58656cd5e54b9bffa1706c3c49ff".*
   FROM "f4be58656cd5e54b9bffa1706c3c49ff" 
   JOIN "__bag" AS "twintag"
       ON "f4be58656cd5e54b9bffa1706c3c49ff"."dataScope" = 
"twintag"."storage_qid"
   WHERE 1 = 1 
   AND "twintag"."owner" = '324ae4533db15dcb19e3473fea54177a'  ORDER BY 
"f4be58656cd5e54b9bffa1706c3c49ff"."qid" ASC
   ```
   
   ### To Reproduce
   
   Create some tables where you can construct a JOIN from table A to B and from 
B to C.
   Execute the (type of) query, where you use the intermediate table to join 
from table A to C, filtering on something in table C:
   ```sql
   SELECT "75620d04a8b7b34c70c68f7af68953af".*
   FROM "75620d04a8b7b34c70c68f7af68953af" 
   JOIN "f4be58656cd5e54b9bffa1706c3c49ff" AS "linkedtwintag"
       ON "75620d04a8b7b34c70c68f7af68953af"."82ad7656c6b6be56e9abfac772c89b4c" 
= "linkedtwintag"."776ff32c7d7bfb1c1a49d5c0d94e0db8" 
   JOIN "__bag" AS "linkedtwintag_twintag"
       ON "linkedtwintag"."dataScope" = "linkedtwintag_twintag"."storage_qid" 
   WHERE 1 = 1 
   AND "linkedtwintag_twintag"."owner" = '324ae4533db15dcb19e3473fea54177a' 
   ORDER BY "75620d04a8b7b34c70c68f7af68953af"."qid" ASC
   ```
   
   ### Expected behavior
   
   I expect only the items which match the filter on table C to be returned.
   Currently it returns either ALL items if an item matches, or NO items if 
there is no match found. In the example above I expect 1 match, but I get all 
the items from table A back.
   If I change the value I'm looking for in table C to be non-existent there, I 
get back no results instead, which is expected.
   
   ### Additional context
   
   I'm also using datafusion-remote-table from seafowl 
(https://github.com/splitgraph/seafowl/tree/main/datafusion_remote_tables)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to