MidasLamb opened a new issue, #6744:
URL: https://github.com/apache/arrow-datafusion/issues/6744
### Describe the bug
I have a query where I `JOIN` three tables, and then have a WHERE clause on
the last table ("twintag.owner"). If I give a value that exists there, all the
results are returned (regardless of whether or not the "twintag.owner" matches
the actual value), and if I give a value that doesn't exist, it returns nothing.
The query is as follows:
```sql
SELECT "75620d04a8b7b34c70c68f7af68953af".*
FROM "75620d04a8b7b34c70c68f7af68953af"
JOIN "f4be58656cd5e54b9bffa1706c3c49ff" AS "linkedtwintag"
ON "75620d04a8b7b34c70c68f7af68953af"."82ad7656c6b6be56e9abfac772c89b4c"
= "linkedtwintag"."776ff32c7d7bfb1c1a49d5c0d94e0db8"
JOIN "__bag" AS "linkedtwintag_twintag"
ON "linkedtwintag"."dataScope" = "linkedtwintag_twintag"."storage_qid"
WHERE 1 = 1
AND "linkedtwintag_twintag"."owner" = '324ae4533db15dcb19e3473fea54177a'
ORDER BY "75620d04a8b7b34c70c68f7af68953af"."qid" ASC
```
If I start from the intermediate table it works as expected and I get back
the one result I'm looking for:
```sql
SELECT "f4be58656cd5e54b9bffa1706c3c49ff".*
FROM "f4be58656cd5e54b9bffa1706c3c49ff"
JOIN "__bag" AS "twintag"
ON "f4be58656cd5e54b9bffa1706c3c49ff"."dataScope" =
"twintag"."storage_qid"
WHERE 1 = 1
AND "twintag"."owner" = '324ae4533db15dcb19e3473fea54177a' ORDER BY
"f4be58656cd5e54b9bffa1706c3c49ff"."qid" ASC
```
### To Reproduce
Create some tables where you can construct a JOIN from table A to B and from
B to C.
Execute the (type of) query, where you use the intermediate table to join
from table A to C, filtering on something in table C:
```sql
SELECT "75620d04a8b7b34c70c68f7af68953af".*
FROM "75620d04a8b7b34c70c68f7af68953af"
JOIN "f4be58656cd5e54b9bffa1706c3c49ff" AS "linkedtwintag"
ON "75620d04a8b7b34c70c68f7af68953af"."82ad7656c6b6be56e9abfac772c89b4c"
= "linkedtwintag"."776ff32c7d7bfb1c1a49d5c0d94e0db8"
JOIN "__bag" AS "linkedtwintag_twintag"
ON "linkedtwintag"."dataScope" = "linkedtwintag_twintag"."storage_qid"
WHERE 1 = 1
AND "linkedtwintag_twintag"."owner" = '324ae4533db15dcb19e3473fea54177a'
ORDER BY "75620d04a8b7b34c70c68f7af68953af"."qid" ASC
```
### Expected behavior
I expect only the items which match the filter on table C to be returned.
Currently it returns either ALL items if an item matches, or NO items if
there is no match found. In the example above I expect 1 match, but I get all
the items from table A back.
If I change the value I'm looking for in table C to be non-existent there, I
get back no results instead, which is expected.
### Additional context
I'm also using datafusion-remote-table from seafowl
(https://github.com/splitgraph/seafowl/tree/main/datafusion_remote_tables)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]