caseykneale opened a new issue, #7373: URL: https://github.com/apache/arrow-datafusion/issues/7373
### Describe the bug

I am using the SQL interface to query Parquet data, registering each file in a DataFusion context. The query contains some joins and GROUP BYs (which is where I speculate the trouble is).

In roughly 1 out of 5 runs of the compiled binary I get the correct answer (0 records), but in the other 4 out of 5 runs a lot of erroneous results appear. To be clear, the correct answer is 0 records, and we should never see any records otherwise (unless DataFusion's GROUP BY operations are nondeterministic or nonsequential?). Yet I only see the correct result on rare runs.

I feel like I might be missing something here (do I need to sort first?), but this looks like a bug to me.

### To Reproduce

I can't share the data, but the query looks like the last two queries on this SQL fiddle (they're the same). I borrowed the fiddle from someone on Stack Overflow and added the query I care about.

https://dbfiddle.uk/hA8-ejaw

In this example we do see some rows being returned, but in my actual use case there should be none.

### Expected behavior

The correct records are returned, or there is documentation explaining why this doesn't happen.

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
