caseykneale opened a new issue, #7373:
URL: https://github.com/apache/arrow-datafusion/issues/7373

   ### Describe the bug
   
   I am using the SQL interface to query Parquet data, registering each 
file in a DataFusion context. The query contains some joins and GROUP BYs 
(which is where I suspect the trouble is). In roughly 1 out of 5 runs of the 
compiled binary I get the correct answer (0 records), but in the other 4 out 
of 5 runs I see many erroneous records.
   
   So to be clear, the correct answer is 0 records, and we should never see 
any records returned (unless DF's GROUP BY operations are 
nondeterministic/nonsequential?). Yet I get that result in only a minority of 
runs.
   
   I feel like I might be missing something here (do I need to sort first?), 
but this looks like a bug to me.
   
   ### To Reproduce
   
   I can't share the data, but the query looks like the last two queries on 
this SQL fiddle (they're the same). I borrowed the fiddle from someone on 
Stack Overflow and added the query I care about.
   
   https://dbfiddle.uk/hA8-ejaw 
   
   In this example we do see some rows being returned but in my actual use case 
there should be none.
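
   Since the data cannot be shared, one way to quantify the flakiness is to run the same query several times and count how many distinct result sets come back. The following is a minimal sketch of that idea using only the Rust standard library. The names `Row`, `normalize`, and `tally_results` are hypothetical, and `run_query` is a placeholder for whatever code registers the Parquet files and executes the SQL; nothing here is part of DataFusion's API. Rows are sorted before comparison, so a difference in row order alone is not counted as a different result.

```rust
use std::collections::BTreeMap;

/// A result row rendered as strings, one entry per column (assumed shape).
type Row = Vec<String>;

/// Sort the rows so that two results that differ only in row order compare equal.
fn normalize(mut rows: Vec<Row>) -> Vec<Row> {
    rows.sort();
    rows
}

/// Call `run_query` `attempts` times and count how often each distinct
/// (order-normalized) result set occurs.
/// `run_query` is a placeholder for the real query execution.
fn tally_results<F>(attempts: usize, mut run_query: F) -> BTreeMap<Vec<Row>, usize>
where
    F: FnMut() -> Vec<Row>,
{
    let mut tally = BTreeMap::new();
    for _ in 0..attempts {
        *tally.entry(normalize(run_query())).or_insert(0) += 1;
    }
    tally
}
```

   If the returned map has more than one key, the content of the result differs between runs, not just the order of the rows.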
   
   ### Expected behavior
   
   The correct set of records is returned, or there is documentation 
explaining why this doesn't happen.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.