caseykneale opened a new issue, #7394: URL: https://github.com/apache/datafusion/issues/7394
### Describe the bug

The data I am using is in a normalized Parquet format. Each "table" is in its own file, each is registered to the `SessionContext` instance, and each can be queried with a variety of other queries.

The problematic query has maybe 4 `JOIN`s (outer, inner, etc.) and ends with a `NOT EXISTS` clause on a subquery with another set of roughly the same 4 `JOIN`s on different tables. The outer join occurs inside of a view which is shared by the other 3 inner joins.

The query planning succeeds. The query runs for a while, maybe 15 minutes, and then appears as though it has completed (CPU cores spin down, RAM consumption goes down to baseline). The segfault happens during the collection of the DataFrame itself (it's `await`ed on), before the RecordBatches are collected from the DataFrame. For what it's worth, the DataFrame should be empty at the end of this query, as it's serving as a control for a unit test. The segfault currently occurs on an Intel Mac.

I saw an open issue about segfaulting in unit tests, https://github.com/apache/arrow-datafusion/issues/5693, and don't know whether or not this could be the same issue. I see a few blocks of unsafe code in the project, most of which look benign, but I haven't ruled out a stack overflow scenario.

Not sure where to poke at. May try adjusting `RUST_MIN_STACK` to see if that helps? Or memoizing the subquery results before the `NOT EXISTS` call? Any suggestions appreciated.

### To Reproduce

Unfortunately I can't share the data or the code to reproduce this, but something tells me I could make an MRE, as I doubt this behavior is exclusive to this type of query.

### Expected behavior

This may sound terse, but I mean this in the most polite way possible: ideally, queries do not segfault.

### Additional context

_No response_
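One way to test the stack overflow hypothesis mentioned above, besides setting `RUST_MIN_STACK`, is to run the suspect work on a thread whose stack size is set explicitly. The following is only a minimal sketch using the standard library: `run_with_stack` and `recurse` are hypothetical names, and `recurse` merely stands in for deeply nested work, not for anything DataFusion actually does.

```rust
use std::thread;

/// Hypothetical helper: run `f` on a fresh thread that has `stack_bytes`
/// of stack, wait for it to finish, and return its result.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes) // explicit stack size for this thread only
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

/// Hypothetical stand-in for deeply recursive work; each level of
/// recursion consumes one more stack frame.
fn recurse(depth: u64) -> u64 {
    if depth == 0 {
        0
    } else {
        1 + recurse(depth - 1)
    }
}

fn main() {
    // Assumption: 64 MiB is an arbitrary "generous" size chosen for the experiment.
    let reached = run_with_stack(64 * 1024 * 1024, || recurse(100_000));
    println!("depth reached: {}", reached);
}
```

In an async program the closure would have to build and drive the runtime itself, and runtime worker threads have their own stack settings (tokio's runtime builder exposes `thread_stack_size`, for example). If the crash disappears with a larger stack, that points toward a stack overflow rather than the unsafe blocks.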