caseykneale commented on issue #7394: URL: https://github.com/apache/arrow-datafusion/issues/7394#issuecomment-1693574796
I was able to work around these issues by sorting the data frames and walking the OUTER JOIN / NOT EXISTS logic on a single thread in Rust, without coming anywhere near an OOM or an SO. This is at least 15x faster than the failed runs, despite the cost of serialization, etc. I recommend this approach as a workaround should anyone else run into this. I do hope to see the performance and correctness of these queries improved, because I really do like this project.
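
For anyone wanting to try the same workaround, here is a minimal sketch of a single-threaded walk over two inputs that are already sorted by key. It is not the code from the comment above, which was not posted. The function names `anti_join_sorted` and `left_outer_join_sorted`, the `i64` key type, and the use of plain in-memory slices instead of DataFusion data frames are all assumptions made for illustration.

```rust
/// NOT EXISTS (anti-join): rows of `left` whose key does not appear in `right`.
/// Assumes both slices are sorted ascending by key (hypothetical layout).
fn anti_join_sorted<'a, L, R>(left: &'a [(i64, L)], right: &[(i64, R)]) -> Vec<&'a (i64, L)> {
    let mut out = Vec::new();
    let mut j = 0;
    for row in left {
        // Advance the right cursor past keys smaller than the current left key.
        while j < right.len() && right[j].0 < row.0 {
            j += 1;
        }
        // No equal key on the right: the left row survives.
        if j >= right.len() || right[j].0 != row.0 {
            out.push(row);
        }
    }
    out
}

/// LEFT OUTER JOIN: every left row paired with each matching right row,
/// or with `None` when there is no match. Same sortedness assumption.
fn left_outer_join_sorted<'a, L, R>(
    left: &'a [(i64, L)],
    right: &'a [(i64, R)],
) -> Vec<(&'a L, Option<&'a R>)> {
    let mut out = Vec::new();
    let mut j = 0;
    for (key, lval) in left {
        // Move to the first right row whose key is >= the left key.
        while j < right.len() && right[j].0 < *key {
            j += 1;
        }
        // Scan the run of equal keys without moving `j`, so that a
        // duplicate left key can match the same run again.
        let mut k = j;
        while k < right.len() && right[k].0 == *key {
            out.push((lval, Some(&right[k].1)));
            k += 1;
        }
        if k == j {
            out.push((lval, None));
        }
    }
    out
}
```

Each function only moves forward through the right side with one cursor, so beyond the output vector it needs constant extra memory and no recursion. The up-front sort is the price paid for that.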
