alamb commented on PR #7379: URL: https://github.com/apache/arrow-datafusion/pull/7379#issuecomment-1690597737
@wiedld and I spoke a bit this afternoon, and I think the next step for this PR is to get a query that shows significant performance improvements. I think the one in https://github.com/apache/arrow-datafusion/pull/7379#issuecomment-1690507812 is a good candidate.

I don't really understand the code in this PR yet, but the way I suggest trying to add more parallelism is by "buffering" the streams: rather than computing everything on demand with `poll_next`, spawn an explicit `tokio::task` for each input stream that will try to pull the next input while the current task is merging the input.

Maybe @crepererum or @tustvold can help with a suggestion on how to do the "add buffering/new tasks" in a reasonable Rust way.
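To make the buffering idea concrete, here is a rough sketch of its shape. It is not the code in this PR and not a DataFusion API. To keep it dependency-free it swaps in std threads and a bounded `sync_channel` where the real thing would use `tokio::task` and async streams. The names `spawn_buffered` and `merge_buffered` are hypothetical, and the inputs are assumed to be already sorted.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical helper: drive `input` on its own thread and expose it through
/// a bounded channel, so up to `capacity` items are produced ahead of the consumer.
/// (Stand-in for spawning a tokio task per input stream.)
fn spawn_buffered<I>(input: I, capacity: usize) -> Receiver<I::Item>
where
    I: Iterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for item in input {
            // `send` blocks while the buffer is full and fails once the
            // receiver has been dropped, at which point we stop producing.
            if tx.send(item).is_err() {
                break;
            }
        }
    });
    rx
}

/// Hypothetical helper: merge already-sorted, buffered inputs into one sorted Vec.
/// While this thread compares heads, the producer threads keep refilling buffers.
fn merge_buffered<T: Ord>(inputs: Vec<Receiver<T>>) -> Vec<T> {
    // heads[i] is the next unconsumed item of input i, or None once it is exhausted.
    let mut heads: Vec<Option<T>> = inputs.iter().map(|rx| rx.recv().ok()).collect();
    let mut out = Vec::new();
    loop {
        // Linear scan for the input with the smallest head (ties go to the lower index).
        let mut best: Option<usize> = None;
        for (i, head) in heads.iter().enumerate() {
            if let Some(v) = head {
                let is_smaller = match best {
                    Some(b) => v < heads[b].as_ref().unwrap(),
                    None => true,
                };
                if is_smaller {
                    best = Some(i);
                }
            }
        }
        let i = match best {
            Some(i) => i,
            None => break, // every input is exhausted
        };
        out.push(heads[i].take().unwrap());
        // Pull the replacement item; it is usually already waiting in the buffer.
        heads[i] = inputs[i].recv().ok();
    }
    out
}
```

Wrapping each sorted input with `spawn_buffered(input, n)` and passing the receivers to `merge_buffered` gives the merge loop a head start of up to `n` items per input. An async version would presumably follow the same pattern with a bounded tokio channel per spawned task.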
