viirya commented on issue #119: URL: https://github.com/apache/arrow-datafusion-comet/issues/119#issuecomment-1965768331
> feel free to break up the tasks for join when you think it is necessary, to improve the parallelism

SortMergeJoin support in Comet is an integral work item, like the others we have finished or are working on, so it makes more sense to work on it as a whole (unless you want to break it out into serde code, the CometSortMergeJoinExec operator class, tests, etc. 😂). There were some prerequisite tasks, and they are finished, e.g., relaxing the join on expression type and adding join filter support.

Improving DataFusion SortMergeJoin could be a separate task, as it is orthogonal to adding support in Comet. I am not sure where the performance bottleneck is yet, but in the benchmark I ran earlier, its performance was not better than Spark's, only similar.

SortMergeJoin spilling support is another separate task. I created a ticket for that.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
