sandugood commented on PR #4760: URL: https://github.com/apache/datafusion-comet/pull/4760#issuecomment-4842874790
I'll try to give more context, as it might be relevant here (though it may turn out to be a different issue). When running the same query as in the issue that this PR tackles, the logs alone show that Comet skips a rather large stage:

1. Default Spark has both of these stages at the beginning of the execution: `[Stage 0:> (0 + 0) / 7350][Stage 1:> (0 + 0) / 7350]`
2. Comet has only one stage with 7350 tasks.

It might be related to the FULLOUTER join. When we build the features over 30-day, 180-day, and 365-day windows, everything seems fine and the resulting values are the same across both engines. However, when performing the FULLOUTER join for the end result, we get significantly smaller values on Comet's side.
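To narrow down where the two engines diverge after the join, something like the following could help. It is only a rough sketch, not part of the actual query: `diff_results` and the column names `id`, `f30`, `f180` are hypothetical, and it assumes both results were collected as lists of dicts (for example via `row.asDict()` in PySpark) with a unique, non-null join key.

```python
from math import isclose


def _same(a, b, rel_tol):
    """Treat two cells as equal if both are null, or equal within a float tolerance."""
    if a is None or b is None:
        return a is b
    if isinstance(a, float) or isinstance(b, float):
        return isclose(a, b, rel_tol=rel_tol)
    return a == b


def diff_results(spark_rows, comet_rows, key="id", rel_tol=1e-9):
    """Compare two result sets by join key.

    Returns (keys only in Spark, keys only in Comet, {key: [mismatched columns]}).
    Assumes `key` is unique and non-null in both inputs (hypothetical setup).
    """
    spark_by_key = {row[key]: row for row in spark_rows}
    comet_by_key = {row[key]: row for row in comet_rows}

    # Rows dropped by one engine show up as keys present on one side only.
    only_spark = sorted(spark_by_key.keys() - comet_by_key.keys())
    only_comet = sorted(comet_by_key.keys() - spark_by_key.keys())

    # For keys present on both sides, list the columns whose values differ.
    mismatched = {}
    for k in spark_by_key.keys() & comet_by_key.keys():
        s, c = spark_by_key[k], comet_by_key[k]
        bad = [col for col in s if col != key and not _same(s[col], c.get(col), rel_tol)]
        if bad:
            mismatched[k] = sorted(bad)
    return only_spark, only_comet, mismatched


# Made-up rows, purely to illustrate the shape of the output.
spark = [
    {"id": 1, "f30": 1.0, "f180": 2.0},
    {"id": 2, "f30": None, "f180": 5.0},
    {"id": 3, "f30": 4.0, "f180": None},
]
comet = [
    {"id": 1, "f30": 1.0, "f180": 2.0},
    {"id": 2, "f30": None, "f180": 3.0},
]
only_spark, only_comet, mismatched = diff_results(spark, comet)
```

If keys are missing on one side only, that would point at rows being dropped by the join itself, whereas mismatched columns on shared keys would point at the values computed for rows that both engines do return.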
