suibianwanwank commented on issue #15607: URL: https://github.com/apache/datafusion/issues/15607#issuecomment-2855229603
@Dandandan I've reviewed the current implementation of window functions in DataFusion and studied some related concepts. This DuckDB blog post, [Flying Through Windows](https://duckdb.org/2025/02/14/window-flying.html), describes their efforts in optimizing window functions. However, in practice, many of these optimizations are function-specific, and I don't yet have a good idea for implementing a general-purpose vectorized solution. Additionally, I noticed that we haven't implemented the segment tree and vectorized optimizations mentioned in the DuckDB blog. This is because our current window frame implementation does not yet support flexible expr in frames. Before proceeding with such optimizations, should we first implement the feature described in [ISSUE-15714](https://github.com/apache/datafusion/issues/15714)? Btw, I attempted to support streaming of aggregation results for different ranges within the same partition in BoundedWindowAggStream. But current benchmarks haven’t shown significant performance gains, with most of the time spent in calculate_range. I plan to add more complex benchmark case to assess performance improvements. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
