alamb commented on issue #15607:
URL: https://github.com/apache/datafusion/issues/15607#issuecomment-2855884879

   > [@Dandandan](https://github.com/Dandandan) I've reviewed the current 
implementation of window functions in DataFusion and studied some related 
concepts. This DuckDB blog post, [Flying Through 
Windows](https://duckdb.org/2025/02/14/window-flying.html), describes their 
efforts in optimizing window functions. However, in practice, many of these 
optimizations are function-specific, and I don't yet have a good idea for 
implementing a general-purpose vectorized solution.
   
   Yeah I am not sure that blog post has a lot to offer DataFusion
   
   > 
   > Additionally, I noticed that we haven't implemented the segment tree and 
vectorized optimizations mentioned in the DuckDB blog. This is because our 
current window frame implementation does not yet support flexible expr in 
frames. Before proceeding with such optimizations, should we first implement 
the feature described in 
[ISSUE-15714](https://github.com/apache/datafusion/issues/15714)?
   
   It seems like vectorized implementations would be a great first step but 
they may be independent of calculating the window ranges.
   
   In terms of Segment Trees, I know the TUM folks wrote [a paper about how 
great it was](https://www.vldb.org/pvldb/vol8/p1058-leis.pdf) but to my 
knowledge the technique is only implemented in Umbra and DuckDB. The content of 
[Flying Through Windows](https://duckdb.org/2025/02/14/window-flying.html) 
certainly implies to me that there isn't really a "well known" pattern / easy 
to implement segment tree approach
   
   I think the technique of sorting the windows by ORDER BY/ PARTITION BY 
(which Datafusion implements) is far more common and has many nice properties 
(like the sorting can take advantage of the work to improve larger than memory 
sorting)
   
   > 
   > Btw, I attempted to support streaming of aggregation results for different 
ranges within the same partition in BoundedWindowAggStream. But current 
benchmarks haven’t shown significant performance gains, with most of the time 
spent in calculate_range. I plan to add more complex benchmark case to assess 
performance improvements.
   
   
   Thank you!
   
   FYI @akurmustafa / @mustafasrepo in case you have other thoughts on this 
matter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to