[GitHub] [arrow-datafusion] ozankabak commented on issue #4904: Improved performance of RANGE preceding window functions

GitBox Sat, 14 Jan 2023 06:41:08 -0800


ozankabak commented on issue #4904:
URL: 
https://github.com/apache/arrow-datafusion/issues/4904#issuecomment-1382756387


   Table 1 in this paper gives a good summary of complexities involved. 
Datafusion currently employs what the author calls "Removable Cumulative" 
mechanism. However, the `min` cases in Datafusion is actually O(n), not O(n 
logn) as written in the paper, due to the MovingMinMax algorithm.
   
   So in the first three frame types in Table 1, what we have is better 
complexity-wise. Certain implementation improvements can still be made (e.g. in 
binary searching) obviously, but I'd say we are doing a good job there.
   
   If I'm not mistaken we don't support the fourth frame type; i.e. frames with 
variable boundaries. This is the use case where the segment tree shines. When 
we add support for that, we should definitely consider segment tree as the 
algorithm of choice. To summarize, I think the segment tree approach and our 
current approach will coexist and just apply to different use cases, it seems 
like.
   
   It was a good read, thank you 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] ozankabak commented on issue #4904: Improved performance of RANGE preceding window functions

Reply via email to