ozankabak commented on issue #4904: URL: https://github.com/apache/arrow-datafusion/issues/4904#issuecomment-1382756387
Table 1 in this paper gives a good summary of complexities involved. Datafusion currently employs what the author calls "Removable Cumulative" mechanism. However, the `min` cases in Datafusion is actually O(n), not O(n logn) as written in the paper, due to the MovingMinMax algorithm. So in the first three frame types in Table 1, what we have is better complexity-wise. Certain implementation improvements can still be made (e.g. in binary searching) obviously, but I'd say we are doing a good job there. If I'm not mistaken we don't support the fourth frame type; i.e. frames with variable boundaries. This is the use case where the segment tree shines. When we add support for that, we should definitely consider segment tree as the algorithm of choice. To summarize, I think the segment tree approach and our current approach will coexist and just apply to different use cases, it seems like. It was a good read, thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
