mustafasrepo commented on PR #9221: URL: https://github.com/apache/arrow-datafusion/pull/9221#issuecomment-1954090426
Thanks @comphead for this PR. Currently we have two API for window function calculations: - evaluate_all - evaluate `evaluate_all` takes all the data as single batch. In this version we have all available data for decision. `evaluate` takes only absolutely necessary section for the calculation of the window function result. This PR changes `evaluate` implementation to support `LAG`. However, as far as I can see its corresponding `evaluate_all` handling is not done. Also it seems that current `LAG` implementation only works for lag 1 (which is the default). However, for other lag values I don't think current implementation will generate correct results. > for LEAD function we likely need to refactor the evaluator and how it works. The problem is for LEAD we have to adjust values that have already been emitted by evaluator which is not doable afaik. @mustafasrepo I would love to get your input how we can solve such challenge. One of solution is to emit not the single value like now, but the entire resulting array so it gives more control I think, we can generate correct result without changing the API by keeping track of null_count within the offset interval in running fashion. However, I am not sure though. I think - we should first add support for `evaluate_all` API. This support should be easier than the `evaluate` support. Since `evaluate_all` API has all the data possible. `Lag`, `Lead` support can be implemented for this API. There won't be much difference, as far as I can presume. - Then, we should add support for `evaluate` API. I can work on the support for `evaluate` API. If that is Ok for you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
