Re: [PR] Support IGNORE NULLS for LAG window function [arrow-datafusion]

via GitHub Tue, 20 Feb 2024 04:15:36 -0800


mustafasrepo commented on PR #9221:
URL: 
https://github.com/apache/arrow-datafusion/pull/9221#issuecomment-1954090426


   Thanks @comphead for this PR.
   Currently we have two API for window function calculations:
   - evaluate_all
   - evaluate
   `evaluate_all` takes all the data as single batch. In this version we have 
all available data for decision.
   `evaluate` takes only absolutely necessary section for the calculation of 
the window function result. 
   This PR changes `evaluate` implementation to support `LAG`. However, as far 
as I can see its corresponding `evaluate_all` handling is not done. 
   
   Also it seems that current `LAG` implementation only works for lag 1 (which 
is the default). However, for other lag values I don't think current 
implementation will generate correct results.
   
   
   > for LEAD function we likely need to refactor the evaluator and how it 
works.
   The problem is for LEAD we have to adjust values that have already been 
emitted by evaluator which is not doable afaik. @mustafasrepo I would love to 
get your input how we can solve such challenge. One of solution is to emit not 
the single value like now, but the entire resulting array so it gives more 
control
   
   I think, we can generate correct result without changing the API by keeping 
track of null_count within the offset interval in running fashion. However, I 
am not sure though. 
   
   I think 
   - we should first add support for `evaluate_all` API. This support should be 
easier than the `evaluate` support. Since `evaluate_all` API has all the data 
possible. `Lag`, `Lead` support can be implemented for this API. There won't be 
much difference, as far as I can presume.
   - Then, we should add support for `evaluate` API. 
   
   I can work on the support for `evaluate` API. If that is Ok for you. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Support IGNORE NULLS for LAG window function [arrow-datafusion]

Reply via email to