alamb commented on pull request #8032:
URL: https://github.com/apache/arrow/pull/8032#issuecomment-681867726


   Good luck -- at this stage of a project (when architecture is changing a
   bunch) know it is hard to make small / easy to review PRs. I hope the
   comments are helpful and I am sorry I don't have more time to devote to
   reviews.
   
   On Tue, Aug 25, 2020 at 4:10 PM Jorge Leitao <notificati...@github.com>
   wrote:
   
   > @alamb <https://github.com/alamb> thank you very much for your comments,
   > I will now work on addressing them now. I still learning the Arc/Box/Ref,
   > so thank you a lot for also teaching me.
   >
   > @andygrove <https://github.com/andygrove> , I agree with you that
   > built-in functions should not require access to the registry.
   > Unfortunately, doing so required some re-work, which is the reason I
   > retracted #7967 <https://github.com/apache/arrow/pull/7967> back to draft
   > to focus on this one first.
   >
   > I pushed a new commit to this PR to address this point. Specifically, that
   > commit adds:
   >
   >    - a new enum with all built-in functions
   >    - functionally gluing the logical plan with the physical plan so that
   >    the function's return types are invariant.
   >    - made type coercion on built-in functions to be on the physical
   >    plane, to preserve schema invariance during planning.
   >
   > I am pretty happy with this PR, as IMO has the flexibility we need to
   > expand DataFusion's pool of built-in functions to multiple input and return
   > types. The main features of this PR:
   >
   >    - users no longer have to pass the return type of the UDF when calling
   >    them (the proposal)
   >    - planning built-in functions continue to not need access to the
   >    registry (@andygrove <https://github.com/andygrove> 's point)
   >    - built-in functions now support multiple input types (e.g. sqrt(f32),
   >    sqrt(f64))
   >    - built-in functions now support multiple return types (e.g. sqrt(f32)
   >    -> f32, sqrt(f64) -> f64)
   >    - coercion rules are no longer applied in the sql planning or physical
   >    planning to built-in functions, to avoid breaking schema invariance 
during
   >    planning
   >
   > I have not completed the valid return types of built-in math functions as
   > this PR was already too long.
   >
   > Overall, I think that this has not been a pleasant experience for you
   > @andygrove <https://github.com/andygrove> and @alamb
   > <https://github.com/alamb>, as I constantly open and close PRs around
   > functions/UDFs, and for that I am really sorry. I've been hitting some
   > design challenge after another, which requires me to go back and forth.
   >
   > I am still in pursuit of my original quests:
   >
   >    - built-in aggregate functions whose logical types are known from the
   >    physical expressions
   >    - type coercion on aggregate functions
   >    - built-in aggregate functions whose return types (e.g. min(f32) -> f32,
   >    min(f64) -> f64) are directly derived from the physical plan (there is
   >    an old fixme/todo in the code around that)
   >    - aggregate udfs
   >    - udfs with multiple incoming and return types, to bring them to the
   >    same level of functionality of built-ins
   >    - planning a udf without registering it (a-la spark) in the DF's API.
   >
   > I have code for some of this, I... just... need... to... finish... the...
   > scalar... stuff... first... 😃
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/arrow/pull/8032#issuecomment-680244598>, or
   > unsubscribe
   > 
<https://github.com/notifications/unsubscribe-auth/AADXZMLPUU3TGXRLTCKWWKLSCQLEFANCNFSM4QIW6KXQ>
   > .
   >
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to