Hi folks,

We’ve set up a dedicated bi-weekly community sync for the UDF project.
Everyone’s welcome to drop in and share ideas! Here is the meeting link:

Iceberg UDF sync
Monday, June 2 · 9:00 – 10:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/aui-czix-nbh

Yufei


On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Update on the progress.
>
> I had a meeting today with Yufei and Yun.zou to discuss the UDF proposal.
> We covered several key points, though some are still open for further
> discussion:
>
> a) *UDF Versioning*: Do we truly need versioning for UDFs at this stage?
> We explored the possibility of simplifying the specification by avoiding
> view replication, and potentially introducing versioning support later.
> UDTFs, being a superset of views in some ways, may not require versioning
> initially.
>
> b) *VarArgs Support*: While some query engines may not support vararg
> syntax in CREATE FUNCTION, Iceberg UDFs could represent such arguments as
> lists when supported by the engine.
>
> c) *Generics in UDFs*: Since Iceberg currently doesn’t support generic
> types (e.g., object), we can only map engine-specific types to Iceberg
> types. As a result, generic data types will not be supported in the initial
> version.
>
> d) *Python Support*: Incorporating Python as a language for SQL UDFs
> seems promising, especially given its potential to resolve interoperability
> challenges. Some engines, however, require platform version and package
> dependency details to execute Python code—this should be captured in the
> specification.
>
> *Next Steps*
> I will update the proposal document with two primary UDF use cases:
>
>    - Policy exchange between engines
>    - UDTF as a superset of view functionality
>
> The update will include corresponding syntax examples in both SQL and
> Python, and detail how each use case is represented in Iceberg metadata.
>
> We also plan to set up regular syncs (open to more interested
> participants) to continue refining and finalizing the UDF specification.
> - Ajantha
>
>
> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I've updated the design document[1] based on the previous comments.
>> Additionally, I've included the SQL UDF syntax supported by various
>> vendors, including Dremio, Snowflake, Databricks, and Trino.
>>
>> I'm happy to schedule a separate sync if a deeper discussion is needed.
>> Let's keep moving forward, especially with the renewed interest from the
>> community.
>>
>> [1]
>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>
>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> Hey everyone,
>>>
>>> During the last catalog community sync, there was significant interest
>>> in storing UDFs in Iceberg and adding endpoints for UDF handling in the
>>> REST catalog spec.
>>>
>>> I recently discussed this with Yufei to better understand the new
>>> requirement of using UDFs for fine-grained access control policies. This
>>> expands the use cases beyond just versioned and interoperable UDFs.
>>> Additionally, I learnt that many vendors are interested in this feature.
>>>
>>> Given the strong community interest and support, I’d like to take
>>> ownership of this effort and revive the work. I'll be revisiting the
>>> document I proposed long back and will share an updated proposal by next
>>> week.
>>>
>>> Looking forward to storing UDFs in Iceberg!
>>> - Ajantha
>>>
>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov
>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>
>>>> The UDF spec does not require representations to be SQL. It merely does
>>>> not specify (in this revision) how other representations are to be written.
>>>>
>>>> This seems like an easy extension (adding a new type in the
>>>> "Representations" section).
>>>>
>>>> Cheers,
>>>> Dmitri.
>>>>
>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue <b...@databricks.com.invalid>
>>>> wrote:
>>>>
>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a way
>>>>> for future versions to add different representations later, but only SQL
>>>>> is supported. That was also the feedback to my initial skepticism about
>>>>> how it would work to add functions.
>>>>>
>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov
>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>
>>>>>> I do not think the spec is meant to allow only SQL representations,
>>>>>> although it is certainly favouring SQL in examples... It would be nice
>>>>>> to add a non-SQL example, indeed.
>>>>>>
>>>>>> Cheers,
>>>>>> Dmitri.
>>>>>>
>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <fo...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Coming from PyIceberg, I have concerns as this proposal focuses on
>>>>>>> SQL-based engines, while Python-based systems often work with data
>>>>>>> frames. Adding imperative languages like Python would make this
>>>>>>> proposal more inclusive.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Fokko
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 8, 2024 at 10:27 AM Piotr Findeisen <
>>>>>>> piotr.findei...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Walaa, thanks for asking!
>>>>>>>> In the design doc linked before in this thread [1] I read
>>>>>>>> "Without a common standard, the UDFs are hard to share among
>>>>>>>> different engines."
>>>>>>>> ("Background and Motivation" section).
>>>>>>>> I agree with this statement. I don't fully understand yet how the
>>>>>>>> proposed design addresses shareability between the engines though.
>>>>>>>> I could use some help to understand this better.
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Piotr
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>
>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Piotr, what do you mean by making user-created functions shareable
>>>>>>>>> between engines? Do you mean UDFs written in imperative code?
>>>>>>>>>
>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>> <piotr.findei...@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Hi,
>>>>>>>>> >
>>>>>>>>> > Thank you Ajantha for creating this thread. The Iceberg UDFs are
>>>>>>>>> > an interesting idea!
>>>>>>>>> > Is there a plan to make the user-created functions shareable
>>>>>>>>> > between the engines?
>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in, e.g.,
>>>>>>>>> > Spark or Trino?
>>>>>>>>> >
>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>> >
>>>>>>>>> > Best
>>>>>>>>> > Piotr
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue
>>>>>>>>> <b...@databricks.com.invalid> wrote:
>>>>>>>>> >>
>>>>>>>>> >> I just looked through the proposal and added comments. I think
>>>>>>>>> >> it would be helpful to also have a design doc that covers the
>>>>>>>>> >> choices from the draft spec. For instance, the choice to enumerate
>>>>>>>>> >> all possible function input structs rather than allowing generics
>>>>>>>>> >> and varargs.
>>>>>>>>> >>
>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>> >>
>>>>>>>>> >> - I think that the choice to enumerate function signatures is
>>>>>>>>> >>   limiting. It would be nice to see a discussion of the trade-offs
>>>>>>>>> >>   and a rationale for the choice. I think it would also be very
>>>>>>>>> >>   helpful to have a few representative use cases for this included
>>>>>>>>> >>   in the doc. That way the proposal can demonstrate that it solves
>>>>>>>>> >>   those use cases with reasonable trade-offs.
>>>>>>>>> >> - There are a few instances where this is inconsistent with
>>>>>>>>> >>   conventions in other specs. For example, using string IDs rather
>>>>>>>>> >>   than an integer.
>>>>>>>>> >> - This uses a very different model for spec versioning than the
>>>>>>>>> >>   Iceberg view and table specs. It requires readers to fail if
>>>>>>>>> >>   there are any unknown fields, which prevents the spec from adding
>>>>>>>>> >>   things that are fully backward-compatible. Other Iceberg specs
>>>>>>>>> >>   only require a version change to introduce forward-incompatible
>>>>>>>>> >>   changes and I think that this should do the same to avoid
>>>>>>>>> >>   confusion.
>>>>>>>>> >> - It looks like the intent is to allow multiple function
>>>>>>>>> >>   signatures per version, but it is unclear how to encode them
>>>>>>>>> >>   because a version is associated with a single function signature.
>>>>>>>>> >> - There is no review of SQL syntax for creating functions across
>>>>>>>>> >>   engines, so this doesn’t show that the metadata proposed is
>>>>>>>>> >>   sufficient for cross-engine use cases.
>>>>>>>>> >> - The example for a table-valued function shows a SELECT
>>>>>>>>> >>   statement and it isn’t clear how this is distinct from a view.
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <
>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Thanks Walaa and Robert for the review on this.
>>>>>>>>> >>>
>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>> >>> I will wait for a week, and if there are no more review comments,
>>>>>>>>> >>> I will raise a PR for the spec addition next week.
>>>>>>>>> >>>
>>>>>>>>> >>> If anyone else is interested, please have a look at the
>>>>>>>>> proposal
>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>> >>>
>>>>>>>>> >>> - Ajantha
>>>>>>>>> >>>
>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin Moustafa <
>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>> >>>>
>>>>>>>>> >>>> I have left some comments. It is an interesting direction,
>>>>>>>>> but there might be some details that need to be fine tuned.
>>>>>>>>> >>>>
>>>>>>>>> >>>> The doc is here [1] for others who might be interested.
>>>>>>>>> Resharing since I do not think it was directly linked in the thread.
>>>>>>>>> >>>>
>>>>>>>>> >>>> [1]
>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>> >>>>
>>>>>>>>> >>>> Thanks,
>>>>>>>>> >>>> Walaa.
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat <
>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Hi, just another reminder since we didn't get any review on
>>>>>>>>> the proposal.
>>>>>>>>> >>>>> Initially proposed on June 4.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> - Ajantha
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <
>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> We've only received one review so far (from Benny).
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <
>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>> >>>>>>> Please find the proposal link
>>>>>>>>> >>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Google doc link is attached in the proposal.
>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Hope it gives more clarity to take the decisions and how
>>>>>>>>> we want to implement it.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa <
>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant scalar/aggregate/table user
>>>>>>>>> defined functions. Here are some examples of what I meant in (2):
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Hive GenericUDF:
>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>> >>>>>>>> Trino user defined functions:
>>>>>>>>> https://trino.io/docs/current/develop/functions.html
>>>>>>>>> >>>>>>>> Flink user defined functions:
>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Probably what you referred to is a variation of (1) where
>>>>>>>>> the API is data flow/data pipeline API instead of SQL (e.g., Spark 
>>>>>>>>> Scala).
>>>>>>>>> Yes, that is also possible in the very long run :)
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <
>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative function
>>>>>>>>> according to a Java/Scala/Python API, etc.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> I think we could still explore some long term
>>>>>>>>> >>>>>>>>> opportunities in this case. Consider you register a Spark
>>>>>>>>> >>>>>>>>> temp view as some sort of data frame read, then it could
>>>>>>>>> >>>>>>>>> still be resolved to a Spark plan that is representable by
>>>>>>>>> >>>>>>>>> an intermediate representation. But I agree this gets very
>>>>>>>>> >>>>>>>>> complicated very soon, and just having the case (1) covered
>>>>>>>>> >>>>>>>>> would already be a huge step forward.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <
>>>>>>>>> btc...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular SQL UDF can be
>>>>>>>>> used to build a parameterized view.  So, there's definitely a lot in 
>>>>>>>>> common
>>>>>>>>> between UDFs and views.
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <
>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what is perceived
>>>>>>>>> as a "UDF". There are 2 flavors:
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the user whose
>>>>>>>>> definition is a composition of other built-in functions/SQL 
>>>>>>>>> expressions.
>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative function
>>>>>>>>> according to a Java/Scala/Python API, etc.
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references are pretty
>>>>>>>>> much from (1) and I think those have more analogy to views due to 
>>>>>>>>> their SQL
>>>>>>>>> nature. Agree (2) is not practical to maintain by Iceberg, but I think
>>>>>>>>> Ajantha's use cases are around (1), and may be worth evaluating.
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <
>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you post the proposal,
>>>>>>>>> but I think this would be a very difficult area to tackle across 
>>>>>>>>> engines,
>>>>>>>>> languages, and memory models without having a huge performance 
>>>>>>>>> penalty.
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL
>>>>>>>>> representations of UDFs (similar to views as shared by the reference 
>>>>>>>>> links
>>>>>>>>> above), the complexity involved will be similar to managing views.
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your input.
>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft spec (inspired
>>>>>>>>> by the view spec) this week to facilitate further discussions.
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <
>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a common set of
>>>>>>>>> functions across engines, I don't see how that is practical when those
>>>>>>>>> engines are implemented so differently. Plugging in code -- and 
>>>>>>>>> especially
>>>>>>>>> custom user-supplied code -- seems inherently specialized to me and 
>>>>>>>>> should
>>>>>>>>> be part of the engines' design.
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> How is this different from the views? I feel we can
>>>>>>>>> >>>>>>>>>>>>> say exactly the same thing for Iceberg views, and yet
>>>>>>>>> >>>>>>>>>>>>> we have Iceberg multi-dialect views implemented. Maybe
>>>>>>>>> >>>>>>>>>>>>> it sounds like we are trying to draw a line between
>>>>>>>>> >>>>>>>>>>>>> SQL and other programming languages as "code"? But I
>>>>>>>>> >>>>>>>>>>>>> think SQL is just another type of code, and we are
>>>>>>>>> >>>>>>>>>>>>> already talking about compiling all these different
>>>>>>>>> >>>>>>>>>>>>> code dialects to an intermediate representation (using
>>>>>>>>> >>>>>>>>>>>>> projects like Coral, Substrait), which will be stored
>>>>>>>>> >>>>>>>>>>>>> as another type of representation of Iceberg view. I
>>>>>>>>> >>>>>>>>>>>>> think the same functionality can be used for UDFs if
>>>>>>>>> >>>>>>>>>>>>> developed.
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a good idea,
>>>>>>>>> >>>>>>>>>>>>> even just a multi-dialect one like views, and that
>>>>>>>>> >>>>>>>>>>>>> can allow engines to, for example, parse a view SQL
>>>>>>>>> >>>>>>>>>>>>> and, when a referenced function cannot be resolved,
>>>>>>>>> >>>>>>>>>>>>> look for a multi-dialect UDF definition.
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we have the actual
>>>>>>>>> proposal published.
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <
>>>>>>>>> sn...@snazy.de> wrote:
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and portable and
>>>>>>>>> "non-centralized" as views are. The same performance concerns apply to
>>>>>>>>> views as well.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base upon which
>>>>>>>>> engines can build, so the argument that UDFs aren't practical, because
>>>>>>>>> engines are different, is probably only a temporary concern.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also try to tackle
>>>>>>>>> the idea to make views portable, which is conceptually not that much
>>>>>>>>> different from portable UDFs.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative touch to the
>>>>>>>>> idea of having UDFs in Iceberg, especially not in this early stage.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a good idea to add
>>>>>>>>> >>>>>>>>>>>>>> UDFs tracked by Iceberg catalogs. I think that Iceberg
>>>>>>>>> >>>>>>>>>>>>>> primarily deals with things that are centralized, like
>>>>>>>>> >>>>>>>>>>>>>> tables of data. While it would be great to have a
>>>>>>>>> >>>>>>>>>>>>>> common set of functions across engines, I don't see
>>>>>>>>> >>>>>>>>>>>>>> how that is practical when those engines are
>>>>>>>>> >>>>>>>>>>>>>> implemented so differently. Plugging in code -- and
>>>>>>>>> >>>>>>>>>>>>>> especially custom user-supplied code -- seems
>>>>>>>>> >>>>>>>>>>>>>> inherently specialized to me and should be part of the
>>>>>>>>> >>>>>>>>>>>>>> engines' design.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post the proposal,
>>>>>>>>> but I think this would be a very difficult area to tackle across 
>>>>>>>>> engines,
>>>>>>>>> languages, and memory models without having a huge performance 
>>>>>>>>> penalty.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <
>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the community
>>>>>>>>> interest in storing the Versioned SQL UDFs in Iceberg.
>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec addition for storing
>>>>>>>>> the versioned UDFs in Iceberg (inspired by view spec).
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to views in that
>>>>>>>>> they are associated with tables, but they can accept arguments and 
>>>>>>>>> produce
>>>>>>>>> return values, or even function as inline expressions.
>>>>>>>>> >>>>>>>>>>>>>>> Many query engines, such as Dremio, Trino, Snowflake,
>>>>>>>>> >>>>>>>>>>>>>>> and Databricks Spark, support SQL UDFs at the catalog
>>>>>>>>> >>>>>>>>>>>>>>> level [1].
>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable
>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the engines.
>>>>>>>>> Potentially engines can understand the UDFs written by other engines 
>>>>>>>>> (with
>>>>>>>>> the translate layer).
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this feature into
>>>>>>>>> Iceberg would be a valuable addition, and we're eager to collaborate 
>>>>>>>>> with
>>>>>>>>> the community to develop a UDF specification.
>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a specification
>>>>>>>>> to propose to the community.
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>> >>>>>>>>>>>>>>> Dremio -
>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>> >>>>>>>>>>>>>>> Trino -
>>>>>>>>> https://trino.io/docs/current/sql/create-function.html
>>>>>>>>> >>>>>>>>>>>>>>> Snowflake -
>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>> >>>>>>>>>>>>>>> Databricks -
>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Ryan Blue
>>>>>>>>> >> Databricks
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Databricks
>>>>>
>>>>
