Hi folks,

We’ve set up a dedicated bi-weekly community sync for the UDF project. Everyone’s welcome to drop in and share ideas! Here is the meeting link:
Iceberg UDF sync
Monday, June 2 · 9:00 – 10:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/aui-czix-nbh

Yufei

On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Update on the progress.
>
> I had a meeting today with Yufei and Yun.zou to discuss the UDF proposal.
> We covered several key points, though some are still open for further
> discussion:
>
> a) *UDF Versioning*: Do we truly need versioning for UDFs at this stage?
> We explored the possibility of simplifying the specification by avoiding
> view replication, and potentially introducing versioning support later.
> UDTFs, being a superset of views in some ways, may not require versioning
> initially.
>
> b) *VarArgs Support*: While some query engines may not support vararg
> syntax in CREATE FUNCTION, Iceberg UDFs could represent such arguments as
> lists when supported by the engine.
>
> c) *Generics in UDFs*: Since Iceberg currently doesn’t support generic
> types (e.g., object), we can only map engine-specific types to Iceberg
> types. As a result, generic data types will not be supported in the initial
> version.
>
> d) *Python Support*: Incorporating Python as a language for SQL UDFs
> seems promising, especially given its potential to resolve interoperability
> challenges. Some engines, however, require platform version and package
> dependency details to execute Python code—this should be captured in the
> specification.
>
> *Next Steps*
> I will update the proposal document with two primary UDF use cases:
>
> - Policy exchange between engines
> - UDTF as a superset of view functionality
>
> The update will include corresponding syntax examples in both SQL and
> Python, and detail how each use case is represented in Iceberg metadata.
>
> We also plan to set up regular syncs (open to more interested
> participants) to continue refining and finalizing the UDF specification.
> - Ajantha > > > On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <ajanthab...@gmail.com> > wrote: > >> Hi everyone, >> >> I've updated the design document[1] based on the previous comments. >> Additionally, I've included the SQL UDF syntax supported by various >> vendors, including Dremio, Snowflake, Databricks, and Trino. >> >> I'm happy to schedule a separate sync if a deeper discussion is needed. >> Let's keep moving forward, especially with the renewed interest from the >> community. >> >> [1] >> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing >> >> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <ajanthab...@gmail.com> >> wrote: >> >>> Hey everyone, >>> >>> During the last catalog community sync, there was significant interest >>> in storing UDFs in Iceberg and adding endpoints for UDF handling in the >>> REST catalog spec. >>> >>> I recently discussed this with Yufei to better understand the new >>> requirement of using UDFs for fine-grained access control policies. This >>> expands the use cases beyond just versioned and interoperable UDFs. >>> Additionally, I learnt that many vendors are interested in this feature. >>> >>> Given the strong community interest and support, I’d like to take >>> ownership of this effort and revive the work. I'll be revisiting the >>> document I proposed long back and will share an updated proposal by next >>> week. >>> >>> Looking forward to storing UDFs in Iceberg! >>> - Ajantha >>> >>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov >>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>> >>>> The UDF spec does not require representations to be SQL. It merely does >>>> not specify (in this revision) how other representations are to be written. >>>> >>>> This seems like an easy extension (adding a new type in the >>>> "Representations" section). >>>> >>>> Cheers, >>>> Dmitri. 
>>>>
>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue <b...@databricks.com.invalid>
>>>> wrote:
>>>>
>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a way
>>>>> for future versions to add different representations later, but only SQL
>>>>> is supported. That was also the feedback to my initial skepticism about
>>>>> how it would work to add functions.
>>>>>
>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov
>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>
>>>>>> I do not think the spec is meant to allow only SQL representations,
>>>>>> although it is certainly favouring SQL in examples... It would be nice
>>>>>> to add a non-SQL example, indeed.
>>>>>>
>>>>>> Cheers,
>>>>>> Dmitri.
>>>>>>
>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <fo...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Coming from PyIceberg, I have concerns as this proposal focuses on
>>>>>>> SQL-based engines, while Python-based systems often work with data
>>>>>>> frames. Adding imperative languages like Python would make this
>>>>>>> proposal more inclusive.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Fokko
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 8, 2024 at 10:27, Piotr Findeisen <
>>>>>>> piotr.findei...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Walaa, thanks for asking!
>>>>>>>> In the design doc linked before in this thread [1] I read
>>>>>>>> "Without a common standard, the UDFs are hard to share among
>>>>>>>> different engines."
>>>>>>>> ("Background and Motivation" section).
>>>>>>>> I agree with this statement. I don't fully understand yet how the
>>>>>>>> proposed design addresses shareability between the engines though.
>>>>>>>> I could use some help to understand this better.
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Piotr
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>
>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <
>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Piotr, what do you mean by making user-created functions shareable
>>>>>>>>> between engines? Do you mean UDFs written in imperative code?
>>>>>>>>>
>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>> <piotr.findei...@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Hi,
>>>>>>>>> >
>>>>>>>>> > Thank you Ajantha for creating this thread. The Iceberg UDFs are
>>>>>>>>> > an interesting idea!
>>>>>>>>> > Is there a plan to make the user-created functions sharable
>>>>>>>>> > between the engines?
>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in, e.g.,
>>>>>>>>> > Spark or Trino?
>>>>>>>>> >
>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>> >
>>>>>>>>> > Best
>>>>>>>>> > Piotr
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue
>>>>>>>>> > <b...@databricks.com.invalid> wrote:
>>>>>>>>> >>
>>>>>>>>> >> I just looked through the proposal and added comments. I think
>>>>>>>>> >> it would be helpful to also have a design doc that covers the
>>>>>>>>> >> choices from the draft spec. For instance, the choice to
>>>>>>>>> >> enumerate all possible function input structs rather than
>>>>>>>>> >> allowing generics and varargs.
>>>>>>>>> >>
>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>> >>
>>>>>>>>> >> I think that the choice to enumerate function signatures is
>>>>>>>>> >> limiting. It would be nice to see a discussion of the trade-offs
>>>>>>>>> >> and a rationale for the choice. I think it would also be very
>>>>>>>>> >> helpful to have a few representative use cases for this included
>>>>>>>>> >> in the doc.
>>>>>>>>> >> That way the proposal can demonstrate that it solves those use
>>>>>>>>> >> cases with reasonable trade-offs.
>>>>>>>>> >> There are a few instances where this is inconsistent with
>>>>>>>>> >> conventions in other specs. For example, using string IDs rather
>>>>>>>>> >> than an integer.
>>>>>>>>> >> This uses a very different model for spec versioning than the
>>>>>>>>> >> Iceberg view and table specs. It requires readers to fail if
>>>>>>>>> >> there are any unknown fields, which prevents the spec from adding
>>>>>>>>> >> things that are fully backward-compatible. Other Iceberg specs
>>>>>>>>> >> only require a version change to introduce forward-incompatible
>>>>>>>>> >> changes and I think that this should do the same to avoid
>>>>>>>>> >> confusion.
>>>>>>>>> >> It looks like the intent is to allow multiple function
>>>>>>>>> >> signatures per version, but it is unclear how to encode them
>>>>>>>>> >> because a version is associated with a single function signature.
>>>>>>>>> >> There is no review of SQL syntax for creating functions across
>>>>>>>>> >> engines, so this doesn’t show that the metadata proposed is
>>>>>>>>> >> sufficient for cross-engine use cases.
>>>>>>>>> >> The example for a table-valued function shows a SELECT
>>>>>>>>> >> statement and it isn’t clear how this is distinct from a view.
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <
>>>>>>>>> >> ajanthab...@gmail.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Thanks Walaa and Robert for the review on this.
>>>>>>>>> >>>
>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>> >>> I will wait for a week and, if there are no more review
>>>>>>>>> >>> comments, I will raise a PR for spec addition next week.
>>>>>>>>> >>> >>>>>>>>> >>> If anyone else is interested, please have a look at the >>>>>>>>> proposal >>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>> >>> >>>>>>>>> >>> - Ajantha >>>>>>>>> >>> >>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin Moustafa < >>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>> >>>> >>>>>>>>> >>>> Hi Ajantha, >>>>>>>>> >>>> >>>>>>>>> >>>> I have left some comments. It is an interesting direction, >>>>>>>>> but there might be some details that need to be fine tuned. >>>>>>>>> >>>> >>>>>>>>> >>>> The doc is here [1] for others who might be interested. >>>>>>>>> Resharing since I do not think it was directly linked in the thread. >>>>>>>>> >>>> >>>>>>>>> >>>> [1] >>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>> >>>> >>>>>>>>> >>>> Thanks, >>>>>>>>> >>>> Walaa. >>>>>>>>> >>>> >>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat < >>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>> >>>>> >>>>>>>>> >>>>> Hi, just another reminder since we didn't get any review on >>>>>>>>> the proposal. >>>>>>>>> >>>>> Initially proposed on June 4. >>>>>>>>> >>>>> >>>>>>>>> >>>>> - Ajantha >>>>>>>>> >>>>> >>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat < >>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>> >>>>>> >>>>>>>>> >>>>>> Hi everyone, >>>>>>>>> >>>>>> >>>>>>>>> >>>>>> We've only received one review so far (from Benny). >>>>>>>>> >>>>>> >>>>>>>>> >>>>>> We would appreciate more eyes on this. >>>>>>>>> >>>>>> >>>>>>>>> >>>>>> - Ajantha >>>>>>>>> >>>>>> >>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat < >>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>> >>>>>>> >>>>>>>>> >>>>>>> Hi All, >>>>>>>>> >>>>>>> Please find the proposal link >>>>>>>>> >>>>>>> https://github.com/apache/iceberg/issues/10432 >>>>>>>>> >>>>>>> >>>>>>>>> >>>>>>> Google doc link is attached in the proposal. 
>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it. >>>>>>>>> >>>>>>> >>>>>>>>> >>>>>>> Hope it gives more clarity to take the decisions and how >>>>>>>>> we want to implement it. >>>>>>>>> >>>>>>> >>>>>>>>> >>>>>>> - Ajantha >>>>>>>>> >>>>>>> >>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa < >>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>> Thanks Jack. I actually meant scalar/aggregate/table user >>>>>>>>> defined functions. Here are some examples of what I meant in (2): >>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>> Hive GenericUDF: >>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java >>>>>>>>> >>>>>>>> Trino user defined functions: >>>>>>>>> https://trino.io/docs/current/develop/functions.html >>>>>>>>> >>>>>>>> Flink user defined functions: >>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/ >>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>> Probably what you referred to is a variation of (1) where >>>>>>>>> the API is data flow/data pipeline API instead of SQL (e.g., Spark >>>>>>>>> Scala). >>>>>>>>> Yes, that is also possible in the very long run :) >>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>> Thanks, >>>>>>>>> >>>>>>>> Walaa. >>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye < >>>>>>>>> yezhao...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative function >>>>>>>>> according to a Java/Scala/Python API, etc. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I think we could still explore some long term >>>>>>>>> opportunities in this case. Consider you register a Spark temp view >>>>>>>>> as some >>>>>>>>> sort of data frame read, then it could still be resolved to a Spark >>>>>>>>> plan >>>>>>>>> that is representable by an intermediate representation. 
But I agree >>>>>>>>> this >>>>>>>>> gets very complicated very soon, and just having the case (1) covered >>>>>>>>> would >>>>>>>>> already be a huge step forward. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -Jack >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow < >>>>>>>>> btc...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>>> It's interesting to note that a tabular SQL UDF can be >>>>>>>>> used to build a parameterized view. So, there's definitely a lot in >>>>>>>>> common >>>>>>>>> between UDFs and views. >>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa < >>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>>>> I think there is a disconnect about what is perceived >>>>>>>>> as a "UDF". There are 2 flavors: >>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the user whose >>>>>>>>> definition is a composition of other built-in functions/SQL >>>>>>>>> expressions. >>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative function >>>>>>>>> according to a Java/Scala/Python API, etc. >>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's references are pretty >>>>>>>>> much from (1) and I think those have more analogy to views due to >>>>>>>>> their SQL >>>>>>>>> nature. Agree (2) is not practical to maintain by Iceberg, but I think >>>>>>>>> Ajantha's use cases are around (1), and may be worth evaluating. >>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>>>> Walaa. 
>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat < >>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you post the proposal, >>>>>>>>> but I think this would be a very difficult area to tackle across >>>>>>>>> engines, >>>>>>>>> languages, and memory models without having a huge performance >>>>>>>>> penalty. >>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports SQL >>>>>>>>> representations of UDFs (similar to views as shared by the reference >>>>>>>>> links >>>>>>>>> above), the complexity involved will be similar to managing views. >>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your input. >>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft spec (inspired >>>>>>>>> by the view spec) this week to facilitate further discussions. >>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>> - Ajantha >>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye < >>>>>>>>> yezhao...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a common set of >>>>>>>>> functions across engines, I don't see how that is practical when those >>>>>>>>> engines are implemented so differently. Plugging in code -- and >>>>>>>>> especially >>>>>>>>> custom user-supplied code -- seems inherently specialized to me and >>>>>>>>> should >>>>>>>>> be part of the engines' design. >>>>>>>>> >>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>> How is this different from the views? I feel we can >>>>>>>>> say exactly the same thing for Iceberg views, but yet we have Iceberg >>>>>>>>> multi-dialect views implemented. Maybe it sounds like we are trying >>>>>>>>> to draw >>>>>>>>> a line between SQL vs other programming language as "code"? 
>>>>>>>>> >>>>>>>>>>>>> but I think SQL is just another type of code, and
>>>>>>>>> >>>>>>>>>>>>> we are already talking about compiling all these
>>>>>>>>> >>>>>>>>>>>>> different code dialects to an intermediate
>>>>>>>>> >>>>>>>>>>>>> representation (using projects like Coral,
>>>>>>>>> >>>>>>>>>>>>> Substrait), which will be stored as another type of
>>>>>>>>> >>>>>>>>>>>>> representation of Iceberg view. I think the same
>>>>>>>>> >>>>>>>>>>>>> functionality can be used for UDFs if developed.
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support is a good idea,
>>>>>>>>> >>>>>>>>>>>>> even just a multi-dialect one like view, and that
>>>>>>>>> >>>>>>>>>>>>> can allow engines to, for example, parse a view SQL
>>>>>>>>> >>>>>>>>>>>>> and, when a referenced function cannot be resolved,
>>>>>>>>> >>>>>>>>>>>>> try to look for a multi-dialect UDF definition.
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we have the actual
>>>>>>>>> >>>>>>>>>>>>> proposal published.
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <
>>>>>>>>> >>>>>>>>>>>>> sn...@snazy.de> wrote:
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and portable and
>>>>>>>>> >>>>>>>>>>>>>> "non-centralized" as views are. The same
>>>>>>>>> >>>>>>>>>>>>>> performance concerns apply to views as well.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common base upon which
>>>>>>>>> >>>>>>>>>>>>>> engines can build, so the argument that UDFs aren't
>>>>>>>>> >>>>>>>>>>>>>> practical, because engines are different, is
>>>>>>>>> >>>>>>>>>>>>>> probably only a temporary concern.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should also try to tackle
>>>>>>>>> >>>>>>>>>>>>>> the idea to make views portable, which is
>>>>>>>>> >>>>>>>>>>>>>> conceptually not that much different from portable
>>>>>>>>> >>>>>>>>>>>>>> UDFs.
>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a negative touch to the >>>>>>>>> idea of having UDFs in Iceberg, especially not in this early stage. >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote: >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha. >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a good idea to add >>>>>>>>> UDFs tracked by Iceberg catalogs. I think that Iceberg primarily >>>>>>>>> deals with >>>>>>>>> things that are centralized, like tables of data. While it would be >>>>>>>>> great >>>>>>>>> to have a common set of functions across engines, I don't see how >>>>>>>>> that is >>>>>>>>> practical when those engines are implemented so differently. Plugging >>>>>>>>> in >>>>>>>>> code -- and especially custom user-supplied code -- seems inherently >>>>>>>>> specialized to me and should be part of the engines' design. >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you post the proposal, >>>>>>>>> but I think this would be a very difficult area to tackle across >>>>>>>>> engines, >>>>>>>>> languages, and memory models without having a huge performance >>>>>>>>> penalty. >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> Ryan >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat < >>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone, >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the community >>>>>>>>> interest in storing the Versioned SQL UDFs in Iceberg. >>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec addition for storing >>>>>>>>> the versioned UDFs in Iceberg (inspired by view spec). 
>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly to views in that >>>>>>>>> they are associated with tables, but they can accept arguments and >>>>>>>>> produce >>>>>>>>> return values, or even function as inline expressions. >>>>>>>>> >>>>>>>>>>>>>>> Many Query engines like Dremio, Trino, Snowflake, >>>>>>>>> Databricks Spark supports SQL UDFs at catalog level [1]. >>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can enable >>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs. >>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the engines. >>>>>>>>> Potentially engines can understand the UDFs written by other engines >>>>>>>>> (with >>>>>>>>> the translate layer). >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this feature into >>>>>>>>> Iceberg would be a valuable addition, and we're eager to collaborate >>>>>>>>> with >>>>>>>>> the community to develop a UDF specification. >>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun drafting a specification >>>>>>>>> to propose to the community. >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this. 
>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> [1] >>>>>>>>> >>>>>>>>>>>>>>> Dremio - >>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function >>>>>>>>> >>>>>>>>>>>>>>> Trino - >>>>>>>>> https://trino.io/docs/current/sql/create-function.html >>>>>>>>> >>>>>>>>>>>>>>> Snowflake - >>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions >>>>>>>>> >>>>>>>>>>>>>>> Databricks - >>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html >>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>> - Ajantha >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>> >>>>>>>>>>>>>> Ryan Blue >>>>>>>>> >>>>>>>>>>>>>> Tabular >>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>> >>>>>>>>>>>>>> Robert Stupp >>>>>>>>> >>>>>>>>>>>>>> @snazy >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> -- >>>>>>>>> >> Ryan Blue >>>>>>>>> >> Databricks >>>>>>>>> >>>>>>>> >>>>> >>>>> -- >>>>> Ryan Blue >>>>> Databricks >>>>> >>>>