Hi folks, thanks for attending today’s UDF sync. We mainly discussed the UDF metadata structure, which is captured in this doc (https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing). Here is the detailed summary:
1. Each UDF overload has its own return type, e.g., `add(int, int)` returns `int`, while `add(long, long)` returns `long`.
2. The return type should be explicitly specified; no implicit or statement-based return type inference should be allowed.
3. Adding explicit properties like deterministic and doc at the overload level.
4. Adding the property “secure” at the top level.
5. Introducing a dedicated signature definitions section to centralize metadata (function parameters, return type, parameter descriptions). Each overload would reference a signature definition by ID. This decoupling allows signature-related updates (like modifying parameter descriptions) without requiring a new UDF version, similar to how updating a table schema doesn’t create a new snapshot.
6. Whether to have versioned open properties or not. Versioned properties can lead to unnecessary copying of a bag of properties into each version, but they provide a clear history of properties for future debugging and for understanding the UDF behavior at a specific point in time.

Watch the recording here: https://www.youtube.com/watch?v=p7CvuGZKLSo&list=PLkifVhhWtccwzc3oRWjy5XiYJl0R6kdQL

Yufei

On Thu, Aug 21, 2025 at 4:18 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Hi everyone, here’s the summary from our last sync on 8/11. Apologies for the delay!
>
> - One UDF entity for all overloads
>   - We agreed to combine overloads with the same name into a single UDF entity, which shares a common metadata.json file.
>   - Listing UDFs will return a list of UDF names, not a list of individual signatures.
>   - Loading a UDF by name will return all of its overloads.
> - Versioning Strategy
>   - A global version number will track changes across the entire UDF entity; it increments monotonically.
>   - Each overload will also maintain its own version (e.g., updated_at_version) to trace changes specific to that overload.
> - For simplicity, the load API will not support argument-based filtering in the initial release. It will always return all overloads for a given UDF name; overload-level loading is not supported at this stage.
>
> Watch the recording here:
> https://drive.google.com/file/d/10G2HjUH2DaKSjGufEOjMu0bBuNd7sCzO/view
>
> Yufei
>
> On Fri, Aug 8, 2025 at 3:11 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> To recap and add my thoughts, we want to support UDFs with multiple signatures under the same name, which can serve both overload-aware and overload-naive engines.
>>
>> Per my investigation[1], most engines support overloading by arguments and allow implicit conversions like numeric widening (e.g., INT → BIGINT/FLOAT). This resolution approach can cause issues such as silent behavior changes. Here is an example:
>>
>> - Initially, only foo(DOUBLE) exists.
>> - foo(42::INT) widens INT → DOUBLE and runs the expected code.
>> - Later, a malicious user creates foo(BIGINT).
>> - The engine’s best-match resolution now binds the same call to the new overload, changing behavior without modifying the query.
>>
>> To mitigate this issue, we have to choose between these two access control models:
>>
>> 1. Model A – Name-Level ACL: Grants apply to all overloads of a function name.
>> 2. Model B – Signature-Level ACL: Grants are tied to specific signatures.
>>
>> The general recommendation is to adopt *Model A*. It trades some precision for safety and simplicity, while eliminating the silent behavior change problem. More details are in this doc[1].
>>
>> [1] https://docs.google.com/document/d/1E8mR-vInbQ8LDa5Lv3f22i6f8sceHojnEzxEJ6s6cvc/edit?tab=t.0
>>
>> Yufei
>>
>> On Tue, Jul 29, 2025 at 1:07 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>
>>> Thanks to everyone who joined the sync.
>>> Here is the meeting recording:
>>> https://drive.google.com/file/d/1L5S6nb-C_pzBwFlClwO_sG1AVBA_ROKo/view
>>>
>>> Summary:
>>> We discussed how to define function identifiers (which should also handle function overloading). Ryan suggested that we check how Spark does it. We can refer to functions using an identifier and then bind the different signatures to it, so that access policies can be applied per identifier. This is also linked to how we want to version the functions when overloading is supported.
>>>
>>> I will check more about this and update the proposal doc.
>>>
>>> Please check/subscribe to the dev events calendar for the next meeting link (Aug 11).
>>>
>>> - Ajantha
>>>
>>> On Sun, Jul 27, 2025 at 10:46 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>
>>>> Hi Ajantha,
>>>>
>>>> I see that the UDF Sync is scheduled in the "Iceberg Dev Events" calendar for tomorrow 7/28 at 9AM PT. I missed the last one, but I'll be at this one.
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> On Mon, Jul 14, 2025 at 9:22 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> No one joined the sync today. I came to know that Yufei is on holiday, and Ryan and others couldn't make it, similar to the last sync. It seems Yufei might have forgotten to transfer meeting ownership as well, as new members needed admin approval and couldn't join automatically this week. Also, I can understand it is summer holiday season for many.
>>>>>
>>>>> I've updated the function signature schema and other open points. I believe we're very close to the final version of the spec. A meeting is indeed necessary to finalize this, but we don't have to wait for it to finish the review process. We had many meetings on this in the past already. So, please review the document at your earliest convenience. If we agree on the spec by next week, I can raise a PR.
>>>>>
>>>>> - Ajantha
>>>>>
>>>>> On Thu, Jul 3, 2025 at 4:03 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>
>>>>>> I’d propose moving the field `properties` from a top-level field to a field inside “version”, alongside the representation, so that properties are versioned. A property like “deterministic” could change along with the representation over time. For example, we need to change “deterministic” from true to false when adding a non-deterministic SQL expression/function (e.g., now()) inside a UDF. Otherwise, rollback won't be safe.
>>>>>>
>>>>>> That said, it's still an open question whether we need any non-versioned properties. We can introduce them later if a use case arises.
>>>>>>
>>>>>> Yufei
>>>>>>
>>>>>> On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>
>>>>>>> I’d prefer to keep the signature list separate from the representation history. Here are the reasons:
>>>>>>>
>>>>>>> 1. Each version still enforces a single signature. Although the signatures array is global to the UDF, each version references just one signature ID. Rollbacks to historical versions remain safe.
>>>>>>> 2. We’ve separated the less frequently changing component (signatures) from the more dynamic one (representations) to reduce metadata file size.
>>>>>>> 3. Since signatures use Iceberg data types, they should remain unaffected by multi-dialect representation differences.
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>> Here is the meeting recording:
>>>>>>>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>>>>>>>>
>>>>>>>> Summary:
>>>>>>>> We discussed the action items from the last sync (*see Appendix C* in the proposal doc):
>>>>>>>>
>>>>>>>> - Function overloading: Supported by a few of the engines and on the roadmaps of many others. Iceberg will support it. We will maintain a `FunctionIdentifier` (which extends `TableIdentifier` but also has a member containing the function arguments' type list). All operations such as load, rename, list, create, and drop are based on `FunctionIdentifier`.
>>>>>>>> - Secure UDF: If we store it as a property in a bag, we need to standardize the property name. Iceberg encryption may be orthogonal to this discussion.
>>>>>>>> - UDFs with multi-statement and procedural bodies are supported by some engines. Iceberg will support them. The body is stored as-is when the engine creates the function.
>>>>>>>>
>>>>>>>> New discussions around:
>>>>>>>>
>>>>>>>> - Standardizing the property names (deterministic, secure).
>>>>>>>> - The rename function.
>>>>>>>> - Replace function: need to check up to what level replace is supported (considering function overloading).
>>>>>>>> - Should a signature be associated with a representation?
>>>>>>>>
>>>>>>>> I think we are close on the spec. Please review the proposal <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>
>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>
>>>>>>>> *Monday, July 14 · 9:00 – 10:00am*
>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>> Google Meet joining info
>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>
>>>>>>>> - Ajantha
>>>>>>>>
>>>>>>>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Can it be handled by Iceberg encryption? If the whole metadata is encrypted, we don't have to worry about just hiding the UDF body? Let us discuss more in the sync today.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, hiding the definition and disabling pushdown are required. We will need a named key (e.g., secure) somewhere, whether it is a top-level property or a key within the UDF properties, so that both the UDF creator and consumer can recognize it.
>>>>>>>>>>
>>>>>>>>>> Yufei
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the extra detail. What do you think the spec would require? Would it require hiding the UDF definition from users and require specific pushdown cases be disabled? The use cases seem valid, but I'm trying to understand the requirements this places on engines and why it needs to be part of the spec, rather than part of the properties of the UDF.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>
>>>>>>>>>>>> Here are the main use cases for secure UDFs:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Hiding UDF Definitions: This includes concealing the UDF body and details like the list of imports, some of which aren’t applicable to SQL UDFs.
>>>>>>>>>>>> 2. Sandboxed Execution: Ensuring the UDF runs in an isolated environment. Again, this typically doesn’t apply to SQL UDFs.
>>>>>>>>>>>> 3. Preventing Data Leakage at Execution Time: For example, secure UDFs may disable certain optimizations, such as predicate pushdown, to avoid exposing sensitive data indirectly. [1]
>>>>>>>>>>>>
>>>>>>>>>>>> Given these scenarios, I agree with your point that the secure flag is primarily an instruction to the engine to behave differently. While it's largely an engine-side behavior, we still need to include this flag in the UDF definition to indicate whether a UDF is secure, especially considering the perf penalty introduced by scenario #3. We should clearly recommend that users avoid marking UDFs as secure unless it's truly necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown
>>>>>>>>>>>>
>>>>>>>>>>>> Yufei
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yufei, could you make the argument for supporting a "secure" UDF? What use case are you addressing and what specifically changes about how the UDF is handled? If the idea is to hide the UDF definition, do we need to include it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this would be a signal to a "trusted engine". When the engine interacts with the catalog it sends authorization information about itself in addition to the user that it is acting on behalf of. That way the catalog knows that the secure UDF can be sent to the engine and won't be shown to the user. The majority of this logic is on the REST server side, and the only part that is communicated to the client is the request not to show the UDF to the user, right? In that case should this be a property rather than part of the definition? Even if we state that the client "must" suppress the UDF definition, it's really just a request. Only trusted engines can be passed the UDF definition, so a spec requirement to suppress the definition isn't very meaningful.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Multi-statement UDFs are definitely useful, but whether those statements run within a single transaction should be treated as an engine-level concern. The Iceberg UDF spec can spell out the expectation, yet the actual guarantee still depends on the runtime. Even if a UDF declares itself transactional, the engine may or may not enforce it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One more thing: should we also introduce a “secure UDF” option supported by some engines[1], so the body and any sensitive details stay hidden from callers?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - We went through the SQL UDF syntax supported by different engines (Snowflake, Databricks, Dremio, Trino, OSS Spark 4.0).
>>>>>>>>>>>>>>> - Each engine uses its own block separator, like $$ or '' or none. The action item was to check whether engines support multi-statement (transactional) UDF bodies.
>>>>>>>>>>>>>>> - We discussed function overloading. We need to check whether these engines support function overloading for SQL UDFs (Postgres supports it!). If yes, we need to adapt the spec to handle it.
>>>>>>>>>>>>>>> - We started the online spec review and discussed the deterministic flag. We concluded that independent fields (like deterministic) are kept in the spec only if the majority of engines support them. Otherwise they will be passed in an engine-specific property bag, and it is the engine's responsibility to honor those optional properties.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Feel free to review the current proposal document here <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Final spec will be put to review and vote once it is ready.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Monday, June 30 · 9:00 – 10:00am*
>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - We discussed including Python support; the majority agreed *not to* (see recording for details).
>>>>>>>>>>>>>>>> - No strong opposition to versioning; it will be included to support change tracking and similar use cases.
>>>>>>>>>>>>>>>> - Suggestions were made to document how each catalog resolves UDFs, similar to views and tables.
>>>>>>>>>>>>>>>> - We agreed not to deviate from the existing table/view spec; e.g., location will remain *required* for cross-catalog compatibility.
>>>>>>>>>>>>>>>> - We also briefly discussed view interoperability, as the same considerations apply here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Feel free to review the proposal document here: <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
>>>>>>>>>>>>>>>> With the current scope, it is similar to the view/table spec now.
>>>>>>>>>>>>>>>> Final spec will be put to review and vote once it is ready.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the UDF project. Everyone’s welcome to drop in and share ideas! Here is the meeting link:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Iceberg UDF sync
>>>>>>>>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am
>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Update on the progress.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss the UDF proposal. We covered several key points, though some are still open for further discussion:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for UDFs at this stage? We explored the possibility of simplifying the specification by avoiding view replication, and potentially introducing versioning support later. UDTFs, being a superset of views in some ways, may not require versioning initially.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> b) *VarArgs Support*: While some query engines may not support vararg syntax in CREATE FUNCTION, Iceberg UDFs could represent such arguments as lists when supported by the engine.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support generic types (e.g., object), we can only map engine-specific types to Iceberg types. As a result, generic data types will not be supported in the initial version.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> d) *Python Support*: Incorporating Python as a language for SQL UDFs seems promising, especially given its potential to resolve interoperability challenges. Some engines, however, require platform version and package dependency details to execute Python code; this should be captured in the specification.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> *Next Steps*
>>>>>>>>>>>>>>>>>> I will update the proposal document with two primary UDF use cases:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - Policy exchange between engines
>>>>>>>>>>>>>>>>>> - UDTF as a superset of view functionality
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The update will include corresponding syntax examples in both SQL and Python, and detail how each use case is represented in Iceberg metadata.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We also plan to set up regular syncs (open to more interested participants) to continue refining and finalizing the UDF specification.
>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I've updated the design document[1] based on the previous comments. Additionally, I've included the SQL UDF syntax supported by various vendors, including Dremio, Snowflake, Databricks, and Trino.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper discussion is needed. Let's keep moving forward, especially with the renewed interest from the community.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1] https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> During the last catalog community sync, there was significant interest in storing UDFs in Iceberg and adding endpoints for UDF handling in the REST catalog spec.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I recently discussed this with Yufei to better understand the new requirement of using UDFs for fine-grained access control policies. This expands the use cases beyond just versioned and interoperable UDFs. Additionally, I learnt that many vendors are interested in this feature.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Given the strong community interest and support, I’d like to take ownership of this effort and revive the work. I'll be revisiting the document I proposed long back and will share an updated proposal by next week.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg!
>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The UDF spec does not require representations to be SQL. It merely does not specify (in this revision) how other representations are to be written.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This seems like an easy extension (adding a new type in the "Representations" section).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a way for future versions to add different representations later, but only SQL is supported. That was also the feedback to my initial skepticism about how it would work to add functions.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only SQL representations, although it is certainly favouring SQL in examples... It would be nice to add a non-SQL example, indeed.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as this proposal focuses on SQL-based engines, while Python-based systems often work with data frames. Adding imperative languages like Python would make this proposal more inclusive.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 8 Aug 2024 at 10:27, Piotr Findeisen <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>>>>>>>>>>>>>>>>> In the design doc linked before in this thread [1] I read
>>>>>>>>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard to share among different engines."
>>>>>>>>>>>>>>>>>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>>>>>>>>>>>>>>>>>> I agree with this statement. I don't fully understand yet how the proposed design addresses shareability between the engines though.
>>>>>>>>>>>>>>>>>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>>>>>>>> Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created functions shareable between engines? Do you mean UDFs written in imperative code?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this thread. The Iceberg UDFs are an interesting idea!
>>>>>>>>>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created functions shareable between the engines?
>>>>>>>>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in, e.g., Spark or Trino?
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > Best >>>>>>>>>>>>>>>>>>>>>>>>>> > Piotr >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue >>>>>>>>>>>>>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and added >>>>>>>>>>>>>>>>>>>>>>>>>> comments. I think it would be helpful to also have a >>>>>>>>>>>>>>>>>>>>>>>>>> design doc that covers >>>>>>>>>>>>>>>>>>>>>>>>>> the choices from the draft spec. For instance, the >>>>>>>>>>>>>>>>>>>>>>>>>> choice to enumerate all >>>>>>>>>>>>>>>>>>>>>>>>>> possible function input struts rather than allowing >>>>>>>>>>>>>>>>>>>>>>>>>> generics and varargs. >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback: >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> I think that the choice to enumerate function >>>>>>>>>>>>>>>>>>>>>>>>>> signatures is limiting. It would be nice to see a >>>>>>>>>>>>>>>>>>>>>>>>>> discussion of the >>>>>>>>>>>>>>>>>>>>>>>>>> trade-offs and a rationale for the choice. I think >>>>>>>>>>>>>>>>>>>>>>>>>> it would also be very >>>>>>>>>>>>>>>>>>>>>>>>>> helpful to have a few representative use cases for >>>>>>>>>>>>>>>>>>>>>>>>>> this included in the >>>>>>>>>>>>>>>>>>>>>>>>>> doc. That way the proposal can demonstrate that it >>>>>>>>>>>>>>>>>>>>>>>>>> solves those use cases >>>>>>>>>>>>>>>>>>>>>>>>>> with reasonable trade-offs. >>>>>>>>>>>>>>>>>>>>>>>>>> >> There are a few instances where this is >>>>>>>>>>>>>>>>>>>>>>>>>> inconsistent with conventions in other specs. For >>>>>>>>>>>>>>>>>>>>>>>>>> example, using string IDs >>>>>>>>>>>>>>>>>>>>>>>>>> rather than an integer. >>>>>>>>>>>>>>>>>>>>>>>>>> >> This uses a very different model for spec >>>>>>>>>>>>>>>>>>>>>>>>>> versioning than the Iceberg view and table specs. 
It >>>>>>>>>>>>>>>>>>>>>>>>>> requires readers to >>>>>>>>>>>>>>>>>>>>>>>>>> fail if there are any unknown fields, which prevents >>>>>>>>>>>>>>>>>>>>>>>>>> the spec from adding >>>>>>>>>>>>>>>>>>>>>>>>>> things that are fully backward-compatible. Other >>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg specs only require >>>>>>>>>>>>>>>>>>>>>>>>>> a version change to introduce forward-incompatible >>>>>>>>>>>>>>>>>>>>>>>>>> changes and I think that >>>>>>>>>>>>>>>>>>>>>>>>>> this should do the same to avoid confusion. >>>>>>>>>>>>>>>>>>>>>>>>>> >> It looks like the intent is to allow multiple >>>>>>>>>>>>>>>>>>>>>>>>>> function signatures per version, but it is unclear >>>>>>>>>>>>>>>>>>>>>>>>>> how to encode them >>>>>>>>>>>>>>>>>>>>>>>>>> because a version is associated with a single >>>>>>>>>>>>>>>>>>>>>>>>>> function signature. >>>>>>>>>>>>>>>>>>>>>>>>>> >> There is no review of SQL syntax for creating >>>>>>>>>>>>>>>>>>>>>>>>>> functions across engines, so this doesn’t show that >>>>>>>>>>>>>>>>>>>>>>>>>> the metadata proposed >>>>>>>>>>>>>>>>>>>>>>>>>> is sufficient for cross-engine use cases. >>>>>>>>>>>>>>>>>>>>>>>>>> >> The example for a table-valued function shows >>>>>>>>>>>>>>>>>>>>>>>>>> a SELECT statement and it isn’t clear how this is >>>>>>>>>>>>>>>>>>>>>>>>>> distinct from a view. >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat < >>>>>>>>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on >>>>>>>>>>>>>>>>>>>>>>>>>> this. >>>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>> We didn't find any blocker for the spec. >>>>>>>>>>>>>>>>>>>>>>>>>> >>> I will wait for a week and if there are no more review >>>>>>>>>>>>>>>>>>>>>>>>>> comments, I will raise a PR for spec addition next >>>>>>>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please have a >>>>>>>>>>>>>>>>>>>>>>>>>> look at the proposal >>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>> - Ajantha >>>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin >>>>>>>>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an >>>>>>>>>>>>>>>>>>>>>>>>>> interesting direction, but there might be some >>>>>>>>>>>>>>>>>>>>>>>>>> details that need to be fine >>>>>>>>>>>>>>>>>>>>>>>>>> tuned. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who might be >>>>>>>>>>>>>>>>>>>>>>>>>> interested. Resharing since I do not think it was >>>>>>>>>>>>>>>>>>>>>>>>>> directly linked in the >>>>>>>>>>>>>>>>>>>>>>>>>> thread. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> [1] >>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Walaa. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha >>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder since we didn't >>>>>>>>>>>>>>>>>>>>>>>>>> get any review on the proposal. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Initially proposed on June 4. 
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> - Ajantha >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha >>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We've only received one review so far >>>>>>>>>>>>>>>>>>>>>>>>>> (from Benny). >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We would appreciate more eyes on this. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> - Ajantha >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha >>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hi All, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Please find the proposal link >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10432 >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Google doc link is attached in the >>>>>>>>>>>>>>>>>>>>>>>>>> proposal. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hope it gives more clarity to take the >>>>>>>>>>>>>>>>>>>>>>>>>> decisions and how we want to implement it. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> - Ajantha >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa >>>>>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant >>>>>>>>>>>>>>>>>>>>>>>>>> scalar/aggregate/table user defined functions. 
Here >>>>>>>>>>>>>>>>>>>>>>>>>> are some examples of >>>>>>>>>>>>>>>>>>>>>>>>>> what I meant in (2): >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hive GenericUDF: >>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Trino user defined functions: >>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/develop/functions.html >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Flink user defined functions: >>>>>>>>>>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/ >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Probably what you referred to is a >>>>>>>>>>>>>>>>>>>>>>>>>> variation of (1) where the API is data flow/data >>>>>>>>>>>>>>>>>>>>>>>>>> pipeline API instead of >>>>>>>>>>>>>>>>>>>>>>>>>> SQL (e.g., Spark Scala). Yes, that is also possible >>>>>>>>>>>>>>>>>>>>>>>>>> in the very long run :) >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye < >>>>>>>>>>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative >>>>>>>>>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> I think we could still explore some >>>>>>>>>>>>>>>>>>>>>>>>>> long term opportunities in this case. 
Consider you >>>>>>>>>>>>>>>>>>>>>>>>>> register a Spark temp >>>>>>>>>>>>>>>>>>>>>>>>>> view as some sort of data frame read, then it could >>>>>>>>>>>>>>>>>>>>>>>>>> still be resolved to a >>>>>>>>>>>>>>>>>>>>>>>>>> Spark plan that is representable by an intermediate >>>>>>>>>>>>>>>>>>>>>>>>>> representation. But I >>>>>>>>>>>>>>>>>>>>>>>>>> agree this gets very complicated very soon, and just >>>>>>>>>>>>>>>>>>>>>>>>>> having the case (1) >>>>>>>>>>>>>>>>>>>>>>>>>> covered would already be a huge step forward. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> -Jack >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny >>>>>>>>>>>>>>>>>>>>>>>>>> Chow <btc...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a >>>>>>>>>>>>>>>>>>>>>>>>>> tabular SQL UDF can be used to build a parameterized >>>>>>>>>>>>>>>>>>>>>>>>>> view. So, there's >>>>>>>>>>>>>>>>>>>>>>>>>> definitely a lot in common between UDFs and views. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa >>>>>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about >>>>>>>>>>>>>>>>>>>>>>>>>> what is perceived as a "UDF". There are 2 flavors: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the >>>>>>>>>>>>>>>>>>>>>>>>>> user whose definition is a composition of other >>>>>>>>>>>>>>>>>>>>>>>>>> built-in functions/SQL >>>>>>>>>>>>>>>>>>>>>>>>>> expressions. 
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative >>>>>>>>>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's >>>>>>>>>>>>>>>>>>>>>>>>>> references are pretty much from (1) and I think >>>>>>>>>>>>>>>>>>>>>>>>>> those have more analogy to >>>>>>>>>>>>>>>>>>>>>>>>>> views due to their SQL nature. Agree (2) is not >>>>>>>>>>>>>>>>>>>>>>>>>> practical to maintain by >>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg, but I think Ajantha's use cases are around >>>>>>>>>>>>>>>>>>>>>>>>>> (1), and may be worth >>>>>>>>>>>>>>>>>>>>>>>>>> evaluating. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Walaa. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM >>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you >>>>>>>>>>>>>>>>>>>>>>>>>> post the proposal, but I think this would be a very >>>>>>>>>>>>>>>>>>>>>>>>>> difficult area to >>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory models >>>>>>>>>>>>>>>>>>>>>>>>>> without having a huge >>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports >>>>>>>>>>>>>>>>>>>>>>>>>> SQL representations of UDFs (similar to views as >>>>>>>>>>>>>>>>>>>>>>>>>> shared by the reference >>>>>>>>>>>>>>>>>>>>>>>>>> links above), the complexity involved will be >>>>>>>>>>>>>>>>>>>>>>>>>> similar to managing views. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for >>>>>>>>>>>>>>>>>>>>>>>>>> your input. 
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft >>>>>>>>>>>>>>>>>>>>>>>>>> spec (inspired by the view spec) this week to >>>>>>>>>>>>>>>>>>>>>>>>>> facilitate further >>>>>>>>>>>>>>>>>>>>>>>>>> discussions. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack >>>>>>>>>>>>>>>>>>>>>>>>>> Ye <yezhao...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a >>>>>>>>>>>>>>>>>>>>>>>>>> common set of functions across engines, I don't see >>>>>>>>>>>>>>>>>>>>>>>>>> how that is practical >>>>>>>>>>>>>>>>>>>>>>>>>> when those engines are implemented so differently. >>>>>>>>>>>>>>>>>>>>>>>>>> Plugging in code -- and >>>>>>>>>>>>>>>>>>>>>>>>>> especially custom user-supplied code -- seems >>>>>>>>>>>>>>>>>>>>>>>>>> inherently specialized to me >>>>>>>>>>>>>>>>>>>>>>>>>> and should be part of the engines' design. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from the >>>>>>>>>>>>>>>>>>>>>>>>>> views? I feel we can say exactly the same thing for >>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg views, but yet >>>>>>>>>>>>>>>>>>>>>>>>>> we have Iceberg multi-dialect views implemented. >>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it sounds like we >>>>>>>>>>>>>>>>>>>>>>>>>> are trying to draw a line between SQL vs other >>>>>>>>>>>>>>>>>>>>>>>>>> programming language as >>>>>>>>>>>>>>>>>>>>>>>>>> "code"? 
But I think SQL is just another type of >>>>>>>>>>>>>>>>>>>>>>>>>> code, and we are already >>>>>>>>>>>>>>>>>>>>>>>>>> talking about compiling all these different code >>>>>>>>>>>>>>>>>>>>>>>>>> dialects to an >>>>>>>>>>>>>>>>>>>>>>>>>> intermediate representation (using projects like >>>>>>>>>>>>>>>>>>>>>>>>>> Coral, Substrait), which >>>>>>>>>>>>>>>>>>>>>>>>>> will be stored as another type of representation of >>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg view. I think >>>>>>>>>>>>>>>>>>>>>>>>>> the same functionality can be used for UDFs if >>>>>>>>>>>>>>>>>>>>>>>>>> developed. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support >>>>>>>>>>>>>>>>>>>>>>>>>> is a good idea, even just a multi-dialect one like >>>>>>>>>>>>>>>>>>>>>>>>>> view, and that can allow >>>>>>>>>>>>>>>>>>>>>>>>>> engines to, for example, parse a view SQL, and when a >>>>>>>>>>>>>>>>>>>>>>>>>> function referenced >>>>>>>>>>>>>>>>>>>>>>>>>> cannot be resolved, try to look for a multi-dialect >>>>>>>>>>>>>>>>>>>>>>>>>> UDF definition. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we >>>>>>>>>>>>>>>>>>>>>>>>>> have the actual proposal published. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM >>>>>>>>>>>>>>>>>>>>>>>>>> Robert Stupp <sn...@snazy.de> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and >>>>>>>>>>>>>>>>>>>>>>>>>> portable and "non-centralized" as views are. The >>>>>>>>>>>>>>>>>>>>>>>>>> same performance concerns >>>>>>>>>>>>>>>>>>>>>>>>>> apply to views as well.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common >>>>>>>>>>>>>>>>>>>>>>>>>> base upon which engines can build, so the argument >>>>>>>>>>>>>>>>>>>>>>>>>> that UDFs aren't >>>>>>>>>>>>>>>>>>>>>>>>>> practical, because engines are different, is >>>>>>>>>>>>>>>>>>>>>>>>>> probably only a temporary >>>>>>>>>>>>>>>>>>>>>>>>>> concern. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should >>>>>>>>>>>>>>>>>>>>>>>>>> also try to tackle the idea to make views portable, >>>>>>>>>>>>>>>>>>>>>>>>>> which is conceptually >>>>>>>>>>>>>>>>>>>>>>>>>> not that much different from portable UDFs. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a >>>>>>>>>>>>>>>>>>>>>>>>>> negative touch to the idea of having UDFs in >>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg, especially not in >>>>>>>>>>>>>>>>>>>>>>>>>> this early stage. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a >>>>>>>>>>>>>>>>>>>>>>>>>> good idea to add UDFs tracked by Iceberg catalogs. I >>>>>>>>>>>>>>>>>>>>>>>>>> think that Iceberg >>>>>>>>>>>>>>>>>>>>>>>>>> primarily deals with things that are centralized, >>>>>>>>>>>>>>>>>>>>>>>>>> like tables of data. >>>>>>>>>>>>>>>>>>>>>>>>>> While it would be great to have a common set of >>>>>>>>>>>>>>>>>>>>>>>>>> functions across engines, I >>>>>>>>>>>>>>>>>>>>>>>>>> don't see how that is practical when those engines >>>>>>>>>>>>>>>>>>>>>>>>>> are implemented so >>>>>>>>>>>>>>>>>>>>>>>>>> differently. 
Plugging in code -- and especially >>>>>>>>>>>>>>>>>>>>>>>>>> custom user-supplied code >>>>>>>>>>>>>>>>>>>>>>>>>> -- seems inherently specialized to me and should be >>>>>>>>>>>>>>>>>>>>>>>>>> part of the engines' >>>>>>>>>>>>>>>>>>>>>>>>>> design. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you >>>>>>>>>>>>>>>>>>>>>>>>>> post the proposal, but I think this would be a very >>>>>>>>>>>>>>>>>>>>>>>>>> difficult area to >>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory models >>>>>>>>>>>>>>>>>>>>>>>>>> without having a huge >>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM >>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the >>>>>>>>>>>>>>>>>>>>>>>>>> community interest in storing the Versioned SQL UDFs >>>>>>>>>>>>>>>>>>>>>>>>>> in Iceberg. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec >>>>>>>>>>>>>>>>>>>>>>>>>> addition for storing the versioned UDFs in Iceberg >>>>>>>>>>>>>>>>>>>>>>>>>> (inspired by view spec). >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly >>>>>>>>>>>>>>>>>>>>>>>>>> to views in that they are associated with tables, >>>>>>>>>>>>>>>>>>>>>>>>>> but they can accept >>>>>>>>>>>>>>>>>>>>>>>>>> arguments and produce return values, or even >>>>>>>>>>>>>>>>>>>>>>>>>> function as inline expressions. 
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines like Dremio, >>>>>>>>>>>>>>>>>>>>>>>>>> Trino, Snowflake, and Databricks Spark support SQL UDFs >>>>>>>>>>>>>>>>>>>>>>>>>> at the catalog level [1]. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can >>>>>>>>>>>>>>>>>>>>>>>>>> enable >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the >>>>>>>>>>>>>>>>>>>>>>>>>> engines. Potentially engines can understand the UDFs >>>>>>>>>>>>>>>>>>>>>>>>>> written by other >>>>>>>>>>>>>>>>>>>>>>>>>> engines (with a translation layer). >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this >>>>>>>>>>>>>>>>>>>>>>>>>> feature into Iceberg would be a valuable addition, >>>>>>>>>>>>>>>>>>>>>>>>>> and we're eager to >>>>>>>>>>>>>>>>>>>>>>>>>> collaborate with the community to develop a UDF >>>>>>>>>>>>>>>>>>>>>>>>>> specification. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun >>>>>>>>>>>>>>>>>>>>>>>>>> drafting a specification to propose to the community. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio - >>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino - >>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake - >>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks - >>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> -- >>>>>>>>>>>>>>>>>>>>>>>>>> >> Ryan Blue >>>>>>>>>>>>>>>>>>>>>>>>>> >> Databricks >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> Ryan Blue >>>>>>>>>>>>>>>>>>>>>> Databricks >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>