Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

Yufei Gu Mon, 25 Aug 2025 18:37:23 -0700

Hi folks, thanks for attending today’s UDF sync. We mainly discussed the UDF
metadata structure, which is captured in this doc:
https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing
Here is the detailed summary:


   1. Each UDF overload has its own return type, e.g., `add(int, int)`
   returns `int`, while `add(long, long)` returns `long`.
   2. The return type should be explicitly specified; no implicit or
   statement-based return type inference should be allowed.
   3. Add explicit properties such as deterministic and doc at the
   overload level.
   4. Add a “secure” property at the top level.
   5. Introducing a dedicated signature definitions section to centralize
   metadata (Function parameters, Return type, Parameter descriptions). Each
   overload would reference a signature definition by ID. This decoupling
   allows signature-related updates (like modifying parameter descriptions)
   without requiring a new UDF version, similar to how updating a table schema
   doesn’t create a new snapshot.
   6. Open question: whether open properties should be versioned. Versioned
   properties can lead to unnecessary copying of a bag of properties into each
   version, but they provide a clear history of properties for any future
   debugging and for understanding the UDF behavior at a specific point in
   time.
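
To make points 1 to 5 concrete, here is a minimal Python sketch of how the pieces could fit together. It is only an illustration: every class and field name (`Signature`, `Overload`, `UdfMetadata`, `signature_id`, and so on) is hypothetical and not taken from the spec doc, which is still under discussion.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Signature:
    """Hypothetical entry in the dedicated signature definitions section."""
    signature_id: int
    parameters: tuple   # ((name, iceberg_type, description), ...)
    return_type: str    # always explicit, never inferred from the body


@dataclass
class Overload:
    """Hypothetical overload that references a signature by ID."""
    signature_id: int
    body: str
    deterministic: bool = False   # overload-level property
    doc: str = ""                 # overload-level property


@dataclass
class UdfMetadata:
    name: str
    secure: bool = False                              # top-level property
    signatures: dict = field(default_factory=dict)    # id -> Signature
    overloads: list = field(default_factory=list)

    def return_type_of(self, overload):
        return self.signatures[overload.signature_id].return_type


udf = UdfMetadata(name="add")
udf.signatures[1] = Signature(1, (("x", "int", ""), ("y", "int", "")), "int")
udf.signatures[2] = Signature(2, (("x", "long", ""), ("y", "long", "")), "long")
udf.overloads.append(Overload(signature_id=1, body="x + y", deterministic=True))
udf.overloads.append(Overload(signature_id=2, body="x + y", deterministic=True))

return_types = [udf.return_type_of(o) for o in udf.overloads]

# Updating a parameter description only touches the signature definition;
# the overload entries (and their bodies) are left unchanged.
udf.signatures[1] = replace(
    udf.signatures[1],
    parameters=(("x", "int", "left operand"), ("y", "int", "right operand")),
)
```

In this sketch the description update does not modify any overload, which mirrors the decoupling described in point 5.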

Watch the recording here,
https://www.youtube.com/watch?v=p7CvuGZKLSo&list=PLkifVhhWtccwzc3oRWjy5XiYJl0R6kdQL

Yufei


On Thu, Aug 21, 2025 at 4:18 PM Yufei Gu <[email protected]> wrote:

> Hi everyone, here’s the summary from our last sync on 8/11. Apologies for
> the delay!
>
>    - One UDF entity for all overloads
>       - We agreed to combine overloads with the same name into a single
>       UDF entity, which shares a common metadata.json file.
>       - Listing UDFs will return a list of UDF names, not a list of
>       individual signatures.
>       - Loading a UDF by name will return all of its overloads.
>    - Versioning Strategy
>       - A global version number will track changes across the entire UDF
>       entity; it increments monotonically.
>       - Each overload will also maintain its own version (e.g.,
>       updated_at_version) to trace changes specific to that overload.
>    - For simplicity, the load API will not support argument-based
>    filtering in the initial release. It will always return all overloads for a
>    given UDF name; overload-level loading is not supported at this stage.
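
The versioning strategy summarized above can be sketched roughly as follows. This is a toy in-memory model under stated assumptions: the class `UdfEntity` and its methods are hypothetical, and only `updated_at_version` is a name mentioned in the summary.

```python
class UdfEntity:
    """Toy model: one entity per UDF name, shared by all overloads."""

    def __init__(self, name):
        self.name = name
        self.version = 0      # global version, only ever increases
        self.overloads = {}   # argument-type tuple -> overload record

    def put_overload(self, arg_types, body):
        # Any change to any overload bumps the global version ...
        self.version += 1
        # ... and the touched overload records the version it changed at.
        self.overloads[tuple(arg_types)] = {
            "body": body,
            "updated_at_version": self.version,
        }

    def load(self):
        # No argument-based filtering: loading by name returns all overloads.
        return dict(self.overloads)


udf = UdfEntity("add")
udf.put_overload(["int", "int"], "x + y")     # global version becomes 1
udf.put_overload(["long", "long"], "x + y")   # global version becomes 2
udf.put_overload(["int", "int"], "y + x")     # global version becomes 3

loaded = udf.load()
```

Here the `(int, int)` overload ends up with `updated_at_version` 3 while `(long, long)` stays at 2, so a reader can tell which overload a given global version actually changed.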
>
> Watch the recording here,
> https://drive.google.com/file/d/10G2HjUH2DaKSjGufEOjMu0bBuNd7sCzO/view
>
> Yufei
>
>
> On Fri, Aug 8, 2025 at 3:11 PM Yufei Gu <[email protected]> wrote:
>
>> To recap and add my thoughts, we want to support UDFs with multiple
>> signatures under the same name, which can serve both overload-aware and
>> overload-naive engines.
>>
>> Per my investigation[1], most engines support overloading by arguments
>> and allow implicit conversions like numeric widening (e.g., INT →
>> BIGINT/FLOAT). This resolution approach can cause issues such as silent
>> behavior changes. Here is an example:
>>
>>    - Initially, only foo(DOUBLE) exists.
>>    - foo(42::INT) widens INT → DOUBLE and runs the expected code.
>>    - Later, a malicious user creates foo(BIGINT).
>>    - The engine’s best-match resolution now binds the same call to the new
>>    overload, changing behavior without modifying the query.
>>
>> To mitigate this issue, we have to choose between these two access
>> control models:
>>
>>    1. Model A – Name-Level ACL: Grants apply to all overloads of a
>>    function name.
>>    2. Model B – Signature-Level ACL: Grants tied to specific signatures.
>>
>> The general recommendation is to adopt *Model A.* It trades some
>> precision for safety and simplicity, while eliminating the silent behavior
>> change problem. More details are in this doc[1].
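
The silent behavior change in the foo example can be reproduced with a small Python sketch of best-match resolution. The widening costs below are an assumption made for illustration; each real engine defines its own implicit-cast rules, and `resolve` is a hypothetical helper, not any engine's API.

```python
# Hypothetical widening costs: a lower cost means a closer match.
WIDENING_COST = {
    ("INT", "INT"): 0,
    ("INT", "BIGINT"): 1,
    ("INT", "DOUBLE"): 2,
    ("BIGINT", "BIGINT"): 0,
    ("BIGINT", "DOUBLE"): 1,
    ("DOUBLE", "DOUBLE"): 0,
}


def resolve(overloads, arg_type):
    """Pick the parameter type of the best-matching one-argument overload."""
    candidates = [p for p in overloads if (arg_type, p) in WIDENING_COST]
    if not candidates:
        raise LookupError(f"no overload accepts {arg_type}")
    return min(candidates, key=lambda p: WIDENING_COST[(arg_type, p)])


overloads = ["DOUBLE"]                # initially only foo(DOUBLE) exists
before = resolve(overloads, "INT")    # foo(42::INT) binds to foo(DOUBLE)

overloads.append("BIGINT")            # later, someone creates foo(BIGINT)
after = resolve(overloads, "INT")     # the same call now binds to foo(BIGINT)
```

The query text never changes, yet `before` and `after` differ. With a name-level ACL (Model A), only principals already trusted with the whole function name can add the overload that shifts the binding.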
>>
>> 1.
>> https://docs.google.com/document/d/1E8mR-vInbQ8LDa5Lv3f22i6f8sceHojnEzxEJ6s6cvc/edit?tab=t.0
>>
>> Yufei
>>
>>
>> On Tue, Jul 29, 2025 at 1:07 AM Ajantha Bhat <[email protected]>
>> wrote:
>>
>>> Thanks to everyone who joined the sync.
>>> Here is the meeting recording:
>>> https://drive.google.com/file/d/1L5S6nb-C_pzBwFlClwO_sG1AVBA_ROKo/view
>>>
>>> Summary:
>>> We discussed how to define function identifiers (which should also handle
>>> function overloading). Ryan suggested that we check how Spark does
>>> it. We can refer to functions using an identifier and then bind the
>>> different signatures to it, so that access policies can be applied per
>>> identifier. This is also linked to how we want to version the functions
>>> when overloading is supported.
>>>
>>> I will check more about this and update the proposal doc.
>>>
>>> Please check/subscribe to the dev events calendar for the next
>>> meeting link (Aug 11).
>>>
>>> - Ajantha
>>>
>>> On Sun, Jul 27, 2025 at 10:46 PM Kevin Liu <[email protected]>
>>> wrote:
>>>
>>>> Hi Ajantha,
>>>>
>>>> I see that the UDF Sync is scheduled in the "Iceberg Dev Events"
>>>> calendar for tomorrow 7/28 at 9AM PT. I missed the last one, but I'll
>>>> be at this one.
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> On Mon, Jul 14, 2025 at 9:22 AM Ajantha Bhat <[email protected]>
>>>> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> No one joined the sync today. I came to know that Yufei is on holiday,
>>>>> and Ryan and others couldn't make it, similar to the last sync. It seems
>>>>> Yufei might have forgotten to transfer meeting ownership as well, as new
>>>>> members needed admin approval and couldn't join automatically this week.
>>>>> Also, I can understand it is summer holiday season for many.
>>>>>
>>>>> I've updated the function signature schema and other open points. I
>>>>> believe we're very close to the final version of the spec. A meeting is
>>>>> indeed necessary to finalize this, but we don't have to wait for it to
>>>>> finish the review process. We had many meetings on this in the past
>>>>> already. So, please review the document at your earliest convenience. If
>>>>> we agree on the spec by next week, I can raise a PR.
>>>>>
>>>>> - Ajantha
>>>>>
>>>>> On Thu, Jul 3, 2025 at 4:03 AM Yufei Gu <[email protected]> wrote:
>>>>>
>>>>>> I’d propose to move the field `properties` from a top level field to
>>>>>> a field inside “version” along with a representation, so that properties
>>>>>> are versioned. A property like “deterministic” could change along with
>>>>>> representation over time. For example, we need to change “deterministic”
>>>>>> from true to false when adding a non-deterministic SQL
>>>>>> expression/function (e.g., now()) inside a UDF. Otherwise, rollback won't
>>>>>> be safe.
>>>>>>
>>>>>> That said, it's still an open question whether we need any
>>>>>> non-versioned properties. We can introduce them later if a use case
>>>>>> arises.
>>>>>>
>>>>>> Yufei
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>
>>>>>>> I’d prefer to keep the signature list separate from the
>>>>>>> representation history. Here are the reasons:
>>>>>>>
>>>>>>>    1. Each version still enforces a single signature. Although the
>>>>>>>    signatures array is global to the UDF, each version references just
>>>>>>>    one signature ID. Rollbacks to historical versions remain safe.
>>>>>>>    2. We’ve separated the less frequently changing component
>>>>>>>    (signatures) from the more dynamic one (representations) to reduce
>>>>>>>    metadata file size.
>>>>>>>    3. Since signatures use Iceberg data types, they should remain
>>>>>>>    unaffected by multi-dialect representation differences.
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>> Here is the meeting recording:
>>>>>>>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>>>>>>>>
>>>>>>>> Summary:
>>>>>>>> We discussed the action items from the last sync (*see
>>>>>>>> Appendix C* in the proposal doc):
>>>>>>>>
>>>>>>>>    - Function overloading: Supported by a few of the engines and on
>>>>>>>>    the roadmaps of many others. Iceberg will support it. We will
>>>>>>>>    maintain a `FunctionIdentifier` (extends `TableIdentifier` but also
>>>>>>>>    has a member containing the function arguments' type list). All
>>>>>>>>    operations like load, rename, list, create and drop are based on
>>>>>>>>    `FunctionIdentifier`.
>>>>>>>>    - Secure UDF: If we store it as a property in a bag, we need to
>>>>>>>>    standardize the property name. Iceberg encryption may be orthogonal
>>>>>>>>    to this discussion.
>>>>>>>>    - UDFs with multi-statement and procedural bodies are supported
>>>>>>>>    by some engines. Iceberg will support them. The body is stored as is
>>>>>>>>    when the engine creates the function.
>>>>>>>>
>>>>>>>> New discussions around:
>>>>>>>>
>>>>>>>>    - Standardizing the property names (deterministic, secure).
>>>>>>>>    - The rename function.
>>>>>>>>    - Replace function: to check up to what level replace is
>>>>>>>>    supported (considering function overloading).
>>>>>>>>    - Should the signature be associated with the representation?
>>>>>>>>
>>>>>>>>    I think we are close on the spec. Please review the proposal:
>>>>>>>>    <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>
>>>>>>>>
>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>
>>>>>>>> *Monday, July 14 · 9:00 – 10:00am*
>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>> Google Meet joining info
>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>
>>>>>>>> - Ajantha
>>>>>>>>
>>>>>>>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Can it be handled by Iceberg encryption? If the whole metadata is
>>>>>>>>> encrypted, we don't have to worry about just hiding the UDF body? Let
>>>>>>>>> us discuss more on the sync today.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, hiding the definition and disabling pushdown are required. We
>>>>>>>>>> will need a named key (e.g., secure) somewhere, whether it is a
>>>>>>>>>> top-level property or a key within the UDF properties, so that
>>>>>>>>>> both the UDF creator and the consumer can recognize it.
>>>>>>>>>>
>>>>>>>>>> Yufei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the extra detail. What do you think the spec would
>>>>>>>>>>> require? Would it require hiding the UDF definition from users and
>>>>>>>>>>> require specific pushdown cases be disabled? The use cases seem
>>>>>>>>>>> valid, but I'm trying to understand the requirements this places on
>>>>>>>>>>> engines and why it needs to be part of the spec, rather than part
>>>>>>>>>>> of the properties of the UDF.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>
>>>>>>>>>>>> Here are the main use cases for secure UDFs:
>>>>>>>>>>>>
>>>>>>>>>>>>    1. Hiding UDF Definitions: This includes concealing the UDF
>>>>>>>>>>>>    body and details like the list of imports, some of which aren’t
>>>>>>>>>>>>    applicable to SQL UDFs.
>>>>>>>>>>>>    2. Sandboxed Execution: Ensuring the UDF runs in an isolated
>>>>>>>>>>>>    environment. Again, this typically doesn’t apply to SQL UDFs.
>>>>>>>>>>>>    3. Preventing Data Leakage at Execution Time: For example,
>>>>>>>>>>>>    secure UDFs may disable certain optimizations, such as predicate
>>>>>>>>>>>>    pushdown, to avoid exposing sensitive data indirectly. [1]
>>>>>>>>>>>>
>>>>>>>>>>>> Given these scenarios, I agree with your point that the secure
>>>>>>>>>>>> flag is primarily an instruction to the engine to behave
>>>>>>>>>>>> differently. While it's largely an engine-side behavior, we still
>>>>>>>>>>>> need to include this flag in the UDF definition to indicate whether
>>>>>>>>>>>> a UDF is secure, especially considering the perf penalty introduced
>>>>>>>>>>>> by scenario #3. We should clearly recommend that users avoid
>>>>>>>>>>>> marking UDFs as secure unless it's truly necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown
>>>>>>>>>>>> Yufei
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yufei, could you make the argument for supporting a "secure"
>>>>>>>>>>>>> UDF? What use case are you addressing and what specifically
>>>>>>>>>>>>> changes about how the UDF is handled? If the idea is to hide the
>>>>>>>>>>>>> UDF definition, do we need to include it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this would be a signal to a "trusted engine". When the
>>>>>>>>>>>>> engine interacts with the catalog it sends authorization
>>>>>>>>>>>>> information about itself in addition to the user that it is acting
>>>>>>>>>>>>> on behalf of. That way the catalog knows that the secure UDF can
>>>>>>>>>>>>> be sent to the engine and won't be shown to the user. The majority
>>>>>>>>>>>>> of this logic is on the REST server side, and the only part that
>>>>>>>>>>>>> is communicated to the client is the request not to show the UDF
>>>>>>>>>>>>> to the user, right? In that case should this be a property rather
>>>>>>>>>>>>> than part of the definition? Even if we state that the client
>>>>>>>>>>>>> "must" suppress the UDF definition, it's really just a request.
>>>>>>>>>>>>> Only trusted engines can be passed the UDF definition, so a spec
>>>>>>>>>>>>> requirement to suppress the definition isn't very meaningful.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Multi-statement UDFs are definitely useful, but whether those
>>>>>>>>>>>>>> statements run within a single transaction should be treated as
>>>>>>>>>>>>>> an engine-level concern. The Iceberg UDF spec can spell out the
>>>>>>>>>>>>>> expectation, yet the actual guarantee still depends on the
>>>>>>>>>>>>>> runtime. Even if a UDF declares itself transactional, the engine
>>>>>>>>>>>>>> may or may not enforce it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One more thing: should we also introduce a “secure UDF”
>>>>>>>>>>>>>> option supported by some engines[1], so the body and any
>>>>>>>>>>>>>> sensitive details stay hidden from callers?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - We went through the SQL UDF syntax supported by
>>>>>>>>>>>>>>>    different engines (Snowflake, Databricks, Dremio, Trino, OSS
>>>>>>>>>>>>>>>    Spark 4.0).
>>>>>>>>>>>>>>>    - Each engine uses its own block separator, like $$ or
>>>>>>>>>>>>>>>    '' or none. Action item was to check whether engines support
>>>>>>>>>>>>>>>    multi-statement (transactional) UDF bodies.
>>>>>>>>>>>>>>>    - Discussed function overloading. Need to check
>>>>>>>>>>>>>>>    whether these engines support function overloading for SQL
>>>>>>>>>>>>>>>    UDFs (Postgres supports it!). If yes, we need to adapt the spec
>>>>>>>>>>>>>>>    to handle it.
>>>>>>>>>>>>>>>    - Started online spec review and discussed the
>>>>>>>>>>>>>>>    deterministic flag. We concluded that we keep independent
>>>>>>>>>>>>>>>    fields (like deterministic) in the spec only if the majority of
>>>>>>>>>>>>>>>    engines support them. Otherwise they will be passed in a
>>>>>>>>>>>>>>>    property bag (engine specific), and it is the engine's
>>>>>>>>>>>>>>>    responsibility to honor those optional properties.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Feel free to review the current proposal document here
>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Final spec will be put to review and vote once it is ready.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Monday, June 30 · 9:00 – 10:00am*
>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - We discussed including Python support; the majority
>>>>>>>>>>>>>>>>    agreed *not to* (see recording for details).
>>>>>>>>>>>>>>>>    - No strong opposition to versioning; it will be
>>>>>>>>>>>>>>>>    included to support change tracking and similar use cases.
>>>>>>>>>>>>>>>>    - Suggestions were made to document how each catalog
>>>>>>>>>>>>>>>>    resolves UDFs, similar to views and tables.
>>>>>>>>>>>>>>>>    - We agreed not to deviate from the existing table/view
>>>>>>>>>>>>>>>>    spec, e.g., location will remain *required* for
>>>>>>>>>>>>>>>>    cross-catalog compatibility.
>>>>>>>>>>>>>>>>    - We also discussed view interoperability a bit, as
>>>>>>>>>>>>>>>>    the same considerations apply here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    Feel free to review the proposal document here:
>>>>>>>>>>>>>>>>    <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
>>>>>>>>>>>>>>>>    With the current scope, it is similar to the view/table spec now.
>>>>>>>>>>>>>>>>    Final spec will be put to review and vote once it is
>>>>>>>>>>>>>>>>    ready.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the
>>>>>>>>>>>>>>>>> UDF project. Everyone’s welcome to drop in and share ideas!
>>>>>>>>>>>>>>>>> Here is the meeting link:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Iceberg UDF sync
>>>>>>>>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am
>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Update on the progress.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss
>>>>>>>>>>>>>>>>>> the UDF proposal. We covered several key points, though some
>>>>>>>>>>>>>>>>>> are still open for further discussion:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for
>>>>>>>>>>>>>>>>>> UDFs at this stage? We explored the possibility of
>>>>>>>>>>>>>>>>>> simplifying the specification by avoiding view replication, and
>>>>>>>>>>>>>>>>>> potentially introducing versioning support later. UDTFs, being a
>>>>>>>>>>>>>>>>>> superset of views in some ways, may not require versioning
>>>>>>>>>>>>>>>>>> initially.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> b) *VarArgs Support*: While some query engines may not
>>>>>>>>>>>>>>>>>> support vararg syntax in CREATE FUNCTION, Iceberg UDFs
>>>>>>>>>>>>>>>>>> could represent such arguments as lists when supported by
>>>>>>>>>>>>>>>>>> the engine.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t
>>>>>>>>>>>>>>>>>> support generic types (e.g., object), we can only map
>>>>>>>>>>>>>>>>>> engine-specific types to Iceberg types. As a result, generic
>>>>>>>>>>>>>>>>>> data types will not be supported in the initial version.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> d) *Python Support*: Incorporating Python as a language
>>>>>>>>>>>>>>>>>> for SQL UDFs seems promising, especially given its potential
>>>>>>>>>>>>>>>>>> to resolve interoperability challenges. Some engines, however,
>>>>>>>>>>>>>>>>>> require platform version and package dependency details to
>>>>>>>>>>>>>>>>>> execute Python code; this should be captured in the
>>>>>>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> *Next Steps*
>>>>>>>>>>>>>>>>>> I will update the proposal document with two primary UDF
>>>>>>>>>>>>>>>>>> use cases:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    - Policy exchange between engines
>>>>>>>>>>>>>>>>>>    - UDTF as a superset of view functionality
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The update will include corresponding syntax examples in
>>>>>>>>>>>>>>>>>> both SQL and Python, and detail how each use case is
>>>>>>>>>>>>>>>>>> represented in Iceberg metadata.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We also plan to set up regular syncs (open to more
>>>>>>>>>>>>>>>>>> interested participants) to continue refining and finalizing
>>>>>>>>>>>>>>>>>> the UDF specification.
>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I've updated the design document[1] based on the
>>>>>>>>>>>>>>>>>>> previous comments. Additionally, I've included the SQL UDF
>>>>>>>>>>>>>>>>>>> syntax supported by various vendors, including Dremio,
>>>>>>>>>>>>>>>>>>> Snowflake, Databricks, and Trino.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper
>>>>>>>>>>>>>>>>>>> discussion is needed. Let's keep moving forward, especially
>>>>>>>>>>>>>>>>>>> with the renewed interest from the community.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> During the last catalog community sync, there was
>>>>>>>>>>>>>>>>>>>> significant interest in storing UDFs in Iceberg and adding
>>>>>>>>>>>>>>>>>>>> endpoints for UDF handling in the REST catalog spec.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I recently discussed this with Yufei to better
>>>>>>>>>>>>>>>>>>>> understand the new requirement of using UDFs for
>>>>>>>>>>>>>>>>>>>> fine-grained access control policies. This expands the use
>>>>>>>>>>>>>>>>>>>> cases beyond just versioned and interoperable UDFs.
>>>>>>>>>>>>>>>>>>>> Additionally, I learnt that many vendors are interested
>>>>>>>>>>>>>>>>>>>> in this feature.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Given the strong community interest and support, I’d
>>>>>>>>>>>>>>>>>>>> like to take ownership of this effort and revive the work.
>>>>>>>>>>>>>>>>>>>> I'll be revisiting the document I proposed long ago and will
>>>>>>>>>>>>>>>>>>>> share an updated proposal by next week.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg!
>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov
>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The UDF spec does not require representations to be
>>>>>>>>>>>>>>>>>>>>> SQL. It merely does not specify (in this revision) how
>>>>>>>>>>>>>>>>>>>>> other representations are to be written.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This seems like an easy extension (adding a new type
>>>>>>>>>>>>>>>>>>>>> in the "Representations" section).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue
>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the
>>>>>>>>>>>>>>>>>>>>>> spec. It leaves a way for future versions to add
>>>>>>>>>>>>>>>>>>>>>> different representations later, but only SQL is supported.
>>>>>>>>>>>>>>>>>>>>>> That was also the feedback to my initial
>>>>>>>>>>>>>>>>>>>>>> skepticism about how it would work to add functions.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov
>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only SQL
>>>>>>>>>>>>>>>>>>>>>>> representations, although it is certainly favouring
>>>>>>>>>>>>>>>>>>>>>>> SQL in examples... It would be nice to add a non-SQL
>>>>>>>>>>>>>>>>>>>>>>> example, indeed.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as this
>>>>>>>>>>>>>>>>>>>>>>>> proposal focuses on SQL-based engines, while
>>>>>>>>>>>>>>>>>>>>>>>> Python-based systems often work with data frames. Adding
>>>>>>>>>>>>>>>>>>>>>>>> imperative languages like Python would make
>>>>>>>>>>>>>>>>>>>>>>>> this proposal more inclusive.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 8 Aug 2024 at 10:27, Piotr Findeisen <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>>>>>>>>>>>>>>>>> In the design doc linked earlier in this thread
>>>>>>>>>>>>>>>>>>>>>>>>> [1] I read
>>>>>>>>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard to
>>>>>>>>>>>>>>>>>>>>>>>>> share among different engines."
>>>>>>>>>>>>>>>>>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>>>>>>>>>>>>>>>>>> I agree with this statement. I don't fully
>>>>>>>>>>>>>>>>>>>>>>>>> understand yet how the proposed design addresses
>>>>>>>>>>>>>>>>>>>>>>>>> shareability between the engines though.
>>>>>>>>>>>>>>>>>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>>>>>>>> Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created
>>>>>>>>>>>>>>>>>>>>>>>>>> functions shareable
>>>>>>>>>>>>>>>>>>>>>>>>>> between engines? Do you mean UDFs written in
>>>>>>>>>>>>>>>>>>>>>>>>>> imperative code?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this thread. The
>>>>>>>>>>>>>>>>>>>>>>>>>> > Iceberg UDFs are an interesting idea!
>>>>>>>>>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created
>>>>>>>>>>>>>>>>>>>>>>>>>> > functions sharable between the engines?
>>>>>>>>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement
>>>>>>>>>>>>>>>>>>>>>>>>>> > look like in, e.g., Spark or Trino?
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > Best
>>>>>>>>>>>>>>>>>>>>>>>>>> > Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and added
>>>>>>>>>>>>>>>>>>>>>>>>>> comments. I think it would be helpful to also have a 
>>>>>>>>>>>>>>>>>>>>>>>>>> design doc that covers
>>>>>>>>>>>>>>>>>>>>>>>>>> the choices from the draft spec. For instance, the 
>>>>>>>>>>>>>>>>>>>>>>>>>> choice to enumerate all
>>>>>>>>>>>>>>>>>>>>>>>>>> possible function input struts rather than allowing 
>>>>>>>>>>>>>>>>>>>>>>>>>> generics and varargs.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>> >> I think that the choice to enumerate function
>>>>>>>>>>>>>>>>>>>>>>>>>> signatures is limiting. It would be nice to see a 
>>>>>>>>>>>>>>>>>>>>>>>>>> discussion of the
>>>>>>>>>>>>>>>>>>>>>>>>>> trade-offs and a rationale for the choice. I think 
>>>>>>>>>>>>>>>>>>>>>>>>>> it would also be very
>>>>>>>>>>>>>>>>>>>>>>>>>> helpful to have a few representative use cases for 
>>>>>>>>>>>>>>>>>>>>>>>>>> this included in the
>>>>>>>>>>>>>>>>>>>>>>>>>> doc. That way the proposal can demonstrate that it 
>>>>>>>>>>>>>>>>>>>>>>>>>> solves those use cases
>>>>>>>>>>>>>>>>>>>>>>>>>> with reasonable trade-offs.
>>>>>>>>>>>>>>>>>>>>>>>>>> >> There are a few instances where this is
>>>>>>>>>>>>>>>>>>>>>>>>>> inconsistent with conventions in other specs. For 
>>>>>>>>>>>>>>>>>>>>>>>>>> example, using string IDs
>>>>>>>>>>>>>>>>>>>>>>>>>> rather than an integer.
>>>>>>>>>>>>>>>>>>>>>>>>>> >> This uses a very different model for spec
>>>>>>>>>>>>>>>>>>>>>>>>>> versioning than the Iceberg view and table specs. It 
>>>>>>>>>>>>>>>>>>>>>>>>>> requires readers to
>>>>>>>>>>>>>>>>>>>>>>>>>> fail if there are any unknown fields, which prevents 
>>>>>>>>>>>>>>>>>>>>>>>>>> the spec from adding
>>>>>>>>>>>>>>>>>>>>>>>>>> things that are fully backward-compatible. Other 
>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg specs only require
>>>>>>>>>>>>>>>>>>>>>>>>>> a version change to introduce forward-incompatible 
>>>>>>>>>>>>>>>>>>>>>>>>>> changes and I think that
>>>>>>>>>>>>>>>>>>>>>>>>>> this should do the same to avoid confusion.
>>>>>>>>>>>>>>>>>>>>>>>>>> >> It looks like the intent is to allow multiple
>>>>>>>>>>>>>>>>>>>>>>>>>> function signatures per version, but it is unclear 
>>>>>>>>>>>>>>>>>>>>>>>>>> how to encode them
>>>>>>>>>>>>>>>>>>>>>>>>>> because a version is associated with a single 
>>>>>>>>>>>>>>>>>>>>>>>>>> function signature.
>>>>>>>>>>>>>>>>>>>>>>>>>> >> There is no review of SQL syntax for creating
>>>>>>>>>>>>>>>>>>>>>>>>>> functions across engines, so this doesn’t show that 
>>>>>>>>>>>>>>>>>>>>>>>>>> the metadata proposed
>>>>>>>>>>>>>>>>>>>>>>>>>> is sufficient for cross-engine use cases.
>>>>>>>>>>>>>>>>>>>>>>>>>> >> The example for a table-valued function shows
>>>>>>>>>>>>>>>>>>>>>>>>>> a SELECT statement and it isn’t clear how this is 
>>>>>>>>>>>>>>>>>>>>>>>>>> distinct from a view.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on
>>>>>>>>>>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>> I will wait for a week and, if there are no more review
>>>>>>>>>>>>>>>>>>>>>>>>>> comments, I will raise a PR for spec addition next 
>>>>>>>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please have a
>>>>>>>>>>>>>>>>>>>>>>>>>> look at the proposal
>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin
>>>>>>>>>>>>>>>>>>>>>>>>>> Moustafa <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an
>>>>>>>>>>>>>>>>>>>>>>>>>> interesting direction, but there might be some 
>>>>>>>>>>>>>>>>>>>>>>>>>> details that need to be fine
>>>>>>>>>>>>>>>>>>>>>>>>>> tuned.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who might be
>>>>>>>>>>>>>>>>>>>>>>>>>> interested. Resharing since I do not think it was 
>>>>>>>>>>>>>>>>>>>>>>>>>> directly linked in the
>>>>>>>>>>>>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder since we didn't
>>>>>>>>>>>>>>>>>>>>>>>>>> get any review on the proposal.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Initially proposed on June 4.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We've only received one review so far
>>>>>>>>>>>>>>>>>>>>>>>>>> (from Benny).
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Please find the proposal link
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Google doc link is attached in the
>>>>>>>>>>>>>>>>>>>>>>>>>> proposal.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hope it gives more clarity to take the
>>>>>>>>>>>>>>>>>>>>>>>>>> decisions and how we want to implement it.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa
>>>>>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant
>>>>>>>>>>>>>>>>>>>>>>>>>> scalar/aggregate/table user defined functions. Here 
>>>>>>>>>>>>>>>>>>>>>>>>>> are some examples of
>>>>>>>>>>>>>>>>>>>>>>>>>> what I meant in (2):
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hive GenericUDF:
>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Trino user defined functions:
>>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/develop/functions.html
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Flink user defined functions:
>>>>>>>>>>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Probably what you referred to is a
>>>>>>>>>>>>>>>>>>>>>>>>>> variation of (1) where the API is data flow/data 
>>>>>>>>>>>>>>>>>>>>>>>>>> pipeline API instead of
>>>>>>>>>>>>>>>>>>>>>>>>>> SQL (e.g., Spark Scala). Yes, that is also possible 
>>>>>>>>>>>>>>>>>>>>>>>>>> in the very long run :)
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in imperative
>>>>>>>>>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> I think we could still explore some
>>>>>>>>>>>>>>>>>>>>>>>>>> long term opportunities in this case. Consider you 
>>>>>>>>>>>>>>>>>>>>>>>>>> register a Spark temp
>>>>>>>>>>>>>>>>>>>>>>>>>> view as some sort of data frame read, then it could 
>>>>>>>>>>>>>>>>>>>>>>>>>> still be resolved to a
>>>>>>>>>>>>>>>>>>>>>>>>>> Spark plan that is representable by an intermediate 
>>>>>>>>>>>>>>>>>>>>>>>>>> representation. But I
>>>>>>>>>>>>>>>>>>>>>>>>>> agree this gets very complicated very soon, and just 
>>>>>>>>>>>>>>>>>>>>>>>>>> having the case (1)
>>>>>>>>>>>>>>>>>>>>>>>>>> covered would already be a huge step forward.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny
>>>>>>>>>>>>>>>>>>>>>>>>>> Chow <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a
>>>>>>>>>>>>>>>>>>>>>>>>>> tabular SQL UDF can be used to build a parameterized 
>>>>>>>>>>>>>>>>>>>>>>>>>> view.  So, there's
>>>>>>>>>>>>>>>>>>>>>>>>>> definitely a lot in common between UDFs and views.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa
>>>>>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about
>>>>>>>>>>>>>>>>>>>>>>>>>> what is perceived as a "UDF". There are 2 flavors:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by the
>>>>>>>>>>>>>>>>>>>>>>>>>> user whose definition is a composition of other 
>>>>>>>>>>>>>>>>>>>>>>>>>> built-in functions/SQL
>>>>>>>>>>>>>>>>>>>>>>>>>> expressions.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in imperative
>>>>>>>>>>>>>>>>>>>>>>>>>> function according to a Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's
>>>>>>>>>>>>>>>>>>>>>>>>>> references are pretty much from (1) and I think 
>>>>>>>>>>>>>>>>>>>>>>>>>> those have more analogy to
>>>>>>>>>>>>>>>>>>>>>>>>>> views due to their SQL nature. Agree (2) is not 
>>>>>>>>>>>>>>>>>>>>>>>>>> practical to maintain by
>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg, but I think Ajantha's use cases are around 
>>>>>>>>>>>>>>>>>>>>>>>>>> (1), and may be worth
>>>>>>>>>>>>>>>>>>>>>>>>>> evaluating.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM
>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you
>>>>>>>>>>>>>>>>>>>>>>>>>> post the proposal, but I think this would be a very 
>>>>>>>>>>>>>>>>>>>>>>>>>> difficult area to
>>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory models 
>>>>>>>>>>>>>>>>>>>>>>>>>> without having a huge
>>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports
>>>>>>>>>>>>>>>>>>>>>>>>>> SQL representations of UDFs (similar to views as 
>>>>>>>>>>>>>>>>>>>>>>>>>> shared by the reference
>>>>>>>>>>>>>>>>>>>>>>>>>> links above), the complexity involved will be 
>>>>>>>>>>>>>>>>>>>>>>>>>> similar to managing views.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for
>>>>>>>>>>>>>>>>>>>>>>>>>> your input.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the draft
>>>>>>>>>>>>>>>>>>>>>>>>>> spec (inspired by the view spec) this week to 
>>>>>>>>>>>>>>>>>>>>>>>>>> facilitate further
>>>>>>>>>>>>>>>>>>>>>>>>>> discussions.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack
>>>>>>>>>>>>>>>>>>>>>>>>>> Ye <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have a
>>>>>>>>>>>>>>>>>>>>>>>>>> common set of functions across engines, I don't see 
>>>>>>>>>>>>>>>>>>>>>>>>>> how that is practical
>>>>>>>>>>>>>>>>>>>>>>>>>> when those engines are implemented so differently. 
>>>>>>>>>>>>>>>>>>>>>>>>>> Plugging in code -- and
>>>>>>>>>>>>>>>>>>>>>>>>>> especially custom user-supplied code -- seems 
>>>>>>>>>>>>>>>>>>>>>>>>>> inherently specialized to me
>>>>>>>>>>>>>>>>>>>>>>>>>> and should be part of the engines' design.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from the
>>>>>>>>>>>>>>>>>>>>>>>>>> views? I feel we can say exactly the same thing for 
>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg views, but yet
>>>>>>>>>>>>>>>>>>>>>>>>>> we have Iceberg multi-dialect views implemented. 
>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it sounds like we
>>>>>>>>>>>>>>>>>>>>>>>>>> are trying to draw a line between SQL vs other 
>>>>>>>>>>>>>>>>>>>>>>>>>> programming language as
>>>>>>>>>>>>>>>>>>>>>>>>>> "code"? but I think SQL is just another type of 
>>>>>>>>>>>>>>>>>>>>>>>>>> code, and we are already
>>>>>>>>>>>>>>>>>>>>>>>>>> talking about compiling all these different code 
>>>>>>>>>>>>>>>>>>>>>>>>>> dialects to an
>>>>>>>>>>>>>>>>>>>>>>>>>> intermediate representation (using projects like 
>>>>>>>>>>>>>>>>>>>>>>>>>> Coral, Substrait), which
>>>>>>>>>>>>>>>>>>>>>>>>>> will be stored as another type of representation of 
>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg view. I think
>>>>>>>>>>>>>>>>>>>>>>>>>> the same functionality can be used for UDFs if 
>>>>>>>>>>>>>>>>>>>>>>>>>> developed.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support
>>>>>>>>>>>>>>>>>>>>>>>>>> is a good idea, even just a multi-dialect one like 
>>>>>>>>>>>>>>>>>>>>>>>>>> views. That would allow
>>>>>>>>>>>>>>>>>>>>>>>>>> engines to, for example, parse a view's SQL and, when a 
>>>>>>>>>>>>>>>>>>>>>>>>>> referenced function
>>>>>>>>>>>>>>>>>>>>>>>>>> cannot be resolved, look for a multi-dialect 
>>>>>>>>>>>>>>>>>>>>>>>>>> UDF definition.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when we
>>>>>>>>>>>>>>>>>>>>>>>>>> have the actual proposal published.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM
>>>>>>>>>>>>>>>>>>>>>>>>>> Robert Stupp <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and
>>>>>>>>>>>>>>>>>>>>>>>>>> portable and "non-centralized" as views are. The 
>>>>>>>>>>>>>>>>>>>>>>>>>> same performance concerns
>>>>>>>>>>>>>>>>>>>>>>>>>> apply to views as well.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common
>>>>>>>>>>>>>>>>>>>>>>>>>> base upon which engines can build, so the argument 
>>>>>>>>>>>>>>>>>>>>>>>>>> that UDFs aren't
>>>>>>>>>>>>>>>>>>>>>>>>>> practical, because engines are different, is 
>>>>>>>>>>>>>>>>>>>>>>>>>> probably only a temporary
>>>>>>>>>>>>>>>>>>>>>>>>>> concern.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should
>>>>>>>>>>>>>>>>>>>>>>>>>> also try to tackle the idea to make views portable, 
>>>>>>>>>>>>>>>>>>>>>>>>>> which is conceptually
>>>>>>>>>>>>>>>>>>>>>>>>>> not that much different from portable UDFs.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a
>>>>>>>>>>>>>>>>>>>>>>>>>> negative touch to the idea of having UDFs in 
>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg, especially not in
>>>>>>>>>>>>>>>>>>>>>>>>>> this early stage.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's a
>>>>>>>>>>>>>>>>>>>>>>>>>> good idea to add UDFs tracked by Iceberg catalogs. I 
>>>>>>>>>>>>>>>>>>>>>>>>>> think that Iceberg
>>>>>>>>>>>>>>>>>>>>>>>>>> primarily deals with things that are centralized, 
>>>>>>>>>>>>>>>>>>>>>>>>>> like tables of data.
>>>>>>>>>>>>>>>>>>>>>>>>>> While it would be great to have a common set of 
>>>>>>>>>>>>>>>>>>>>>>>>>> functions across engines, I
>>>>>>>>>>>>>>>>>>>>>>>>>> don't see how that is practical when those engines 
>>>>>>>>>>>>>>>>>>>>>>>>>> are implemented so
>>>>>>>>>>>>>>>>>>>>>>>>>> differently. Plugging in code -- and especially 
>>>>>>>>>>>>>>>>>>>>>>>>>> custom user-supplied code
>>>>>>>>>>>>>>>>>>>>>>>>>> -- seems inherently specialized to me and should be 
>>>>>>>>>>>>>>>>>>>>>>>>>> part of the engines'
>>>>>>>>>>>>>>>>>>>>>>>>>> design.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you
>>>>>>>>>>>>>>>>>>>>>>>>>> post the proposal, but I think this would be a very 
>>>>>>>>>>>>>>>>>>>>>>>>>> difficult area to
>>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory models 
>>>>>>>>>>>>>>>>>>>>>>>>>> without having a huge
>>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM
>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge the
>>>>>>>>>>>>>>>>>>>>>>>>>> community interest in storing the Versioned SQL UDFs 
>>>>>>>>>>>>>>>>>>>>>>>>>> in Iceberg.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec
>>>>>>>>>>>>>>>>>>>>>>>>>> addition for storing the versioned UDFs in Iceberg 
>>>>>>>>>>>>>>>>>>>>>>>>>> (inspired by view spec).
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly
>>>>>>>>>>>>>>>>>>>>>>>>>> to views in that they are associated with tables, 
>>>>>>>>>>>>>>>>>>>>>>>>>> but they can accept
>>>>>>>>>>>>>>>>>>>>>>>>>> arguments and produce return values, or even 
>>>>>>>>>>>>>>>>>>>>>>>>>> function as inline expressions.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Many Query engines like Dremio,
>>>>>>>>>>>>>>>>>>>>>>>>>> Trino, Snowflake, Databricks Spark support SQL UDFs 
>>>>>>>>>>>>>>>>>>>>>>>>>> at catalog level [1].
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can
>>>>>>>>>>>>>>>>>>>>>>>>>> enable
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the
>>>>>>>>>>>>>>>>>>>>>>>>>> engines. Potentially engines can understand the UDFs 
>>>>>>>>>>>>>>>>>>>>>>>>>> written by other
>>>>>>>>>>>>>>>>>>>>>>>>>> engines (with the translate layer).
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this
>>>>>>>>>>>>>>>>>>>>>>>>>> feature into Iceberg would be a valuable addition, 
>>>>>>>>>>>>>>>>>>>>>>>>>> and we're eager to
>>>>>>>>>>>>>>>>>>>>>>>>>> collaborate with the community to develop a UDF 
>>>>>>>>>>>>>>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun
>>>>>>>>>>>>>>>>>>>>>>>>>> drafting a specification to propose to the community.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio -
>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino -
>>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake -
>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks -
>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>>>>>>>>> >> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>> >> Databricks
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>> Databricks
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
