Hi folks, thanks for joining today’s UDF sync.

We covered the UDF metadata structure, captured in this doc:
https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing

We also discussed a way to avoid copying every overload into the new
metadata JSON when creating a new version. One idea is to introduce a
global version array. This is not yet reflected in the doc, but I’ll update
it shortly. Other key points:

   - The latest UDF version will typically be used in most scenarios, but
   engines retain the flexibility to choose which version to execute.
   - Keeping the version when referring to a UDF probably isn't a good
   idea. Users are responsible for updating downstream views if they reference
   older UDF versions.
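
To make the first point concrete, here is a minimal Python sketch of version selection. It assumes a hypothetical in-memory model: `UdfMetadata`, `versions`, and `resolve` are illustrative names, not fields from the doc. The latest version is used by default, and an engine may still pass an explicit version ID.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UdfMetadata:
    """Hypothetical in-memory model of a UDF; not the layout from the doc."""
    name: str
    # Assumed shape: version id -> {overload signature -> body}.
    versions: Dict[int, Dict[str, str]] = field(default_factory=dict)

    def resolve(self, version_id: Optional[int] = None) -> Dict[str, str]:
        """Return the overloads of the requested version, or of the latest one."""
        if version_id is None:
            version_id = max(self.versions)  # default: latest version
        return self.versions[version_id]


udf = UdfMetadata(
    name="add",
    versions={
        1: {"add(int, int)": "x + y"},
        2: {"add(int, int)": "y + x"},
    },
)
latest = udf.resolve()    # engine takes the default
pinned = udf.resolve(1)   # engine explicitly picks an older version
```

How the global version array would actually be laid out is still open; this only shows the selection behavior.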

You can watch the recording here:
https://www.youtube.com/watch?v=6ResT-ODelI&ab_channel=ApacheIceberg

Yufei


On Mon, Aug 25, 2025 at 6:36 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Hi folks, thanks for attending today’s UDF sync. In general, we discussed
> the UDF metadata structure, captured in this doc (
> https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing
> ). Here is the detailed summary:
>
>    1. Each UDF overload has its own return type, e.g., `add(int, int)`
>    returns `int`, while `add(long, long)` returns `long`.
>    2. The return type should be explicitly specified; no implicit or
>    statement-based return type inference should be allowed.
>    3. Add explicit properties such as deterministic and doc at
>    the overload level.
>    4. Add a “secure” property at the top level.
>    5. Introducing a dedicated signature definitions section to centralize
>    metadata (Function parameters, Return type, Parameter descriptions). Each
>    overload would reference a signature definition by ID. This decoupling
>    allows signature-related updates (like modifying parameter descriptions)
>    without requiring a new UDF version, similar to how updating a table schema
>    doesn’t create a new snapshot.
>    6. Whether to have versioned open properties or not. Versioned
>    properties can lead to unnecessary copying of a bag of properties into each
>    version, while they provide a clear history of properties for any future
>    debugging and understanding of the UDF behavior at a specific point in
>    time.
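
As an illustration of point 5, here is a small Python sketch of overloads referencing signature definitions by ID. The layout and every field name (`signatures`, `signature-id`, `version`) as well as the helper `update_param_doc` are assumptions made for the example, not taken from the doc.

```python
# Hypothetical metadata layout; all field names here are assumptions.
udf = {
    "version": 3,
    "signatures": {
        1: {
            "parameters": [
                {"name": "x", "type": "int", "doc": ""},
                {"name": "y", "type": "int", "doc": ""},
            ],
            "return-type": "int",
        },
    },
    # Each overload points at a signature definition by ID.
    "overloads": [{"signature-id": 1, "deterministic": True}],
}


def update_param_doc(metadata, signature_id, param_name, doc):
    """Edit a parameter description in place without touching the version."""
    for param in metadata["signatures"][signature_id]["parameters"]:
        if param["name"] == param_name:
            param["doc"] = doc
            return
    raise KeyError(param_name)


update_param_doc(udf, 1, "x", "left operand")
```

The description changes while `udf["version"]` stays the same, which is the decoupling described above.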
>
> Watch the recording here:
> https://www.youtube.com/watch?v=p7CvuGZKLSo&list=PLkifVhhWtccwzc3oRWjy5XiYJl0R6kdQL
>
> Yufei
>
>
> On Thu, Aug 21, 2025 at 4:18 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Hi everyone, here’s the summary from our last sync on 8/11. Apologies for
>> the delay!
>>
>>    - One UDF entity for all overloads
>>       - We agreed to combine overloads with the same name into a single
>>       UDF entity, which shares a common metadata.json file.
>>       - Listing UDFs will return a list of UDF names, not a list of
>>       individual signatures.
>>       - Loading a UDF by name will return all of its overloads.
>>    - Versioning Strategy
>>       - A global version number will track changes across the entire UDF
>>       entity; it increments monotonically.
>>       - Each overload will also maintain its own version (e.g.,
>>       updated_at_version) to trace changes specific to that overload.
>>    - For simplicity, the load API will not support argument-based
>>    filtering in the initial release. It will always return all overloads for
>>    a given UDF name; overload-level loading is not supported at this stage.
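
A rough Python sketch of the versioning strategy above, under stated assumptions: `UdfEntity`, `Overload`, `put_overload`, and `load` are hypothetical names for illustration, and only `updated_at_version` comes from the summary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Overload:
    signature: str
    body: str
    updated_at_version: int  # global version at which this overload last changed


@dataclass
class UdfEntity:
    """Hypothetical model: one entity per UDF name, holding all overloads."""
    name: str
    version: int = 0  # global version, only ever incremented
    overloads: Dict[str, Overload] = field(default_factory=dict)

    def put_overload(self, signature: str, body: str) -> None:
        """Create or replace an overload; bumps the global version."""
        self.version += 1
        self.overloads[signature] = Overload(signature, body, self.version)

    def load(self) -> List[Overload]:
        """Load by name: always returns every overload, no argument filtering."""
        return list(self.overloads.values())


add = UdfEntity("add")
add.put_overload("add(int, int)", "x + y")
add.put_overload("add(long, long)", "x + y")
add.put_overload("add(int, int)", "y + x")  # only this overload changes
```

After the third call the global version is 3, while `add(long, long)` still records version 2 as its last change.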
>>
>> Watch the recording here:
>> https://drive.google.com/file/d/10G2HjUH2DaKSjGufEOjMu0bBuNd7sCzO/view
>>
>> Yufei
>>
>>
>> On Fri, Aug 8, 2025 at 3:11 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> To recap and add my thoughts, we want to support UDFs with multiple
>>> signatures under the same name, which can serve both overload-aware and
>>> overload-naive engines.
>>>
>>> Per my investigation[1], most engines support overloading by arguments
>>> and allow implicit conversions like numeric widening (e.g., INT →
>>> BIGINT/FLOAT). This resolution approach can cause issues such as silent
>>> behavior changes. Here is an example:
>>>
>>>    - Initially, only foo(DOUBLE) exists.
>>>    - foo(42::INT) widens INT → DOUBLE and runs the expected code.
>>>    - Later, a malicious user creates foo(BIGINT).
>>>    - Engine’s best-match resolution now binds the same call to the new
>>>    overload, changing behavior without modifying the query.
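
The foo example can be reproduced with a toy resolver in Python. This is only a sketch: the `WIDENING` table and the "closest widening wins" rule are assumptions standing in for an engine's best-match resolution, and real engines differ in detail.

```python
# Assumed widening preference per argument type, most specific first.
WIDENING = {
    "INT": ["INT", "BIGINT", "DOUBLE"],
    "BIGINT": ["BIGINT", "DOUBLE"],
    "DOUBLE": ["DOUBLE"],
}


def resolve(overloads, arg_type):
    """Pick the overload whose parameter type is the closest widening match."""
    for candidate in WIDENING[arg_type]:
        if candidate in overloads:
            return overloads[candidate]
    raise LookupError(f"no overload of foo accepts {arg_type}")


overloads = {"DOUBLE": "foo(DOUBLE)"}
before = resolve(overloads, "INT")   # binds to foo(DOUBLE)

overloads["BIGINT"] = "foo(BIGINT)"  # a new overload appears later
after = resolve(overloads, "INT")    # same call now binds to foo(BIGINT)
```

The query text never changes between `before` and `after`; only the set of overloads does.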
>>>
>>> To mitigate this issue, we have to choose between these two access
>>> control models:
>>>
>>>    1. Model A – Name-Level ACL: Grants apply to all overloads of a
>>>    function name.
>>>    2. Model B – Signature-Level ACL: Grants tied to specific signatures.
>>>
>>> The general recommendation is to adopt *Model A.* It trades some
>>> precision for safety and simplicity, while eliminating the silent behavior
>>> change problem. More details are in this doc[1].
>>>
>>> 1.
>>> https://docs.google.com/document/d/1E8mR-vInbQ8LDa5Lv3f22i6f8sceHojnEzxEJ6s6cvc/edit?tab=t.0
>>>
>>> Yufei
>>>
>>>
>>> On Tue, Jul 29, 2025 at 1:07 AM Ajantha Bhat <ajanthab...@gmail.com>
>>> wrote:
>>>
>>>> Thanks to everyone who joined the sync.
>>>> Here is the meeting recording:
>>>> https://drive.google.com/file/d/1L5S6nb-C_pzBwFlClwO_sG1AVBA_ROKo/view
>>>>
>>>> Summary:
>>>> We discussed how to define function identifiers (which should also
>>>> handle function overloading). Ryan suggested that we check how Spark
>>>> does it. We can refer to functions using an identifier and then bind the
>>>> different signatures to it, so that access policies can be applied per
>>>> identifier. This is also linked to how we want to version the functions
>>>> when overloading is supported.
>>>>
>>>> I will check more about this and update the proposal doc.
>>>>
>>>> Please check/subscribe to the dev events calendar for the next
>>>> meeting link (Aug 11).
>>>>
>>>> - Ajantha
>>>>
>>>> On Sun, Jul 27, 2025 at 10:46 PM Kevin Liu <kevinjq...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Ajantha,
>>>>>
>>>>> I see that the UDF Sync is scheduled in the "Iceberg Dev Events"
>>>>> calendar for tomorrow 7/28 at 9AM PT. I missed the last one, but I'll
>>>>> be at this one.
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Mon, Jul 14, 2025 at 9:22 AM Ajantha Bhat <ajanthab...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> No one joined the sync today. I came to know that Yufei is on
>>>>>> holiday, and Ryan and others couldn't make it, similar to the last sync.
>>>>>> It seems Yufei might have forgotten to transfer meeting ownership as well,
>>>>>> as new members needed admin approval and couldn't join automatically this
>>>>>> week. Also, I can understand it is summer holiday season for many.
>>>>>>
>>>>>> I've updated the function signature schema and other open points. I
>>>>>> believe we're very close to the final version of the spec. A meeting is
>>>>>> indeed necessary to finalize this, but we don't have to wait for it to
>>>>>> finish the review process. We had many meetings on this in the past
>>>>>> already. So, please review the document at your earliest convenience. If
>>>>>> we agree on the spec by next week, I can raise a PR.
>>>>>>
>>>>>> - Ajantha
>>>>>>
>>>>>> On Thu, Jul 3, 2025 at 4:03 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>
>>>>>>> I’d propose to move the field `properties` from a top level field to
>>>>>>> a field inside “version” along with a representation, so that properties
>>>>>>> are versioned. A property like “deterministic” could change along with
>>>>>>> representation over time. For example, we need to change “deterministic”
>>>>>>> from true to false in case of adding a non-deterministic SQL
>>>>>>> expression/function (e.g., now()) inside a UDF. Otherwise, rollback
>>>>>>> won't be safe.
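
A minimal Python sketch of why versioned properties keep rollback safe. The entries and field names (`version-id`, `representation`, `properties`) and the helper `properties_at` are hypothetical, chosen only to mirror the proposal in this paragraph.

```python
# Hypothetical version entries; field names are illustrative only.
versions = [
    {
        "version-id": 1,
        "representation": "SELECT x + 1",
        "properties": {"deterministic": True},
    },
    {
        "version-id": 2,
        "representation": "SELECT now()",
        "properties": {"deterministic": False},
    },
]


def properties_at(version_list, version_id):
    """Return the properties stored with the given version."""
    for version in version_list:
        if version["version-id"] == version_id:
            return version["properties"]
    raise KeyError(version_id)


current = properties_at(versions, 2)      # non-deterministic body
rolled_back = properties_at(versions, 1)  # rollback restores the old flag too
```

With a single top-level `properties` bag, rolling back to version 1 would keep `deterministic` at false even though the body is deterministic again.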
>>>>>>>
>>>>>>> That said, it's still an open question whether we need any
>>>>>>> non-versioned properties. We can introduce them later if a use case
>>>>>>> arises.
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <flyrain...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>
>>>>>>>> I’d prefer to keep the signature list separate from the
>>>>>>>> representation history. Here are the reasons:
>>>>>>>>
>>>>>>>>    1. Each version still enforces a single signature. Although the
>>>>>>>>    signatures array is global to the UDF, each version references just
>>>>>>>>    one signature ID. Rollbacks to historical versions remain safe.
>>>>>>>>    2. We’ve separated the less frequently changing component
>>>>>>>>    (signatures) from the more dynamic one (representations) to reduce
>>>>>>>>    metadata file size.
>>>>>>>>    3. Since signatures use Iceberg data types, they should remain
>>>>>>>>    unaffected by multi-dialect representation differences.
>>>>>>>>
>>>>>>>> Yufei
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <
>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>> Here is the meeting recording:
>>>>>>>>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>>>>>>>>>
>>>>>>>>> Summary:
>>>>>>>>> We have discussed the action items from the last sync (*see
>>>>>>>>> Appendix C* in the proposal doc)
>>>>>>>>>
>>>>>>>>>    - Function overloading: Supported by a few of the engines and on
>>>>>>>>>    the roadmaps of many engines. Iceberg will support it. We will
>>>>>>>>>    maintain the `FunctionIdentifier` (extends `TableIdentifier` but also
>>>>>>>>>    has a member containing the list of function argument types). All
>>>>>>>>>    operations like load, rename, list, create and drop are based on
>>>>>>>>>    `FunctionIdentifier`.
>>>>>>>>>    - Secure UDF: If we store it as a property in a bag, we need
>>>>>>>>>    to standardize the property name. Iceberg encryption may be
>>>>>>>>>    orthogonal to this discussion.
>>>>>>>>>    - UDFs with multi-statement and procedural bodies are supported
>>>>>>>>>    by some engines. Iceberg will support them. The body is stored as is
>>>>>>>>>    when the engine creates the function.
>>>>>>>>>
>>>>>>>>> New discussions around:
>>>>>>>>>
>>>>>>>>>    - Standardizing the property names (deterministic, secure).
>>>>>>>>>    - The rename function.
>>>>>>>>>    - Replace function. To check up to what level replace is
>>>>>>>>>    supported (considering function overloading).
>>>>>>>>>    - Should a signature be associated with a representation?
>>>>>>>>>
>>>>>>>>>    I think we are close on the spec. Please review the proposal
>>>>>>>>>    <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>
>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>
>>>>>>>>> *Monday, July 14 · 9:00 – 10:00am*
>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>> Google Meet joining info
>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>
>>>>>>>>> - Ajantha
>>>>>>>>>
>>>>>>>>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <
>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Can it be handled by Iceberg encryption? If the whole metadata is
>>>>>>>>>> encrypted, we don't have to worry about just hiding the UDF body.
>>>>>>>>>> Let us discuss more in the sync today.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <flyrain...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, hiding the definition and disabling pushdown are
>>>>>>>>>>> required. We will need a named key (e.g., secure) somewhere, no
>>>>>>>>>>> matter if it is a top level property or a key as a part of the UDF
>>>>>>>>>>> properties, so that both UDF creator and consumer can recognize it.
>>>>>>>>>>>
>>>>>>>>>>> Yufei
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <rdb...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the extra detail. What do you think the spec would
>>>>>>>>>>>> require? Would it require hiding the UDF definition from users and
>>>>>>>>>>>> require specific pushdown cases be disabled? The use cases seem valid,
>>>>>>>>>>>> but I'm trying to understand the requirements this places on engines
>>>>>>>>>>>> and why it needs to be part of the spec, rather than part of the
>>>>>>>>>>>> properties of the UDF.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <flyrain...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here are the main use cases for secure UDFs:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    1. Hiding UDF Definitions: This includes concealing the UDF
>>>>>>>>>>>>>    body and details like the list of imports; some of them aren’t
>>>>>>>>>>>>>    applicable to SQL UDFs.
>>>>>>>>>>>>>    2. Sandboxed Execution: Ensuring the UDF runs in an isolated
>>>>>>>>>>>>>    environment. Again, this typically doesn’t apply to SQL UDFs.
>>>>>>>>>>>>>    3. Preventing Data Leakage at Execution Time: For example,
>>>>>>>>>>>>>    secure UDFs may disable certain optimizations, such as
>>>>>>>>>>>>>    predicate pushdown, to avoid exposing sensitive data indirectly. [1]
>>>>>>>>>>>>>
>>>>>>>>>>>>> Given these scenarios, I agree with your point that the secure
>>>>>>>>>>>>> flag is primarily an instruction to the engine to behave
>>>>>>>>>>>>> differently. While it's largely an engine-side behavior, we still
>>>>>>>>>>>>> need to include this flag in the UDF definition to indicate whether
>>>>>>>>>>>>> a UDF is secure, especially considering the perf penalty introduced
>>>>>>>>>>>>> by scenario #3. We should clearly recommend that users avoid marking
>>>>>>>>>>>>> UDFs as secure unless it's truly necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown
>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <rdb...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yufei, could you make the argument for supporting a "secure"
>>>>>>>>>>>>>> UDF? What use case are you addressing and what specifically
>>>>>>>>>>>>>> changes about how the UDF is handled? If the idea is to hide the
>>>>>>>>>>>>>> UDF definition, do we need to include it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this would be a signal to a "trusted engine". When
>>>>>>>>>>>>>> the engine interacts with the catalog it sends authorization
>>>>>>>>>>>>>> information about itself in addition to the user that it is acting
>>>>>>>>>>>>>> on behalf of. That way the catalog knows that the secure UDF can be
>>>>>>>>>>>>>> sent to the engine and won't be shown to the user. The majority of
>>>>>>>>>>>>>> this logic is on the REST server side, and the only part that is
>>>>>>>>>>>>>> communicated to the client is the request not to show the UDF to the
>>>>>>>>>>>>>> user, right? In that case should this be a property rather than part
>>>>>>>>>>>>>> of the definition? Even if we state that the client "must" suppress
>>>>>>>>>>>>>> the UDF definition, it's really just a request. Only trusted engines
>>>>>>>>>>>>>> can be passed the UDF definition, so a spec requirement to suppress
>>>>>>>>>>>>>> the definition isn't very meaningful.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <
>>>>>>>>>>>>>> flyrain...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Multi-statement UDFs are definitely useful, but whether
>>>>>>>>>>>>>>> those statements run within a single transaction should be
>>>>>>>>>>>>>>> treated as an engine-level concern. The Iceberg UDF spec can spell
>>>>>>>>>>>>>>> out the expectation, yet the actual guarantee still depends on the
>>>>>>>>>>>>>>> runtime. Even if a UDF declares itself transactional, the engine may
>>>>>>>>>>>>>>> or may not enforce it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One more thing: should we also introduce a “secure UDF”
>>>>>>>>>>>>>>> option supported by some engines[1], so the body and any
>>>>>>>>>>>>>>> sensitive details stay hidden from callers?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <
>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - We have gone through the SQL UDF syntax supported by
>>>>>>>>>>>>>>>>    different engines (Snowflake, Databricks, Dremio, Trino,
>>>>>>>>>>>>>>>>    OSS Spark 4.0).
>>>>>>>>>>>>>>>>    - Each engine uses its own block separator, like $$ or
>>>>>>>>>>>>>>>>    '' or none. Action item was to check whether engines support
>>>>>>>>>>>>>>>>    multi-statement (transactional) UDF bodies.
>>>>>>>>>>>>>>>>    - Discussed function overloading. Need to check
>>>>>>>>>>>>>>>>    whether these engines support function overloading for SQL
>>>>>>>>>>>>>>>>    UDFs. Postgres supports it! If yes, we need to adapt the spec to
>>>>>>>>>>>>>>>>    handle it.
>>>>>>>>>>>>>>>>    - Started online spec review and discussed the
>>>>>>>>>>>>>>>>    deterministic flag. We concluded that we keep independent fields
>>>>>>>>>>>>>>>>    (like deterministic) in the spec only if the majority of engines
>>>>>>>>>>>>>>>>    support them. Otherwise they will be passed in a property bag
>>>>>>>>>>>>>>>>    (engine specific), and it is the engine's responsibility to honor
>>>>>>>>>>>>>>>>    those optional properties.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Feel free to review the current proposal document here
>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Final spec will be put to review and vote once it is ready.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Monday, June 30 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <
>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    - We discussed including Python support; the majority
>>>>>>>>>>>>>>>>>    agreed *not to* (see recording for details).
>>>>>>>>>>>>>>>>>    - No strong opposition to versioning; it will be
>>>>>>>>>>>>>>>>>    included to support change tracking and similar use cases.
>>>>>>>>>>>>>>>>>    - Suggestions were made to document how each catalog
>>>>>>>>>>>>>>>>>    resolves UDFs, similar to views and tables.
>>>>>>>>>>>>>>>>>    - We agreed not to deviate from the existing table/view
>>>>>>>>>>>>>>>>>    spec; e.g., location will remain *required* for
>>>>>>>>>>>>>>>>>    cross-catalog compatibility.
>>>>>>>>>>>>>>>>>    - We also discussed a bit about view interoperability as
>>>>>>>>>>>>>>>>>    the same things are applicable here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    Feel free to review the proposal document
>>>>>>>>>>>>>>>>>    <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
>>>>>>>>>>>>>>>>>    here.
>>>>>>>>>>>>>>>>>    With the current scope, it is similar to the view/table
>>>>>>>>>>>>>>>>>    spec now.
>>>>>>>>>>>>>>>>>    Final spec will be put to review and vote once it is
>>>>>>>>>>>>>>>>>    ready.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <
>>>>>>>>>>>>>>>>> flyrain...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the
>>>>>>>>>>>>>>>>>> UDF project. Everyone’s welcome to drop in and share ideas!
>>>>>>>>>>>>>>>>>> Here is the meeting link:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Iceberg UDF sync
>>>>>>>>>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am
>>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <
>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Update on the progress.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss
>>>>>>>>>>>>>>>>>>> the UDF proposal. We covered several key points, though
>>>>>>>>>>>>>>>>>>> some are still open for further discussion:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for
>>>>>>>>>>>>>>>>>>> UDFs at this stage? We explored the possibility of
>>>>>>>>>>>>>>>>>>> simplifying the specification by avoiding view replication, and
>>>>>>>>>>>>>>>>>>> potentially introducing versioning support later. UDTFs, being a
>>>>>>>>>>>>>>>>>>> superset of views in some ways, may not require versioning initially.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> b) *VarArgs Support*: While some query engines may not
>>>>>>>>>>>>>>>>>>> support vararg syntax in CREATE FUNCTION, Iceberg UDFs
>>>>>>>>>>>>>>>>>>> could represent such arguments as lists when supported by
>>>>>>>>>>>>>>>>>>> the engine.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t
>>>>>>>>>>>>>>>>>>> support generic types (e.g., object), we can only map
>>>>>>>>>>>>>>>>>>> engine-specific types to Iceberg types. As a result,
>>>>>>>>>>>>>>>>>>> generic data types will not be supported in the initial version.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> d) *Python Support*: Incorporating Python as a language
>>>>>>>>>>>>>>>>>>> for SQL UDFs seems promising, especially given its
>>>>>>>>>>>>>>>>>>> potential to resolve interoperability challenges. Some engines,
>>>>>>>>>>>>>>>>>>> however, require platform version and package dependency details to
>>>>>>>>>>>>>>>>>>> execute Python code; this should be captured in the specification.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> *Next Steps*
>>>>>>>>>>>>>>>>>>> I will update the proposal document with two primary UDF
>>>>>>>>>>>>>>>>>>> use cases:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    - Policy exchange between engines
>>>>>>>>>>>>>>>>>>>    - UDTF as a superset of view functionality
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The update will include corresponding syntax examples in
>>>>>>>>>>>>>>>>>>> both SQL and Python, and detail how each use case is
>>>>>>>>>>>>>>>>>>> represented in Iceberg metadata.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We also plan to set up regular syncs (open to more
>>>>>>>>>>>>>>>>>>> interested participants) to continue refining and
>>>>>>>>>>>>>>>>>>> finalizing the UDF specification.
>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I've updated the design document[1] based on the
>>>>>>>>>>>>>>>>>>>> previous comments. Additionally, I've included the SQL UDF
>>>>>>>>>>>>>>>>>>>> syntax supported by various vendors, including Dremio, Snowflake,
>>>>>>>>>>>>>>>>>>>> Databricks, and Trino.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper
>>>>>>>>>>>>>>>>>>>> discussion is needed. Let's keep moving forward,
>>>>>>>>>>>>>>>>>>>> especially with the renewed interest from the community.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> During the last catalog community sync, there was
>>>>>>>>>>>>>>>>>>>>> significant interest in storing UDFs in Iceberg and
>>>>>>>>>>>>>>>>>>>>> adding endpoints for UDF handling in the REST catalog spec.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I recently discussed this with Yufei to better
>>>>>>>>>>>>>>>>>>>>> understand the new requirement of using UDFs for
>>>>>>>>>>>>>>>>>>>>> fine-grained access control policies. This expands the use cases
>>>>>>>>>>>>>>>>>>>>> beyond just versioned and interoperable UDFs. Additionally, I learnt
>>>>>>>>>>>>>>>>>>>>> that many vendors are interested in this feature.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Given the strong community interest and support, I’d
>>>>>>>>>>>>>>>>>>>>> like to take ownership of this effort and revive the
>>>>>>>>>>>>>>>>>>>>> work. I'll be revisiting the document I proposed long back and will
>>>>>>>>>>>>>>>>>>>>> share an updated proposal by next week.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg!
>>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov
>>>>>>>>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The UDF spec does not require representations to be
>>>>>>>>>>>>>>>>>>>>>> SQL. It merely does not specify (in this revision) how
>>>>>>>>>>>>>>>>>>>>>> other representations are to be written.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This seems like an easy extension (adding a new type
>>>>>>>>>>>>>>>>>>>>>> in the "Representations" section).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue
>>>>>>>>>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the
>>>>>>>>>>>>>>>>>>>>>>> spec. It leaves a way for future versions to add
>>>>>>>>>>>>>>>>>>>>>>> different representations later, but only SQL is supported. That was
>>>>>>>>>>>>>>>>>>>>>>> also the feedback to my initial skepticism about how it would work to
>>>>>>>>>>>>>>>>>>>>>>> add functions.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov
>>>>>>>>>>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only SQL
>>>>>>>>>>>>>>>>>>>>>>>> representations, although it is certainly favouring
>>>>>>>>>>>>>>>>>>>>>>>> SQL in examples... It would be nice to add a non-SQL example, indeed.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <
>>>>>>>>>>>>>>>>>>>>>>>> fo...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as this
>>>>>>>>>>>>>>>>>>>>>>>>> proposal focuses on SQL-based engines, while
>>>>>>>>>>>>>>>>>>>>>>>>> Python-based systems often work with data frames. Adding imperative
>>>>>>>>>>>>>>>>>>>>>>>>> languages like Python would make this proposal more inclusive.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Op do 8 aug 2024 om 10:27 schreef Piotr Findeisen <
>>>>>>>>>>>>>>>>>>>>>>>>> piotr.findei...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>>>>>>>>>>>>>>>>>> In the design doc linked before in this thread
>>>>>>>>>>>>>>>>>>>>>>>>>> [1] I read
>>>>>>>>>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard to
>>>>>>>>>>>>>>>>>>>>>>>>>> share among different engines."
>>>>>>>>>>>>>>>>>>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with this statement. I don't fully
>>>>>>>>>>>>>>>>>>>>>>>>>> understand yet how the proposed design addresses
>>>>>>>>>>>>>>>>>>>>>>>>>> shareability between the engines though.
>>>>>>>>>>>>>>>>>>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa
>>>>>>>>>>>>>>>>>>>>>>>>>> <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created
>>>>>>>>>>>>>>>>>>>>>>>>>>> functions shareable
>>>>>>>>>>>>>>>>>>>>>>>>>>> between engines? Do you mean UDFs written in
>>>>>>>>>>>>>>>>>>>>>>>>>>> imperative code?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen
>>>>>>>>>>>>>>>>>>>>>>>>>>> <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this thread.
>>>>>>>>>>>>>>>>>>>>>>>>>>> The Iceberg UDFs are an interesting idea!
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created
>>>>>>>>>>>>>>>>>>>>>>>>>>> functions sharable between the engines?
>>>>>>>>>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement
>>>>>>>>>>>>>>>>>>>>>>>>>>> look like in e.g. Spark or Trino?
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Best
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and added
>>>>>>>>>>>>>>>>>>>>>>>>>>> comments. I think it would be helpful to also have 
>>>>>>>>>>>>>>>>>>>>>>>>>>> a design doc that covers
>>>>>>>>>>>>>>>>>>>>>>>>>>> the choices from the draft spec. For instance, the 
>>>>>>>>>>>>>>>>>>>>>>>>>>> choice to enumerate all
>>>>>>>>>>>>>>>>>>>>>>>>>>> possible function input structs rather than allowing 
>>>>>>>>>>>>>>>>>>>>>>>>>>> generics and varargs.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> I think that the choice to enumerate function
>>>>>>>>>>>>>>>>>>>>>>>>>>> signatures is limiting. It would be nice to see a 
>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion of the
>>>>>>>>>>>>>>>>>>>>>>>>>>> trade-offs and a rationale for the choice. I think 
>>>>>>>>>>>>>>>>>>>>>>>>>>> it would also be very
>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful to have a few representative use cases for 
>>>>>>>>>>>>>>>>>>>>>>>>>>> this included in the
>>>>>>>>>>>>>>>>>>>>>>>>>>> doc. That way the proposal can demonstrate that it 
>>>>>>>>>>>>>>>>>>>>>>>>>>> solves those use cases
>>>>>>>>>>>>>>>>>>>>>>>>>>> with reasonable trade-offs.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> There are a few instances where this is
>>>>>>>>>>>>>>>>>>>>>>>>>>> inconsistent with conventions in other specs. For 
>>>>>>>>>>>>>>>>>>>>>>>>>>> example, using string IDs
>>>>>>>>>>>>>>>>>>>>>>>>>>> rather than an integer.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> This uses a very different model for spec
>>>>>>>>>>>>>>>>>>>>>>>>>>> versioning than the Iceberg view and table specs. 
>>>>>>>>>>>>>>>>>>>>>>>>>>> It requires readers to
>>>>>>>>>>>>>>>>>>>>>>>>>>> fail if there are any unknown fields, which 
>>>>>>>>>>>>>>>>>>>>>>>>>>> prevents the spec from adding
>>>>>>>>>>>>>>>>>>>>>>>>>>> things that are fully backward-compatible. Other 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg specs only require
>>>>>>>>>>>>>>>>>>>>>>>>>>> a version change to introduce forward-incompatible 
>>>>>>>>>>>>>>>>>>>>>>>>>>> changes and I think that
>>>>>>>>>>>>>>>>>>>>>>>>>>> this should do the same to avoid confusion.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> It looks like the intent is to allow multiple
>>>>>>>>>>>>>>>>>>>>>>>>>>> function signatures per version, but it is unclear 
>>>>>>>>>>>>>>>>>>>>>>>>>>> how to encode them
>>>>>>>>>>>>>>>>>>>>>>>>>>> because a version is associated with a single 
>>>>>>>>>>>>>>>>>>>>>>>>>>> function signature.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> There is no review of SQL syntax for creating
>>>>>>>>>>>>>>>>>>>>>>>>>>> functions across engines, so this doesn’t show that 
>>>>>>>>>>>>>>>>>>>>>>>>>>> the metadata proposed
>>>>>>>>>>>>>>>>>>>>>>>>>>> is sufficient for cross-engine use cases.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> The example for a table-valued function shows
>>>>>>>>>>>>>>>>>>>>>>>>>>> a SELECT statement and it isn’t clear how this is 
>>>>>>>>>>>>>>>>>>>>>>>>>>> distinct from a view.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <
>>>>>>>>>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on
>>>>>>>>>>>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> We didn't find any blocker for the spec.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> I will wait for a week and, if there are no more review
>>>>>>>>>>>>>>>>>>>>>>>>>>> comments, I will raise a PR for the spec addition next 
>>>>>>>>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please have a
>>>>>>>>>>>>>>>>>>>>>>>>>>> look at the proposal
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin
>>>>>>>>>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an
>>>>>>>>>>>>>>>>>>>>>>>>>>> interesting direction, but there might be some 
>>>>>>>>>>>>>>>>>>>>>>>>>>> details that need to be
>>>>>>>>>>>>>>>>>>>>>>>>>>> fine-tuned.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who might be
>>>>>>>>>>>>>>>>>>>>>>>>>>> interested. Resharing since I do not think it was 
>>>>>>>>>>>>>>>>>>>>>>>>>>> directly linked in the
>>>>>>>>>>>>>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder since we didn't
>>>>>>>>>>>>>>>>>>>>>>>>>>> get any review on the proposal.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Initially proposed on June 4.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We've only received one review so far
>>>>>>>>>>>>>>>>>>>>>>>>>>> (from Benny).
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> We would appreciate more eyes on this.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>> Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Please find the proposal link
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Google doc link is attached in the
>>>>>>>>>>>>>>>>>>>>>>>>>>> proposal.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> And Thanks Stephen Lin for working on it.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> Hope it gives more clarity for making the
>>>>>>>>>>>>>>>>>>>>>>>>>>> decisions and on how we want to implement it.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa
>>>>>>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks Jack. I actually meant
>>>>>>>>>>>>>>>>>>>>>>>>>>> scalar/aggregate/table user defined functions. Here 
>>>>>>>>>>>>>>>>>>>>>>>>>>> are some examples of
>>>>>>>>>>>>>>>>>>>>>>>>>>> what I meant in (2):
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Hive GenericUDF:
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Trino user defined functions:
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/develop/functions.html
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Flink user defined functions:
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Probably what you referred to is a
>>>>>>>>>>>>>>>>>>>>>>>>>>> variation of (1) where the API is data flow/data 
>>>>>>>>>>>>>>>>>>>>>>>>>>> pipeline API instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL (e.g., Spark Scala). Yes, that is also possible 
>>>>>>>>>>>>>>>>>>>>>>>>>>> in the very long run :)
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye
>>>>>>>>>>>>>>>>>>>>>>>>>>> <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> > (2) Custom code written in
>>>>>>>>>>>>>>>>>>>>>>>>>>> imperative function according to a 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> I think we could still explore some
>>>>>>>>>>>>>>>>>>>>>>>>>>> long term opportunities in this case. Consider you 
>>>>>>>>>>>>>>>>>>>>>>>>>>> register a Spark temp
>>>>>>>>>>>>>>>>>>>>>>>>>>> view as some sort of data frame read, then it could 
>>>>>>>>>>>>>>>>>>>>>>>>>>> still be resolved to a
>>>>>>>>>>>>>>>>>>>>>>>>>>> Spark plan that is representable by an intermediate 
>>>>>>>>>>>>>>>>>>>>>>>>>>> representation. But I
>>>>>>>>>>>>>>>>>>>>>>>>>>> agree this gets very complicated very soon, and 
>>>>>>>>>>>>>>>>>>>>>>>>>>> just having the case (1)
>>>>>>>>>>>>>>>>>>>>>>>>>>> covered would already be a huge step forward.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> -Jack
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny
>>>>>>>>>>>>>>>>>>>>>>>>>>> Chow <btc...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> It's interesting to note that a
>>>>>>>>>>>>>>>>>>>>>>>>>>> tabular SQL UDF can be used to build a 
>>>>>>>>>>>>>>>>>>>>>>>>>>> parameterized view.  So, there's
>>>>>>>>>>>>>>>>>>>>>>>>>>> definitely a lot in common between UDFs and views.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa
>>>>>>>>>>>>>>>>>>>>>>>>>>> Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> I think there is a disconnect about
>>>>>>>>>>>>>>>>>>>>>>>>>>> what is perceived as a "UDF". There are 2 flavors:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (1) Functions that are defined by
>>>>>>>>>>>>>>>>>>>>>>>>>>> the user whose definition is a composition of other 
>>>>>>>>>>>>>>>>>>>>>>>>>>> built-in functions/SQL
>>>>>>>>>>>>>>>>>>>>>>>>>>> expressions.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> (2) Custom code written in
>>>>>>>>>>>>>>>>>>>>>>>>>>> imperative function according to a 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Java/Scala/Python API, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> All the examples in Ajantha's
>>>>>>>>>>>>>>>>>>>>>>>>>>> references are pretty much from (1) and I think 
>>>>>>>>>>>>>>>>>>>>>>>>>>> those have more analogy to
>>>>>>>>>>>>>>>>>>>>>>>>>>> views due to their SQL nature. Agree (2) is not 
>>>>>>>>>>>>>>>>>>>>>>>>>>> practical to maintain by
>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg, but I think Ajantha's use cases are around 
>>>>>>>>>>>>>>>>>>>>>>>>>>> (1), and may be worth
>>>>>>>>>>>>>>>>>>>>>>>>>>> evaluating.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we'll know more when you
>>>>>>>>>>>>>>>>>>>>>>>>>>> post the proposal, but I think this would be a very 
>>>>>>>>>>>>>>>>>>>>>>>>>>> difficult area to
>>>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory models 
>>>>>>>>>>>>>>>>>>>>>>>>>>> without having a huge
>>>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Assuming Iceberg initially supports
>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL representations of UDFs (similar to views as 
>>>>>>>>>>>>>>>>>>>>>>>>>>> shared by the reference
>>>>>>>>>>>>>>>>>>>>>>>>>>> links above), the complexity involved will be 
>>>>>>>>>>>>>>>>>>>>>>>>>>> similar to managing views.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for
>>>>>>>>>>>>>>>>>>>>>>>>>>> your input.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> We will work on publishing the
>>>>>>>>>>>>>>>>>>>>>>>>>>> draft spec (inspired by the view spec) this week to 
>>>>>>>>>>>>>>>>>>>>>>>>>>> facilitate further
>>>>>>>>>>>>>>>>>>>>>>>>>>> discussions.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> > While it would be great to have
>>>>>>>>>>>>>>>>>>>>>>>>>>> a common set of functions across engines, I don't 
>>>>>>>>>>>>>>>>>>>>>>>>>>> see how that is practical
>>>>>>>>>>>>>>>>>>>>>>>>>>> when those engines are implemented so differently. 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Plugging in code -- and
>>>>>>>>>>>>>>>>>>>>>>>>>>> especially custom user-supplied code -- seems 
>>>>>>>>>>>>>>>>>>>>>>>>>>> inherently specialized to me
>>>>>>>>>>>>>>>>>>>>>>>>>>> and should be part of the engines' design.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> How is this different from the
>>>>>>>>>>>>>>>>>>>>>>>>>>> views? I feel we can say exactly the same thing for 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg views, but yet
>>>>>>>>>>>>>>>>>>>>>>>>>>> we have Iceberg multi-dialect views implemented. 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it sounds like we
>>>>>>>>>>>>>>>>>>>>>>>>>>> are trying to draw a line between SQL vs other 
>>>>>>>>>>>>>>>>>>>>>>>>>>> programming language as
>>>>>>>>>>>>>>>>>>>>>>>>>>> "code"? but I think SQL is just another type of 
>>>>>>>>>>>>>>>>>>>>>>>>>>> code, and we are already
>>>>>>>>>>>>>>>>>>>>>>>>>>> talking about compiling all these different code 
>>>>>>>>>>>>>>>>>>>>>>>>>>> dialects to an
>>>>>>>>>>>>>>>>>>>>>>>>>>> intermediate representation (using projects like 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Coral, Substrait), which
>>>>>>>>>>>>>>>>>>>>>>>>>>> will be stored as another type of representation of 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg view. I think
>>>>>>>>>>>>>>>>>>>>>>>>>>> the same functionality can be used for UDFs if 
>>>>>>>>>>>>>>>>>>>>>>>>>>> developed.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I actually think adding UDF support
>>>>>>>>>>>>>>>>>>>>>>>>>>> is a good idea, even just a multi-dialect one like 
>>>>>>>>>>>>>>>>>>>>>>>>>>> view, and that can allow
>>>>>>>>>>>>>>>>>>>>>>>>>>> engines to for example parse a view SQL, and when a 
>>>>>>>>>>>>>>>>>>>>>>>>>>> function referenced
>>>>>>>>>>>>>>>>>>>>>>>>>>> cannot be resolved, try to seek for a multi-dialect 
>>>>>>>>>>>>>>>>>>>>>>>>>>> UDF definition.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> I guess we can discuss more when
>>>>>>>>>>>>>>>>>>>>>>>>>>> we have the actual proposal published.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>> Robert Stupp <sn...@snazy.de> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> UDFs are as engine specific and
>>>>>>>>>>>>>>>>>>>>>>>>>>> portable and "non-centralized" as views are. The 
>>>>>>>>>>>>>>>>>>>>>>>>>>> same performance concerns
>>>>>>>>>>>>>>>>>>>>>>>>>>> apply to views as well.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Iceberg should define a common
>>>>>>>>>>>>>>>>>>>>>>>>>>> base upon which engines can build, so the argument 
>>>>>>>>>>>>>>>>>>>>>>>>>>> that UDFs aren't
>>>>>>>>>>>>>>>>>>>>>>>>>>> practical, because engines are different, is 
>>>>>>>>>>>>>>>>>>>>>>>>>>> probably only a temporary
>>>>>>>>>>>>>>>>>>>>>>>>>>> concern.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the long term, Iceberg should
>>>>>>>>>>>>>>>>>>>>>>>>>>> also try to tackle the idea to make views portable, 
>>>>>>>>>>>>>>>>>>>>>>>>>>> which is conceptually
>>>>>>>>>>>>>>>>>>>>>>>>>>> not that much different from portable UDFs.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> PS: I'm not a fan of adding a
>>>>>>>>>>>>>>>>>>>>>>>>>>> negative touch to the idea of having UDFs in 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Iceberg, especially not in
>>>>>>>>>>>>>>>>>>>>>>>>>>> this early stage.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm skeptical about whether it's
>>>>>>>>>>>>>>>>>>>>>>>>>>> a good idea to add UDFs tracked by Iceberg 
>>>>>>>>>>>>>>>>>>>>>>>>>>> catalogs. I think that Iceberg
>>>>>>>>>>>>>>>>>>>>>>>>>>> primarily deals with things that are centralized, 
>>>>>>>>>>>>>>>>>>>>>>>>>>> like tables of data.
>>>>>>>>>>>>>>>>>>>>>>>>>>> While it would be great to have a common set of 
>>>>>>>>>>>>>>>>>>>>>>>>>>> functions across engines, I
>>>>>>>>>>>>>>>>>>>>>>>>>>> don't see how that is practical when those engines 
>>>>>>>>>>>>>>>>>>>>>>>>>>> are implemented so
>>>>>>>>>>>>>>>>>>>>>>>>>>> differently. Plugging in code -- and especially 
>>>>>>>>>>>>>>>>>>>>>>>>>>> custom user-supplied code
>>>>>>>>>>>>>>>>>>>>>>>>>>> -- seems inherently specialized to me and should be 
>>>>>>>>>>>>>>>>>>>>>>>>>>> part of the engines'
>>>>>>>>>>>>>>>>>>>>>>>>>>> design.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess we'll know more when you
>>>>>>>>>>>>>>>>>>>>>>>>>>> post the proposal, but I think this would be a very 
>>>>>>>>>>>>>>>>>>>>>>>>>>> difficult area to
>>>>>>>>>>>>>>>>>>>>>>>>>>> tackle across engines, languages, and memory models 
>>>>>>>>>>>>>>>>>>>>>>>>>>> without having a huge
>>>>>>>>>>>>>>>>>>>>>>>>>>> performance penalty.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM
>>>>>>>>>>>>>>>>>>>>>>>>>>> Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This is a discussion to gauge
>>>>>>>>>>>>>>>>>>>>>>>>>>> the community interest in storing the Versioned SQL 
>>>>>>>>>>>>>>>>>>>>>>>>>>> UDFs in Iceberg.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We want to propose the spec
>>>>>>>>>>>>>>>>>>>>>>>>>>> addition for storing the versioned UDFs in Iceberg 
>>>>>>>>>>>>>>>>>>>>>>>>>>> (inspired by view spec).
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> These UDFs can operate similarly
>>>>>>>>>>>>>>>>>>>>>>>>>>> to views in that they are associated with tables, 
>>>>>>>>>>>>>>>>>>>>>>>>>>> but they can accept
>>>>>>>>>>>>>>>>>>>>>>>>>>> arguments and produce return values, or even 
>>>>>>>>>>>>>>>>>>>>>>>>>>> function as inline expressions.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Many query engines like Dremio,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Trino, Snowflake, and Databricks Spark support SQL 
>>>>>>>>>>>>>>>>>>>>>>>>>>> UDFs at the catalog level [1].
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But storing them in Iceberg can
>>>>>>>>>>>>>>>>>>>>>>>>>>> enable
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Interoperability between the
>>>>>>>>>>>>>>>>>>>>>>>>>>> engines. Potentially, engines can understand the 
>>>>>>>>>>>>>>>>>>>>>>>>>>> UDFs written by other
>>>>>>>>>>>>>>>>>>>>>>>>>>> engines (with a translation layer).
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We believe that integrating this
>>>>>>>>>>>>>>>>>>>>>>>>>>> feature into Iceberg would be a valuable addition, 
>>>>>>>>>>>>>>>>>>>>>>>>>>> and we're eager to
>>>>>>>>>>>>>>>>>>>>>>>>>>> collaborate with the community to develop a UDF 
>>>>>>>>>>>>>>>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Stephen has already begun
>>>>>>>>>>>>>>>>>>>>>>>>>>> drafting a specification to propose to the 
>>>>>>>>>>>>>>>>>>>>>>>>>>> community.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let us know your thoughts on
>>>>>>>>>>>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dremio -
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Trino -
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Snowflake -
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Databricks -
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> @snazy
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> Databricks
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>>>>>>> Databricks
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
