Hi folks, thanks for joining today’s UDF sync. We covered the UDF metadata structure, captured in this doc: https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing .
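
To make the structure a bit more concrete, here is a rough, purely illustrative sketch in Python. Every class and field name below is an assumption made for illustration and is not taken from the doc. It shows one UDF entity holding several overloads, where each overload points to a shared signature definition by ID and each signature carries an explicit return type.

```python
from dataclasses import dataclass, field

# NOTE: all names here are hypothetical; they only illustrate the shape
# discussed in the sync and are not the fields of the actual proposal.


@dataclass(frozen=True)
class SignatureDef:
    signature_id: int
    parameter_types: tuple[str, ...]  # Iceberg type names, e.g. ("int", "int")
    return_type: str                  # stated explicitly, never inferred


@dataclass
class Overload:
    signature_id: int    # reference into UdfMetadata.signatures
    deterministic: bool  # example of an overload-level property
    body: str            # function body text, stored as-is


@dataclass
class UdfMetadata:
    name: str
    secure: bool  # example of a top-level property
    signatures: dict[int, SignatureDef] = field(default_factory=dict)
    overloads: list[Overload] = field(default_factory=list)

    def return_type_of(self, overload: Overload) -> str:
        """Resolve an overload's return type through its signature ID."""
        return self.signatures[overload.signature_id].return_type


add_udf = UdfMetadata(
    name="add",
    secure=False,
    signatures={
        1: SignatureDef(1, ("int", "int"), "int"),
        2: SignatureDef(2, ("long", "long"), "long"),
    },
    overloads=[
        Overload(signature_id=1, deterministic=True, body="x + y"),
        Overload(signature_id=2, deterministic=True, body="x + y"),
    ],
)

# One entity named "add", two overloads, each with its own return type.
return_types = [add_udf.return_type_of(o) for o in add_udf.overloads]
```

Because overloads only hold a signature ID, something like a parameter description could in principle be edited on the signature without touching the overload entries themselves.
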
We also discussed a way to avoid copying every overload into the new metadata JSON when creating a new version. One idea is to introduce a global version array. This is not yet reflected in the doc, but I’ll update it shortly.

Other key points:
- The latest UDF version will typically be used in most scenarios, but engines retain the flexibility to choose which version to execute.
- Keeping the version when referring to a UDF probably isn't a good idea. Users are responsible for updating downstream views if they reference older UDF versions.

You can watch the recording here: https://www.youtube.com/watch?v=6ResT-ODelI&ab_channel=ApacheIceberg

Yufei

On Mon, Aug 25, 2025 at 6:36 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Hi folks, thanks for attending today’s UDF sync. In general, we discussed the UDF metadata structure, captured in this doc (https://docs.google.com/document/d/1khPKL6zvWjYc5Is8HeVau6sff8FD-jNc2eLKXgit3X8/edit?usp=sharing). Here is the detailed summary:
>
> 1. Each UDF overload has its own return type, e.g., `add(int, int)` returns `int`, while `add(long, long)` returns `long`.
> 2. The return type should be explicitly specified; no implicit or statement-based return type inference should be allowed.
> 3. Adding explicit properties like deterministic and doc at the overload level.
> 4. Adding the property “secure” at the top level.
> 5. Introducing a dedicated signature definitions section to centralize metadata (function parameters, return type, parameter descriptions). Each overload would reference a signature definition by ID. This decoupling allows signature-related updates (like modifying parameter descriptions) without requiring a new UDF version, similar to how updating a table schema doesn’t create a new snapshot.
> 6. Whether to have versioned open properties or not.
>    Versioned properties can lead to unnecessary copying of a bag of properties into each version, but they provide a clear history of properties for future debugging and for understanding the UDF's behavior at a specific point in time.
>
> Watch the recording here:
> https://www.youtube.com/watch?v=p7CvuGZKLSo&list=PLkifVhhWtccwzc3oRWjy5XiYJl0R6kdQL
>
> Yufei
>
> On Thu, Aug 21, 2025 at 4:18 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Hi everyone, here’s the summary from our last sync on 8/11. Apologies for the delay!
>>
>> - One UDF entity for all overloads
>>   - We agreed to combine overloads with the same name into a single UDF entity, which shares a common metadata.json file.
>>   - Listing UDFs will return a list of UDF names, not a list of individual signatures.
>>   - Loading a UDF by name will return all of its overloads.
>> - Versioning strategy
>>   - A global version number will track changes across the entire UDF entity; it increments monotonically.
>>   - Each overload will also maintain its own version (e.g., updated_at_version) to trace changes specific to that overload.
>>   - For simplicity, the load API will not support argument-based filtering in the initial release. It will always return all overloads for a given UDF name; overload-level loading is not supported at this stage.
>>
>> Watch the recording here:
>> https://drive.google.com/file/d/10G2HjUH2DaKSjGufEOjMu0bBuNd7sCzO/view
>>
>> Yufei
>>
>> On Fri, Aug 8, 2025 at 3:11 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> To recap and add my thoughts, we want to support UDFs with multiple signatures under the same name, which can serve both overload-aware and overload-naive engines.
>>>
>>> Per my investigation[1], most engines support overloading by arguments and allow implicit conversions like numeric widening (e.g., INT → BIGINT/FLOAT). This resolution approach causes issues like silent behavior changes.
>>> Here is an example:
>>>
>>> - Initially, only foo(DOUBLE) exists.
>>> - foo(42::INT) widens INT → DOUBLE and runs the expected code.
>>> - Later, a malicious user creates foo(BIGINT).
>>> - The engine’s best-match resolution now binds the same call to the new overload, changing behavior without modifying the query.
>>>
>>> To mitigate this issue, we have to choose between these two access control models:
>>>
>>> 1. Model A – Name-Level ACL: Grants apply to all overloads of a function name.
>>> 2. Model B – Signature-Level ACL: Grants are tied to specific signatures.
>>>
>>> The general recommendation is to adopt *Model A*. It trades some precision for safety and simplicity, while eliminating the silent behavior change problem. More details are in this doc[1].
>>>
>>> 1. https://docs.google.com/document/d/1E8mR-vInbQ8LDa5Lv3f22i6f8sceHojnEzxEJ6s6cvc/edit?tab=t.0
>>>
>>> Yufei
>>>
>>> On Tue, Jul 29, 2025 at 1:07 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>
>>>> Thanks to everyone who joined the sync.
>>>> Here is the meeting recording:
>>>> https://drive.google.com/file/d/1L5S6nb-C_pzBwFlClwO_sG1AVBA_ROKo/view
>>>>
>>>> Summary:
>>>> We discussed how to define function identifiers (which should also handle function overloading). Ryan suggested that we check how Spark does it. We can refer to functions using an identifier and then bind the different signatures to it, so that access policies can be applied per identifier. This is also linked to how we want to version the functions when overloading is supported.
>>>>
>>>> I will check more about this and update the proposal doc.
>>>>
>>>> Please check/subscribe to the dev events calendar for the next meeting link (Aug 11).
>>>>
>>>> - Ajantha
>>>>
>>>> On Sun, Jul 27, 2025 at 10:46 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>
>>>>> Hi Ajantha,
>>>>>
>>>>> I see that the UDF Sync is scheduled in the "Iceberg Dev Events" calendar for tomorrow 7/28 at 9AM PT. I missed the last one, but I'll be at this one.
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Mon, Jul 14, 2025 at 9:22 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> No one joined the sync today. I came to know that Yufei is on holiday, and Ryan and others couldn't make it, similar to the last sync. It seems Yufei might have forgotten to transfer meeting ownership as well, as new members needed admin approval and couldn't join automatically this week. Also, I understand it is summer holiday season for many.
>>>>>>
>>>>>> I've updated the function signature schema and other open points. I believe we're very close to the final version of the spec. A meeting is indeed necessary to finalize this, but we don't have to wait for it to finish the review process. We have already had many meetings on this. So, please review the document at your earliest convenience. If we agree on the spec by next week, I can raise a PR.
>>>>>>
>>>>>> - Ajantha
>>>>>>
>>>>>> On Thu, Jul 3, 2025 at 4:03 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>
>>>>>>> I’d propose moving the field `properties` from a top-level field to a field inside “version” along with a representation, so that properties are versioned. A property like “deterministic” could change along with the representation over time. For example, we need to change “deterministic” from true to false when adding a non-deterministic SQL expression/function (e.g., now()) inside a UDF. Otherwise, rollback won't be safe.
>>>>>>>
>>>>>>> That said, it's still an open question whether we need any non-versioned properties. We can introduce them later if a use case arises.
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>> On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>
>>>>>>>> I’d prefer to keep the signature list separate from the representation history. Here are the reasons:
>>>>>>>>
>>>>>>>> 1. Each version still enforces a single signature. Although the signatures array is global to the UDF, each version references just one signature ID. Rollbacks to historical versions remain safe.
>>>>>>>> 2. We’ve separated the less frequently changing component (signatures) from the more dynamic one (representations) to reduce metadata file size.
>>>>>>>> 3. Since signatures use Iceberg data types, they should remain unaffected by multi-dialect representation differences.
>>>>>>>>
>>>>>>>> Yufei
>>>>>>>>
>>>>>>>> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>> Here is the meeting recording:
>>>>>>>>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing
>>>>>>>>>
>>>>>>>>> Summary:
>>>>>>>>> We discussed the action items from the last sync (*see Appendix C* in the proposal doc):
>>>>>>>>>
>>>>>>>>> - Function overloading: Supported by a few engines and on the roadmaps of many others. Iceberg will support it. We will maintain the `FunctionIdentifier` (extends `TableIdentifier` but also has a member containing the function argument type list). All operations like load, rename, list, create and drop are based on `FunctionIdentifier`.
>>>>>>>>> - Secure UDF: If we store it as a property in a bag, we need to standardize the property name. Iceberg encryption may be orthogonal to this discussion.
>>>>>>>>> - UDFs with multi-statement and procedural bodies are supported by some engines. Iceberg will support them. The body is stored as-is when the engine creates the function.
>>>>>>>>>
>>>>>>>>> New discussions around:
>>>>>>>>>
>>>>>>>>> - Standardizing the property names (deterministic, secure).
>>>>>>>>> - The rename function.
>>>>>>>>> - Replace function: to check up to what level replace is supported (considering function overloading).
>>>>>>>>> - Should the signature be associated with the representation?
>>>>>>>>>
>>>>>>>>> I think we are close on the spec. Please review the proposal
>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>
>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>
>>>>>>>>> *Monday, July 14 · 9:00 – 10:00am*
>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>> Google Meet joining info
>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>
>>>>>>>>> - Ajantha
>>>>>>>>>
>>>>>>>>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Can it be handled by Iceberg encryption? If the whole metadata is encrypted, we don't have to worry about just hiding the UDF body. Let us discuss more in the sync today.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, hiding the definition and disabling pushdown are required. We will need a named key (e.g., secure) somewhere, whether it is a top-level property or a key within the UDF properties, so that both the UDF creator and consumer can recognize it.
>>>>>>>>>>>
>>>>>>>>>>> Yufei
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the extra detail. What do you think the spec would require? Would it require hiding the UDF definition from users and require that specific pushdown cases be disabled? The use cases seem valid, but I'm trying to understand the requirements this places on engines and why it needs to be part of the spec, rather than part of the properties of the UDF.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here are the main use cases for secure UDFs:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Hiding UDF Definitions: This includes concealing the UDF body and details like the list of imports, some of which aren’t applicable to SQL UDFs.
>>>>>>>>>>>>> 2. Sandboxed Execution: Ensuring the UDF runs in an isolated environment. Again, this typically doesn’t apply to SQL UDFs.
>>>>>>>>>>>>> 3. Preventing Data Leakage at Execution Time: For example, secure UDFs may disable certain optimizations, such as predicate pushdown, to avoid exposing sensitive data indirectly. [1]
>>>>>>>>>>>>>
>>>>>>>>>>>>> Given these scenarios, I agree with your point that the secure flag is primarily an instruction to the engine to behave differently. While it's largely an engine-side behavior, we still need to include this flag in the UDF definition to indicate whether a UDF is secure, especially considering the perf penalty introduced by scenario #3.
>>>>>>>>>>>>> We should clearly recommend that users avoid marking UDFs as secure unless it's truly necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yufei, could you make the argument for supporting a "secure" UDF? What use case are you addressing and what specifically changes about how the UDF is handled? If the idea is to hide the UDF definition, do we need to include it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this would be a signal to a "trusted engine". When the engine interacts with the catalog it sends authorization information about itself in addition to the user that it is acting on behalf of. That way the catalog knows that the secure UDF can be sent to the engine and won't be shown to the user. The majority of this logic is on the REST server side, and the only part that is communicated to the client is the request not to show the UDF to the user, right? In that case should this be a property rather than part of the definition? Even if we state that the client "must" suppress the UDF definition, it's really just a request. Only trusted engines can be passed the UDF definition, so a spec requirement to suppress the definition isn't very meaningful.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the summary, Ajantha!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Multi-statement UDFs are definitely useful, but whether those statements run within a single transaction should be treated as an engine-level concern. The Iceberg UDF spec can spell out the expectation, yet the actual guarantee still depends on the runtime. Even if a UDF declares itself transactional, the engine may or may not enforce it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One more thing: should we also introduce a “secure UDF” option supported by some engines[1], so the body and any sensitive details stay hidden from callers?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://docs.snowflake.com/en/developer-guide/secure-udf-procedure
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - We went through the SQL UDF syntax supported by different engines (Snowflake, Databricks, Dremio, Trino, OSS Spark 4.0).
>>>>>>>>>>>>>>>> - Each engine uses its own block separator, like $$ or '' or none. The action item was to check whether engines support multi-statement (transactional) UDF bodies.
>>>>>>>>>>>>>>>> - Discussed function overloading.
>>>>>>>>>>>>>>>>   Need to check whether these engines support function overloading for SQL UDFs. Postgres supports it! If yes, we need to adapt the spec to handle it.
>>>>>>>>>>>>>>>> - Started the online spec review and discussed the deterministic flag. We concluded that we keep independent fields (like deterministic) in the spec only if the majority of engines support them. Otherwise they will be passed in a property bag (engine-specific), and it is the engine's responsibility to honor those optional properties.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Feel free to review the current proposal document here
>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The final spec will be put up for review and vote once it is ready.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Monday, June 30 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks to everyone who joined the sync.
>>>>>>>>>>>>>>>>> Here is the meeting recording:
>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - We discussed including Python support; the majority agreed *not to* (see recording for details).
>>>>>>>>>>>>>>>>> - No strong opposition to versioning; it will be included to support change tracking and similar use cases.
>>>>>>>>>>>>>>>>> - Suggestions were made to document how each catalog resolves UDFs, similar to views and tables.
>>>>>>>>>>>>>>>>> - We agreed not to deviate from the existing table/view spec, e.g., location will remain *required* for cross-catalog compatibility.
>>>>>>>>>>>>>>>>> - We also briefly discussed view interoperability, as the same considerations apply here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Feel free to review the proposal document here:
>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0>
>>>>>>>>>>>>>>>>> With the current scope, it is now similar to the view/table spec.
>>>>>>>>>>>>>>>>> The final spec will be put up for review and vote once it is ready.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Details for next Iceberg UDF sync:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am*
>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the UDF project. Everyone’s welcome to drop in and share ideas!
>>>>>>>>>>>>>>>>>> Here is the meeting link:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Iceberg UDF sync
>>>>>>>>>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am
>>>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yufei
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Update on the progress.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss the UDF proposal. We covered several key points, though some are still open for further discussion:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for UDFs at this stage? We explored the possibility of simplifying the specification by avoiding view replication, and potentially introducing versioning support later. UDTFs, being a superset of views in some ways, may not require versioning initially.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> b) *VarArgs Support*: While some query engines may not support vararg syntax in CREATE FUNCTION, Iceberg UDFs could represent such arguments as lists when supported by the engine.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t support generic types (e.g., object), we can only map engine-specific types to Iceberg types. As a result, generic data types will not be supported in the initial version.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> d) *Python Support*: Incorporating Python as a language for SQL UDFs seems promising, especially given its potential to resolve interoperability challenges. Some engines, however, require platform version and package dependency details to execute Python code; this should be captured in the specification.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> *Next Steps*
>>>>>>>>>>>>>>>>>>> I will update the proposal document with two primary UDF use cases:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Policy exchange between engines
>>>>>>>>>>>>>>>>>>> - UDTF as a superset of view functionality
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The update will include corresponding syntax examples in both SQL and Python, and detail how each use case is represented in Iceberg metadata.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We also plan to set up regular syncs (open to more interested participants) to continue refining and finalizing the UDF specification.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I've updated the design document[1] based on the previous comments. Additionally, I've included the SQL UDF syntax supported by various vendors, including Dremio, Snowflake, Databricks, and Trino.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper discussion is needed. Let's keep moving forward, especially with the renewed interest from the community.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1] https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> During the last catalog community sync, there was significant interest in storing UDFs in Iceberg and adding endpoints for UDF handling in the REST catalog spec.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I recently discussed this with Yufei to better understand the new requirement of using UDFs for fine-grained access control policies. This expands the use cases beyond just versioned and interoperable UDFs. Additionally, I learnt that many vendors are interested in this feature.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Given the strong community interest and support, I’d like to take ownership of this effort and revive the work. I'll be revisiting the document I proposed a long time ago and will share an updated proposal by next week.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg!
>>>>>>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The UDF spec does not require representations to be SQL. It merely does not specify (in this revision) how other representations are to be written.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This seems like an easy extension (adding a new type in the "Representations" section).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the spec. It leaves a way for future versions to add different representations later, but only SQL is supported. That was also the feedback to my initial skepticism about how it would work to add functions.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only SQL representations, although it is certainly favouring SQL in examples... It would be nice to add a non-SQL example, indeed.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as this proposal focuses on SQL-based engines, while Python-based systems often work with data frames. Adding imperative languages like Python would make this proposal more inclusive.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 10:27, Piotr Findeisen <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Walaa, thanks for asking!
>>>>>>>>>>>>>>>>>>>>>>>>>> In the design doc linked earlier in this thread [1] I read
>>>>>>>>>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard to share among different engines."
>>>>>>>>>>>>>>>>>>>>>>>>>> ("Background and Motivation" section).
>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with this statement. I don't fully understand yet how the proposed design addresses shareability between the engines though.
>>>>>>>>>>>>>>>>>>>>>>>>>> I could use some help to understand this better.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec
>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created functions shareable between engines? Do you mean UDFs written in imperative code?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen <piotr.findei...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this thread. The Iceberg UDFs are an interesting idea!
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created functions shareable between the engines?
>>>>>>>>>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement look like in, e.g., Spark or Trino?
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Meanwhile, added a few comments in the doc.
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Best
>>>>>>>>>>>>>>>>>>>>>>>>>>> > Piotr
>>>>>>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and added comments.
I think it would be helpful to also have a design doc that covers the choices from the draft spec. For instance, the choice to enumerate all possible function input structs rather than allowing generics and varargs.

Here’s a quick summary of my feedback:

I think that the choice to enumerate function signatures is limiting. It would be nice to see a discussion of the trade-offs and a rationale for the choice. I think it would also be very helpful to have a few representative use cases for this included in the doc. That way the proposal can demonstrate that it solves those use cases with reasonable trade-offs.
There are a few instances where this is inconsistent with conventions in other specs. For example, using string IDs rather than an integer.
This uses a very different model for spec versioning than the Iceberg view and table specs. It requires readers to fail if there are any unknown fields, which prevents the spec from adding things that are fully backward-compatible.
Other Iceberg specs only require a version change to introduce forward-incompatible changes and I think that this should do the same to avoid confusion.
It looks like the intent is to allow multiple function signatures per version, but it is unclear how to encode them because a version is associated with a single function signature.
There is no review of SQL syntax for creating functions across engines, so this doesn’t show that the metadata proposed is sufficient for cross-engine use cases.
The example for a table-valued function shows a SELECT statement and it isn’t clear how this is distinct from a view.

On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

Thanks Walaa and Robert for the review on this.

We didn't find any blocker for the spec.
I will wait for a week and, if there are no more review comments, I will raise a PR for the spec addition next week.

If anyone else is interested, please have a look at the proposal:
https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit

- Ajantha

On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

Hi Ajantha,

I have left some comments. It is an interesting direction, but there might be some details that need to be fine-tuned.

The doc is here [1] for others who might be interested. Resharing since I do not think it was directly linked in the thread.

[1] https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit

Thanks,
Walaa.

On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:

Hi, just another reminder since we didn't get any review on the proposal.
Initially proposed on June 4.

- Ajantha

On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:

Hi everyone,

We've only received one review so far (from Benny).

We would appreciate more eyes on this.

- Ajantha

On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

Hi All,
Please find the proposal link
https://github.com/apache/iceberg/issues/10432

Google doc link is attached in the proposal.
And Thanks Stephen Lin for working on it.

Hope it gives more clarity to take the decisions and how we want to implement it.

- Ajantha

On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

Thanks Jack. I actually meant scalar/aggregate/table user defined functions.
Here are some examples of what I meant in (2):

Hive GenericUDF: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
Trino user defined functions: https://trino.io/docs/current/develop/functions.html
Flink user defined functions: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/

Probably what you referred to is a variation of (1) where the API is data flow/data pipeline API instead of SQL (e.g., Spark Scala). Yes, that is also possible in the very long run :)

Thanks,
Walaa.

On Tue, May 28, 2024 at 2:57 PM Jack Ye <yezhao...@gmail.com> wrote:

> (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.

I think we could still explore some long term opportunities in this case.
Suppose you register a Spark temp view as some sort of data frame read; it could still be resolved to a Spark plan that is representable by an intermediate representation. But I agree this gets very complicated very soon, and just having the case (1) covered would already be a huge step forward.

-Jack

On Tue, May 28, 2024 at 1:40 PM Benny Chow <btc...@gmail.com> wrote:

It's interesting to note that a tabular SQL UDF can be used to build a parameterized view. So, there's definitely a lot in common between UDFs and views.

Thanks

On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

I think there is a disconnect about what is perceived as a "UDF". There are 2 flavors:

(1) Functions that are defined by the user whose definition is a composition of other built-in functions/SQL expressions.
(2) Custom code written in imperative function according to a Java/Scala/Python API, etc.

All the examples in Ajantha's references are pretty much from (1) and I think those have more analogy to views due to their SQL nature. Agree (2) is not practical to maintain by Iceberg, but I think Ajantha's use cases are around (1), and may be worth evaluating.

Thanks,
Walaa.

On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.

Assuming Iceberg initially supports SQL representations of UDFs (similar to views as shared by the reference links above), the complexity involved will be similar to managing views.

Thanks, Ryan, Robert, and Jack, for your input.
We will work on publishing the draft spec (inspired by the view spec) this week to facilitate further discussions.

- Ajantha

On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:

> While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.

How is this different from the views? I feel we can say exactly the same thing for Iceberg views, but yet we have Iceberg multi-dialect views implemented. Maybe it sounds like we are trying to draw a line between SQL vs other programming language as "code"?
But I think SQL is just another type of code, and we are already talking about compiling all these different code dialects to an intermediate representation (using projects like Coral, Substrait), which will be stored as another type of representation of Iceberg view. I think the same functionality can be used for UDFs if developed.

I actually think adding UDF support is a good idea, even just a multi-dialect one like view, and that can allow engines to, for example, parse a view SQL and, when a referenced function cannot be resolved, try to look for a multi-dialect UDF definition.

I guess we can discuss more when we have the actual proposal published.

Best,
Jack Ye

On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:

UDFs are as engine specific and portable and "non-centralized" as views are.
The same performance concerns apply to views as well.

Iceberg should define a common base upon which engines can build, so the argument that UDFs aren't practical, because engines are different, is probably only a temporary concern.

In the long term, Iceberg should also try to tackle the idea to make views portable, which is conceptually not that much different from portable UDFs.

PS: I'm not a fan of adding a negative touch to the idea of having UDFs in Iceberg, especially not in this early stage.

On 24.05.24 20:53, Ryan Blue wrote:

Thanks, Ajantha.

I'm skeptical about whether it's a good idea to add UDFs tracked by Iceberg catalogs. I think that Iceberg primarily deals with things that are centralized, like tables of data.
While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.

I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.

Ryan

On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

Hi Everyone,

This is a discussion to gauge the community interest in storing the Versioned SQL UDFs in Iceberg.
We want to propose the spec addition for storing the versioned UDFs in Iceberg (inspired by view spec).

These UDFs can operate similarly to views in that they are associated with tables, but they can accept arguments and produce return values, or even function as inline expressions.
Many query engines like Dremio, Trino, Snowflake, and Databricks Spark support SQL UDFs at the catalog level [1].
But storing them in Iceberg can enable:
- Versioning of these UDFs.
- Interoperability between the engines. Potentially engines can understand the UDFs written by other engines (with the translate layer).

We believe that integrating this feature into Iceberg would be a valuable addition, and we're eager to collaborate with the community to develop a UDF specification.
Stephen has already begun drafting a specification to propose to the community.

Let us know your thoughts on this.

[1]
Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
Trino - https://trino.io/docs/current/sql/create-function.html
Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html

- Ajantha

--
Ryan Blue
Tabular

--
Robert Stupp
@snazy

--
Ryan Blue
Databricks