Hi everyone, here’s the summary from our last sync on 8/11. Apologies for the delay!
- One UDF entity for all overloads
  - We agreed to combine overloads with the same name into a single UDF entity that shares a common metadata.json file.
  - Listing UDFs will return a list of UDF names, not a list of individual signatures.
  - Loading a UDF by name will return all of its overloads.
- Versioning strategy
  - A global version number will track changes across the entire UDF entity; it increments monotonically.
  - Each overload will also maintain its own version (e.g., updated_at_version) to trace changes specific to that overload.
- For simplicity, the load API will not support argument-based filtering in the initial release. It will always return all overloads for a given UDF name; overload-level loading is not supported at this stage.

Watch the recording here: https://drive.google.com/file/d/10G2HjUH2DaKSjGufEOjMu0bBuNd7sCzO/view

Yufei

On Fri, Aug 8, 2025 at 3:11 PM Yufei Gu <flyrain...@gmail.com> wrote:
> To recap and add my thoughts, we want to support UDFs with multiple
> signatures under the same name, which can serve both overload-aware and
> overload-naive engines.
>
> Per my investigation[1], most engines support overloading by arguments and
> allow implicit conversions like numeric widening (e.g., INT →
> BIGINT/FLOAT). This resolution approach can cause a silent behavior
> change. Here is an example:
>
> - Initially, only foo(DOUBLE) exists.
> - foo(42::INT) widens INT → DOUBLE and runs the expected code.
> - Later, a malicious user creates foo(BIGINT).
> - The engine's best-match resolution now binds the same call to the new
> overload, changing behavior without modifying the query.
>
> To mitigate this issue, we have to choose between these two access control
> models:
>
> 1. Model A – Name-Level ACL: Grants apply to all overloads of a
> function name.
> 2. Model B – Signature-Level ACL: Grants tied to specific signatures.
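Yufei's foo example can be reproduced with a minimal Python sketch of best-match overload resolution. The widening table, its costs, and the `resolve` helper are illustrative assumptions (real engines each apply their own coercion rules); the only point is that registering a "closer" overload changes which overload an unchanged call binds to.

```python
# Hypothetical widening rules: argument type -> {parameter type: cost}.
# Cost 0 is an exact match; a lower cost means a closer match.
WIDENING = {
    "INT": {"INT": 0, "BIGINT": 1, "FLOAT": 2, "DOUBLE": 3},
    "BIGINT": {"BIGINT": 0, "DOUBLE": 1},
    "FLOAT": {"FLOAT": 0, "DOUBLE": 1},
    "DOUBLE": {"DOUBLE": 0},
}


def resolve(overloads, arg_types):
    """Return the overload needing the least total widening, or None if none applies.

    `overloads` is a list of parameter-type tuples registered under one name.
    """
    best, best_cost = None, None
    for params in overloads:
        if len(params) != len(arg_types):
            continue
        cost = 0
        for arg, param in zip(arg_types, params):
            step = WIDENING.get(arg, {}).get(param)
            if step is None:  # no implicit conversion from arg to param
                break
            cost += step
        else:  # every argument converted successfully
            if best_cost is None or cost < best_cost:
                best, best_cost = params, cost
    return best


# Initially only foo(DOUBLE) exists, so foo(42::INT) widens INT -> DOUBLE.
foo = [("DOUBLE",)]
print(resolve(foo, ("INT",)))  # ('DOUBLE',)

# Someone later registers foo(BIGINT); the same call now binds to it.
foo.append(("BIGINT",))
print(resolve(foo, ("INT",)))  # ('BIGINT',)
```

Nothing in the caller's query changes between the two `resolve` calls; only the set of registered overloads does, which is why the thread treats overload creation as an access-control question.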
> > The general recommendation is to adopt *Model A*. It trades some > precision for safety and simplicity, while eliminating the silent behavior > change problem. More details are in this doc[1]. > > 1. > https://docs.google.com/document/d/1E8mR-vInbQ8LDa5Lv3f22i6f8sceHojnEzxEJ6s6cvc/edit?tab=t.0 > > Yufei > > > On Tue, Jul 29, 2025 at 1:07 AM Ajantha Bhat <ajanthab...@gmail.com> > wrote: > >> Thanks to everyone who joined the sync. >> Here is the meeting recording: >> https://drive.google.com/file/d/1L5S6nb-C_pzBwFlClwO_sG1AVBA_ROKo/view >> >> Summary: >> We have discussed how to define function identifiers (which should also handle >> function overloading). Ryan suggested that we should check how Spark does >> it. We can refer to functions using an identifier and then bind the >> different signatures to it, so that access policies can be applied per >> identifier. This is also linked to how we want to version the functions >> when overloading is supported. >> >> I will check more about this and update the proposal doc. >> >> Please check/subscribe to the dev events calendar for the next >> meeting link (Aug 11). >> >> - Ajantha >> >> On Sun, Jul 27, 2025 at 10:46 PM Kevin Liu <kevinjq...@apache.org> wrote: >> >>> Hi Ajantha, >>> >>> I see that the UDF Sync is scheduled in the "Iceberg Dev Events" >>> calendar for tomorrow 7/28 at 9AM PT. I missed the last one, but I'll >>> be at this one. >>> >>> Best, >>> Kevin Liu >>> >>> On Mon, Jul 14, 2025 at 9:22 AM Ajantha Bhat <ajanthab...@gmail.com> >>> wrote: >>> >>>> Hey everyone, >>>> >>>> No one joined the sync today. I came to know that Yufei is on holiday, >>>> and Ryan and others couldn't make it, similar to the last sync. It seems >>>> Yufei might have forgotten to transfer meeting ownership as well, as new >>>> members needed admin approval and couldn't join automatically this week. >>>> Also, I understand it is summer holiday season for many.
>>>> >>>> I've updated the function signature schema and other open points. I >>>> believe we're very close to the final version of the spec. A meeting is >>>> indeed necessary to finalize this, but we don't have to wait for it to >>>> finish the review process. We had many meetings on this in the past >>>> already. So, please review the document at your earliest convenience. If we >>>> agree on the spec by next week, I can raise a PR. >>>> >>>> - Ajantha >>>> >>>> On Thu, Jul 3, 2025 at 4:03 AM Yufei Gu <flyrain...@gmail.com> wrote: >>>> >>>>> I’d propose to move the field `properties` from a top-level field to a >>>>> field inside “version” along with a representation, so that properties are >>>>> versioned. A property like “deterministic” could change along with the >>>>> representation over time. For example, we need to change “deterministic” >>>>> from true to false when adding a non-deterministic SQL >>>>> expression/function (e.g., now()) inside a UDF. Otherwise, rollback won't >>>>> be safe. >>>>> >>>>> That said, it's still an open question whether we need any >>>>> non-versioned properties. We can introduce them later if a use case >>>>> arises. >>>>> >>>>> Yufei >>>>> >>>>> >>>>> On Wed, Jul 2, 2025 at 3:06 PM Yufei Gu <flyrain...@gmail.com> wrote: >>>>> >>>>>> Thanks for the summary, Ajantha! >>>>>> >>>>>> I’d prefer to keep the signature list separate from the >>>>>> representation history. Here are my reasons: >>>>>> >>>>>> 1. Each version still enforces a single signature. Although the >>>>>> signatures array is global to the UDF, each version references just >>>>>> one >>>>>> signature ID. Rollbacks to historical versions remain safe. >>>>>> 2. We’ve separated the less frequently changing component >>>>>> (signatures) from the more dynamic one (representations) to reduce >>>>>> metadata >>>>>> file size. >>>>>> 3. Since signatures use Iceberg data types, they should remain >>>>>> unaffected by multi-dialect representation differences.
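Yufei's two July proposals above (properties versioned together with the representation, and a UDF-global signature list that each version references by ID) can be sketched as a small in-memory model. The class and field names (`Signature`, `Version`, `UdfMetadata`, `signature_id`, `current_version_id`) are assumptions for illustration only and do not reflect the final spec layout.

```python
from dataclasses import dataclass, field


@dataclass
class Signature:
    signature_id: int
    arg_types: tuple   # Iceberg type names, e.g. ("int",)
    return_type: str


@dataclass
class Version:
    version_id: int
    signature_id: int  # each version references exactly one signature
    body: str          # the representation, e.g. a SQL expression
    properties: dict = field(default_factory=dict)  # versioned with the body


@dataclass
class UdfMetadata:
    signatures: list         # global to the UDF; changes rarely
    versions: list           # full history; grows with every update
    current_version_id: int

    def current(self):
        return next(v for v in self.versions if v.version_id == self.current_version_id)

    def rollback(self, version_id):
        # Switching the current version switches body and properties together.
        if not any(v.version_id == version_id for v in self.versions):
            raise ValueError(f"unknown version {version_id}")
        self.current_version_id = version_id


sig = Signature(1, ("int",), "int")
v1 = Version(1, 1, "x + 1", {"deterministic": "true"})
v2 = Version(2, 1, "x + hour(now())", {"deterministic": "false"})
meta = UdfMetadata([sig], [v1, v2], current_version_id=2)

meta.rollback(1)
print(meta.current().properties["deterministic"])  # true
```

Because `deterministic` lives on each `Version`, a rollback can never leave the flag describing a different body than the one in effect; a single top-level flag would not follow the rollback.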
>>>>>> >>>>>> Yufei >>>>>> >>>>>> >>>>>> On Mon, Jun 30, 2025 at 11:28 AM Ajantha Bhat <ajanthab...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Thanks to everyone who joined the sync. >>>>>>> Here is the meeting recording: >>>>>>> https://drive.google.com/file/d/1FcOSbHo9ZIVeZXdUlmoG42o-chB7Q15P/view?usp=sharing >>>>>>> >>>>>>> Summary: >>>>>>> We discussed the action items from the last sync (*see >>>>>>> Appendix C* in the proposal doc). >>>>>>> >>>>>>> - Function overloading: Supported by a few engines and on >>>>>>> the roadmaps of many others. Iceberg will support it. We will >>>>>>> maintain a >>>>>>> `FunctionIdentifier` (extends `TableIdentifier` but also has a member >>>>>>> containing the function arguments' type list). All operations >>>>>>> like >>>>>>> load, rename, list, create and drop are based on >>>>>>> `FunctionIdentifier`. >>>>>>> - Secure UDF: If we store it as a property in a bag, we need to >>>>>>> standardize the property name. Iceberg encryption may be orthogonal >>>>>>> to this >>>>>>> discussion. >>>>>>> - UDFs with multi-statement and procedural bodies are supported >>>>>>> by some engines. Iceberg will support them and store the body as-is, >>>>>>> exactly as the >>>>>>> engine provides it when creating the function. >>>>>>> >>>>>>> New discussions around: >>>>>>> >>>>>>> - Standardizing the property names (deterministic, secure). >>>>>>> - The rename function. >>>>>>> - The replace function: to check up to what level replace is >>>>>>> supported (considering function overloading). >>>>>>> - Should the signature be associated with the representation? >>>>>>> >>>>>>> I think we are close on the spec. Please review the proposal >>>>>>> >>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing> >>>>>>> .
>>>>>>> >>>>>>> Details for next Iceberg UDF sync: >>>>>>> >>>>>>> *Monday, July 14 · 9:00 – 10:00am* Time zone: America/Los_Angeles >>>>>>> Google Meet joining info >>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>> >>>>>>> - Ajantha >>>>>>> >>>>>>> On Mon, Jun 30, 2025 at 9:27 PM Ajantha Bhat <ajanthab...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Can it be handled by Iceberg encryption? If the whole metadata is >>>>>>>> encrypted, we don't have to worry about just hiding the UDF body, right? Let >>>>>>>> us >>>>>>>> discuss more in the sync today. >>>>>>>> >>>>>>>> On Mon, Jun 30, 2025 at 9:22 PM Yufei Gu <flyrain...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Yes, hiding the definition and disabling pushdown are required. We >>>>>>>>> will need a named key (e.g., secure) somewhere, whether it is a >>>>>>>>> top-level >>>>>>>>> property or a key in the UDF properties, so that both the >>>>>>>>> UDF >>>>>>>>> creator and consumer can recognize it. >>>>>>>>> >>>>>>>>> Yufei >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jun 26, 2025 at 4:27 PM Ryan Blue <rdb...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks for the extra detail. What do you think the spec would >>>>>>>>>> require? Would it require hiding the UDF definition from users and >>>>>>>>>> require that >>>>>>>>>> specific pushdown cases be disabled? The use cases seem valid, but >>>>>>>>>> I'm >>>>>>>>>> trying to understand the requirements this places on engines and why >>>>>>>>>> it >>>>>>>>>> needs to be part of the spec, rather than part of the properties of >>>>>>>>>> the UDF. >>>>>>>>>> >>>>>>>>>> On Fri, Jun 20, 2025 at 3:56 PM Yufei Gu <flyrain...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Ryan, >>>>>>>>>>> >>>>>>>>>>> Here are the main use cases for secure UDFs: >>>>>>>>>>> >>>>>>>>>>> 1.
>>>>>>>>>>> >>>>>>>>>>> Hiding UDF Definitions: This includes concealing the UDF >>>>>>>>>>> body and details like the list of imports, some of which aren’t applicable >>>>>>>>>>> to SQL UDFs. >>>>>>>>>>> 2. >>>>>>>>>>> >>>>>>>>>>> Sandboxed Execution: Ensuring the UDF runs in an isolated >>>>>>>>>>> environment. Again, this typically doesn’t apply to SQL UDFs. >>>>>>>>>>> 3. >>>>>>>>>>> >>>>>>>>>>> Preventing Data Leakage at Execution Time: For example, >>>>>>>>>>> secure UDFs may disable certain optimizations, such as predicate >>>>>>>>>>> pushdown, to >>>>>>>>>>> avoid exposing sensitive data indirectly. [1] >>>>>>>>>>> >>>>>>>>>>> Given these scenarios, I agree with your point that the secure >>>>>>>>>>> flag is primarily an instruction to the engine to behave >>>>>>>>>>> differently. While >>>>>>>>>>> it's largely an engine-side behavior, we still need to include this >>>>>>>>>>> flag in >>>>>>>>>>> the UDF definition to indicate whether a UDF is secure, especially >>>>>>>>>>> considering the performance penalty introduced by scenario #3. We should >>>>>>>>>>> clearly >>>>>>>>>>> recommend that users avoid marking UDFs as secure unless it's truly >>>>>>>>>>> necessary. >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/pushdown-optimization#example-of-indirect-data-exposure-through-pushdown >>>>>>>>>>> Yufei >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jun 18, 2025 at 12:32 PM Ryan Blue <rdb...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Yufei, could you make the argument for supporting a "secure" >>>>>>>>>>>> UDF? What use case are you addressing and what specifically >>>>>>>>>>>> changes about >>>>>>>>>>>> how the UDF is handled? If the idea is to hide the UDF definition, >>>>>>>>>>>> do we >>>>>>>>>>>> need to include it? >>>>>>>>>>>> >>>>>>>>>>>> I think this would be a signal to a "trusted engine".
When the >>>>>>>>>>>> engine interacts with the catalog it sends authorization >>>>>>>>>>>> information about >>>>>>>>>>>> itself in addition to the user that it is acting on behalf of. >>>>>>>>>>>> That way the >>>>>>>>>>>> catalog knows that the secure UDF can be sent to the engine and >>>>>>>>>>>> won't be >>>>>>>>>>>> shown to the user. The majority of this logic is on the REST >>>>>>>>>>>> server side, >>>>>>>>>>>> and the only part that is communicated to the client is the >>>>>>>>>>>> request not to >>>>>>>>>>>> show the UDF to the user, right? In that case should this be a >>>>>>>>>>>> property >>>>>>>>>>>> rather than part of the definition? Even if we state that the >>>>>>>>>>>> client "must" >>>>>>>>>>>> suppress the UDF definition, it's really just a request. Only >>>>>>>>>>>> trusted >>>>>>>>>>>> engines can be passed the UDF definition, so a spec requirement to >>>>>>>>>>>> suppress >>>>>>>>>>>> the definition isn't very meaningful. >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Jun 16, 2025 at 5:42 PM Yufei Gu <flyrain...@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Thanks for the summary, Ajantha! >>>>>>>>>>>>> >>>>>>>>>>>>> Multi-statement UDFs are definitely useful, but whether those >>>>>>>>>>>>> statements run within a single transaction should be treated as an >>>>>>>>>>>>> engine-level concern. The Iceberg UDF spec can spell out the >>>>>>>>>>>>> expectation, >>>>>>>>>>>>> yet the actual guarantee still depends on the runtime. Even if a >>>>>>>>>>>>> UDF >>>>>>>>>>>>> declares itself transactional, the engine may or may not enforce >>>>>>>>>>>>> it. >>>>>>>>>>>>> >>>>>>>>>>>>> One more thing: should we also introduce a “secure UDF” option >>>>>>>>>>>>> supported by some engines[1], so the body and any sensitive >>>>>>>>>>>>> details stay >>>>>>>>>>>>> hidden from callers? 
>>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>>>>>>>> https://docs.snowflake.com/en/developer-guide/secure-udf-procedure >>>>>>>>>>>>> >>>>>>>>>>>>> Yufei >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Jun 16, 2025 at 12:02 PM Ajantha Bhat < >>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks to everyone who joined the sync. >>>>>>>>>>>>>> Here is the meeting recording: >>>>>>>>>>>>>> https://drive.google.com/file/d/10_Getaasv6tDMGzeZQUgcUVwCUAaFxiz/view?usp=sharing >>>>>>>>>>>>>> Summary: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - We went through the SQL UDF syntax supported by >>>>>>>>>>>>>> different engines (Snowflake, Databricks, Dremio, Trino, OSS >>>>>>>>>>>>>> Spark 4.0). >>>>>>>>>>>>>> - Each engine uses its own block separator, like $$ or '' >>>>>>>>>>>>>> or none. The action item was to check whether engines support >>>>>>>>>>>>>> multi-statement >>>>>>>>>>>>>> (transactional) UDF bodies. >>>>>>>>>>>>>> - Discussed function overloading. We need to check >>>>>>>>>>>>>> whether these engines support function overloading for SQL >>>>>>>>>>>>>> UDFs. Postgres >>>>>>>>>>>>>> supports it! If so, we need to adapt the spec to handle it. >>>>>>>>>>>>>> - Started the online spec review, discussed the >>>>>>>>>>>>>> deterministic flag, and concluded that we keep independent >>>>>>>>>>>>>> fields (like >>>>>>>>>>>>>> deterministic) in the spec only if the majority of engines >>>>>>>>>>>>>> support them. Otherwise, they >>>>>>>>>>>>>> will be passed in a property bag (engine-specific), and it is >>>>>>>>>>>>>> the engine's >>>>>>>>>>>>>> responsibility to honor those optional properties. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Feel free to review the current proposal document here >>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing>. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The final spec will be put up for review and vote once it is ready.
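Ryan's June 18 suggestion earlier in this thread, that the catalog (not the spec) decides whether a secure UDF's definition is released to an engine, could look roughly like the following server-side check. The `secure` property key follows Yufei's example name; `TRUSTED_ENGINES`, `load_udf_response`, and the response shape are hypothetical and are not part of the REST catalog spec.

```python
# Hypothetical: engine identities the catalog trusts. In Ryan's description this
# comes from the engine authenticating itself, separately from the end user.
TRUSTED_ENGINES = {"engine-a"}


def load_udf_response(udf, engine_id):
    """Build a hypothetical load response for `udf`, a dict with name/properties/body."""
    props = dict(udf.get("properties", {}))
    secure = props.get("secure") == "true"
    # Untrusted engines never receive a secure UDF's body.
    body = udf["body"] if (not secure or engine_id in TRUSTED_ENGINES) else None
    # Trusted engines still see secure=true; not showing the body to the end
    # user is a request the catalog cannot enforce on the client side.
    return {"name": udf["name"], "properties": props, "body": body}


udf = {"name": "mask_ssn", "properties": {"secure": "true"}, "body": "'***-**-' || right(ssn, 4)"}
print(load_udf_response(udf, "engine-a")["body"] is not None)  # True
print(load_udf_response(udf, "other-engine")["body"])          # None
```

This matches Ryan's framing: the enforcement lives in the catalog, and the only thing a trusted client receives beyond the body is the `secure` marker asking it not to display that body.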
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Details for next Iceberg UDF sync: >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Monday, June 30 · 9:00 – 10:00am* Time zone: >>>>>>>>>>>>>> America/Los_Angeles >>>>>>>>>>>>>> Google Meet joining info >>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jun 4, 2025 at 9:00 PM Ajantha Bhat < >>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks to everyone who joined the sync. >>>>>>>>>>>>>>> Here is the meeting recording: >>>>>>>>>>>>>>> https://drive.google.com/file/d/1WItItsNs3m3-no7_qWPHftGqVNOdpw5C/view?usp=sharing >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Summary: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We discussed including Python support; the majority >>>>>>>>>>>>>>> agreed *not to* (see recording for details). >>>>>>>>>>>>>>> - >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> No strong opposition to versioning; it will be included >>>>>>>>>>>>>>> to support change tracking and similar use cases. >>>>>>>>>>>>>>> - >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Suggestions were made to document how each catalog >>>>>>>>>>>>>>> resolves UDFs, similar to views and tables. >>>>>>>>>>>>>>> - >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We agreed not to deviate from the existing table/view >>>>>>>>>>>>>>> spec; e.g., location will remain *required* for >>>>>>>>>>>>>>> cross-catalog compatibility. >>>>>>>>>>>>>>> - >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We also briefly discussed view interoperability, since >>>>>>>>>>>>>>> the same concerns apply here. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Feel free to review the proposal document >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> <https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?pli=1&tab=t.0> >>>>>>>>>>>>>>> here. >>>>>>>>>>>>>>> With the current scope, it is similar to the view/table spec >>>>>>>>>>>>>>> now.
>>>>>>>>>>>>>>> The final spec will be put up for review and vote once it is >>>>>>>>>>>>>>> ready. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Details for next Iceberg UDF sync: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> *Monday, June 16 · 9:00 – 10:00am* Time zone: >>>>>>>>>>>>>>> America/Los_Angeles >>>>>>>>>>>>>>> Google Meet joining info >>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, May 21, 2025 at 3:33 AM Yufei Gu < >>>>>>>>>>>>>>> flyrain...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi folks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We’ve set up a dedicated bi-weekly community sync for the >>>>>>>>>>>>>>>> UDF project. Everyone’s welcome to drop in and share ideas! >>>>>>>>>>>>>>>> Here is the >>>>>>>>>>>>>>>> meeting link: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Iceberg UDF sync >>>>>>>>>>>>>>>> Monday, June 2 · 9:00 – 10:00am >>>>>>>>>>>>>>>> Time zone: America/Los_Angeles >>>>>>>>>>>>>>>> Google Meet joining info >>>>>>>>>>>>>>>> Video call link: https://meet.google.com/aui-czix-nbh >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yufei >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, May 16, 2025 at 10:45 AM Ajantha Bhat < >>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Update on the progress. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I had a meeting today with Yufei and Yun.zou to discuss >>>>>>>>>>>>>>>>> the UDF proposal. We covered several key points, though some >>>>>>>>>>>>>>>>> are still open >>>>>>>>>>>>>>>>> for further discussion: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> a) *UDF Versioning*: Do we truly need versioning for UDFs >>>>>>>>>>>>>>>>> at this stage? We explored the possibility of simplifying the >>>>>>>>>>>>>>>>> specification >>>>>>>>>>>>>>>>> by avoiding view replication, and potentially introducing >>>>>>>>>>>>>>>>> versioning >>>>>>>>>>>>>>>>> support later.
UDTFs, being a superset of views in some ways, >>>>>>>>>>>>>>>>> may not >>>>>>>>>>>>>>>>> require versioning initially. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> b) *VarArgs Support*: While some query engines may not >>>>>>>>>>>>>>>>> support vararg syntax in CREATE FUNCTION, Iceberg UDFs >>>>>>>>>>>>>>>>> could represent such arguments as lists when supported by the >>>>>>>>>>>>>>>>> engine. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> c) *Generics in UDFs*: Since Iceberg currently doesn’t >>>>>>>>>>>>>>>>> support generic types (e.g., object), we can only map >>>>>>>>>>>>>>>>> engine-specific types to Iceberg types. As a result, generic >>>>>>>>>>>>>>>>> data types >>>>>>>>>>>>>>>>> will not be supported in the initial version. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> d) *Python Support*: Incorporating Python as a language >>>>>>>>>>>>>>>>> for SQL UDFs seems promising, especially given its potential >>>>>>>>>>>>>>>>> to resolve >>>>>>>>>>>>>>>>> interoperability challenges. Some engines, however, require >>>>>>>>>>>>>>>>> platform >>>>>>>>>>>>>>>>> version and package dependency details to execute Python >>>>>>>>>>>>>>>>> code—this should >>>>>>>>>>>>>>>>> be captured in the specification. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> *Next Steps* >>>>>>>>>>>>>>>>> I will update the proposal document with two primary UDF >>>>>>>>>>>>>>>>> use cases: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Policy exchange between engines >>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> UDTF as a superset of view functionality >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The update will include corresponding syntax examples in >>>>>>>>>>>>>>>>> both SQL and Python, and detail how each use case is >>>>>>>>>>>>>>>>> represented in Iceberg >>>>>>>>>>>>>>>>> metadata. 
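Item b) in the May 16 update above proposes representing variadic arguments as a list. A minimal sketch of what that could mean at call time, assuming the stored signature marks its last parameter as a list that collects any extra arguments; `Param`, `is_vararg`, and `bind_args` are hypothetical names, since the thread does not say how the spec would encode this.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    type_name: str           # Iceberg type, e.g. "string" or "list<string>"
    is_vararg: bool = False  # hypothetical marker: trailing list collects extra args


def bind_args(params, args):
    """Map positional call arguments to parameters, packing varargs into a list."""
    if params and params[-1].is_vararg:
        fixed = params[:-1]
        if len(args) < len(fixed):
            raise TypeError("not enough arguments")
        bound = {p.name: a for p, a in zip(fixed, args)}
        bound[params[-1].name] = list(args[len(fixed):])
        return bound
    if len(args) != len(params):
        raise TypeError("wrong number of arguments")
    return {p.name: a for p, a in zip(params, args)}


concat_ws = [Param("sep", "string"), Param("parts", "list<string>", is_vararg=True)]
print(bind_args(concat_ws, [",", "a", "b", "c"]))
# {'sep': ',', 'parts': ['a', 'b', 'c']}
```

Under this reading, an engine without vararg syntax in CREATE FUNCTION could still call the UDF by passing an explicit list for the last parameter.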
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We also plan to set up regular syncs (open to more >>>>>>>>>>>>>>>>> interested participants) to continue refining and finalizing >>>>>>>>>>>>>>>>> the UDF >>>>>>>>>>>>>>>>> specification. >>>>>>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 9:16 PM Ajantha Bhat < >>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I've updated the design document[1] based on the previous >>>>>>>>>>>>>>>>>> comments. Additionally, I've included the SQL UDF syntax >>>>>>>>>>>>>>>>>> supported by >>>>>>>>>>>>>>>>>> various vendors, including Dremio, Snowflake, Databricks, >>>>>>>>>>>>>>>>>> and Trino. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm happy to schedule a separate sync if a deeper >>>>>>>>>>>>>>>>>> discussion is needed. Let's keep moving forward, especially >>>>>>>>>>>>>>>>>> with the >>>>>>>>>>>>>>>>>> renewed interest from the community. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit?usp=sharing >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Feb 13, 2025 at 11:17 PM Ajantha Bhat < >>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hey everyone, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> During the last catalog community sync, there was >>>>>>>>>>>>>>>>>>> significant interest in storing UDFs in Iceberg and adding >>>>>>>>>>>>>>>>>>> endpoints for >>>>>>>>>>>>>>>>>>> UDF handling in the REST catalog spec. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I recently discussed this with Yufei to better >>>>>>>>>>>>>>>>>>> understand the new requirement of using UDFs for >>>>>>>>>>>>>>>>>>> fine-grained access >>>>>>>>>>>>>>>>>>> control policies. This expands the use cases beyond just >>>>>>>>>>>>>>>>>>> versioned and >>>>>>>>>>>>>>>>>>> interoperable UDFs. 
Additionally, I learnt that many >>>>>>>>>>>>>>>>>>> vendors are interested >>>>>>>>>>>>>>>>>>> in this feature. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Given the strong community interest and support, I’d >>>>>>>>>>>>>>>>>>> like to take ownership of this effort and revive the work. >>>>>>>>>>>>>>>>>>> I'll be >>>>>>>>>>>>>>>>>>> revisiting the document I proposed long back and will share >>>>>>>>>>>>>>>>>>> an updated >>>>>>>>>>>>>>>>>>> proposal by next week. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Looking forward to storing UDFs in Iceberg! >>>>>>>>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 2:55 PM Dmitri Bourlatchkov >>>>>>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The UDF spec does not require representations to be >>>>>>>>>>>>>>>>>>>> SQL. It merely does not specify (in this revision) how >>>>>>>>>>>>>>>>>>>> other >>>>>>>>>>>>>>>>>>>> representations are to be written. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> This seems like an easy extension (adding a new type in >>>>>>>>>>>>>>>>>>>> the "Representations" section). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> Dmitri. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 3:47 PM Ryan Blue >>>>>>>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Right now, SQL is an explicit requirement of the spec. >>>>>>>>>>>>>>>>>>>>> It leaves a way for future versions to add different >>>>>>>>>>>>>>>>>>>>> representations later, >>>>>>>>>>>>>>>>>>>>> but only SQL is supported. That was also the feedback to >>>>>>>>>>>>>>>>>>>>> my initial >>>>>>>>>>>>>>>>>>>>> skepticism about how it would work to add functions. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 12:44 PM Dmitri Bourlatchkov >>>>>>>>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I do not think the spec is meant to allow only SQL >>>>>>>>>>>>>>>>>>>>>> representations, although it is certainly favouring SQL >>>>>>>>>>>>>>>>>>>>>> in examples... It >>>>>>>>>>>>>>>>>>>>>> would be nice to add a non-SQL example, indeed. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>> Dmitri. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 8, 2024 at 9:00 AM Fokko Driesprong < >>>>>>>>>>>>>>>>>>>>>> fo...@apache.org> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Coming from PyIceberg, I have concerns as this >>>>>>>>>>>>>>>>>>>>>>> proposal focuses on SQL-based engines, while >>>>>>>>>>>>>>>>>>>>>>> Python-based systems often >>>>>>>>>>>>>>>>>>>>>>> work with data frames. Adding imperative languages like >>>>>>>>>>>>>>>>>>>>>>> Python would make >>>>>>>>>>>>>>>>>>>>>>> this proposal more inclusive. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>> Fokko >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, 8 Aug 2024 at 10:27, Piotr Findeisen < >>>>>>>>>>>>>>>>>>>>>>> piotr.findei...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Walaa, thanks for asking! >>>>>>>>>>>>>>>>>>>>>>>> In the design doc linked before in this thread [1] >>>>>>>>>>>>>>>>>>>>>>>> I read >>>>>>>>>>>>>>>>>>>>>>>> "Without a common standard, the UDFs are hard to >>>>>>>>>>>>>>>>>>>>>>>> share among different engines." >>>>>>>>>>>>>>>>>>>>>>>> ("Background and Motivation" section). >>>>>>>>>>>>>>>>>>>>>>>> I agree with this statement.
>>>>>>>>>>>>>>>>>>>>>>>> I don't fully >>>>>>>>>>>>>>>>>>>>>>>> understand yet how the proposed design addresses >>>>>>>>>>>>>>>>>>>>>>>> shareability between the >>>>>>>>>>>>>>>>>>>>>>>> engines, though. >>>>>>>>>>>>>>>>>>>>>>>> I could use some help to understand this better. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Best >>>>>>>>>>>>>>>>>>>>>>>> Piotr >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> [1] SQL User-Defined Function Spec >>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, 7 Aug 2024 at 21:14, Walaa Eldin Moustafa < >>>>>>>>>>>>>>>>>>>>>>>> wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Piotr, what do you mean by making user-created >>>>>>>>>>>>>>>>>>>>>>>>> functions shareable >>>>>>>>>>>>>>>>>>>>>>>>> between engines? Do you mean UDFs written in >>>>>>>>>>>>>>>>>>>>>>>>> imperative code? >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen >>>>>>>>>>>>>>>>>>>>>>>>> <piotr.findei...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > Hi, >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > Thank you Ajantha for creating this thread. The >>>>>>>>>>>>>>>>>>>>>>>>> Iceberg UDFs are an interesting idea! >>>>>>>>>>>>>>>>>>>>>>>>> > Is there a plan to make the user-created >>>>>>>>>>>>>>>>>>>>>>>>> functions shareable between the engines? >>>>>>>>>>>>>>>>>>>>>>>>> > If so, what would a CREATE FUNCTION statement >>>>>>>>>>>>>>>>>>>>>>>>> look like in, e.g., Spark or Trino? >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > Meanwhile, I added a few comments in the doc.
>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > Best >>>>>>>>>>>>>>>>>>>>>>>>> > Piotr >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > On Thu, 1 Aug 2024 at 20:50, Ryan Blue >>>>>>>>>>>>>>>>>>>>>>>>> <b...@databricks.com.invalid> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> I just looked through the proposal and added >>>>>>>>>>>>>>>>>>>>>>>>> comments. I think it would be helpful to also have a >>>>>>>>>>>>>>>>>>>>>>>>> design doc that covers >>>>>>>>>>>>>>>>>>>>>>>>> the choices from the draft spec. For instance, the >>>>>>>>>>>>>>>>>>>>>>>>> choice to enumerate all >>>>>>>>>>>>>>>>>>>>>>>>> possible function input structs rather than allowing >>>>>>>>>>>>>>>>>>>>>>>>> generics and varargs. >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> Here’s a quick summary of my feedback: >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> I think that the choice to enumerate function >>>>>>>>>>>>>>>>>>>>>>>>> signatures is limiting. It would be nice to see a >>>>>>>>>>>>>>>>>>>>>>>>> discussion of the >>>>>>>>>>>>>>>>>>>>>>>>> trade-offs and a rationale for the choice. I think it >>>>>>>>>>>>>>>>>>>>>>>>> would also be very >>>>>>>>>>>>>>>>>>>>>>>>> helpful to have a few representative use cases for >>>>>>>>>>>>>>>>>>>>>>>>> this included in the >>>>>>>>>>>>>>>>>>>>>>>>> doc. That way the proposal can demonstrate that it >>>>>>>>>>>>>>>>>>>>>>>>> solves those use cases >>>>>>>>>>>>>>>>>>>>>>>>> with reasonable trade-offs. >>>>>>>>>>>>>>>>>>>>>>>>> >> There are a few instances where this is >>>>>>>>>>>>>>>>>>>>>>>>> inconsistent with conventions in other specs. For >>>>>>>>>>>>>>>>>>>>>>>>> example, it uses string IDs >>>>>>>>>>>>>>>>>>>>>>>>> rather than integers. >>>>>>>>>>>>>>>>>>>>>>>>> >> This uses a very different model for spec >>>>>>>>>>>>>>>>>>>>>>>>> versioning than the Iceberg view and table specs.
>>>>>>>>>>>>>>>>>>>>>>>>> It >>>>>>>>>>>>>>>>>>>>>>>>> requires readers to >>>>>>>>>>>>>>>>>>>>>>>>> fail if there are any unknown fields, which prevents >>>>>>>>>>>>>>>>>>>>>>>>> the spec from adding >>>>>>>>>>>>>>>>>>>>>>>>> things that are fully backward-compatible. Other >>>>>>>>>>>>>>>>>>>>>>>>> Iceberg specs only require >>>>>>>>>>>>>>>>>>>>>>>>> a version change to introduce forward-incompatible >>>>>>>>>>>>>>>>>>>>>>>>> changes, and I think that >>>>>>>>>>>>>>>>>>>>>>>>> this should do the same to avoid confusion. >>>>>>>>>>>>>>>>>>>>>>>>> >> It looks like the intent is to allow multiple >>>>>>>>>>>>>>>>>>>>>>>>> function signatures per version, but it is unclear >>>>>>>>>>>>>>>>>>>>>>>>> how to encode them >>>>>>>>>>>>>>>>>>>>>>>>> because a version is associated with a single >>>>>>>>>>>>>>>>>>>>>>>>> function signature. >>>>>>>>>>>>>>>>>>>>>>>>> >> There is no review of SQL syntax for creating >>>>>>>>>>>>>>>>>>>>>>>>> functions across engines, so this doesn’t show that >>>>>>>>>>>>>>>>>>>>>>>>> the metadata proposed >>>>>>>>>>>>>>>>>>>>>>>>> is sufficient for cross-engine use cases. >>>>>>>>>>>>>>>>>>>>>>>>> >> The example for a table-valued function shows a >>>>>>>>>>>>>>>>>>>>>>>>> SELECT statement, and it isn’t clear how this is >>>>>>>>>>>>>>>>>>>>>>>>> distinct from a view. >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> On Thu, Aug 1, 2024 at 3:15 AM Ajantha Bhat < >>>>>>>>>>>>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> >>> Thanks Walaa and Robert for the review on this. >>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> >>> We didn't find any blockers for the spec. >>>>>>>>>>>>>>>>>>>>>>>>> >>> I will wait for a week, and if there are no more review >>>>>>>>>>>>>>>>>>>>>>>>> comments, I will raise a PR for the spec addition next >>>>>>>>>>>>>>>>>>>>>>>>> week.
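Ryan's Aug 1, 2024 point about spec versioning (readers should not fail on unknown fields; only forward-incompatible changes need a new version) can be illustrated with a small reader. The `format-version` key mirrors the Iceberg table and view metadata; the UDF field names, `SUPPORTED_FORMAT_VERSION`, and `read_udf_metadata` are assumptions for this sketch, not the proposed UDF spec.

```python
import json

SUPPORTED_FORMAT_VERSION = 1  # hypothetical: newest UDF format this reader understands
KNOWN_FIELDS = {"format-version", "udf-uuid", "versions", "current-version-id"}  # illustrative


def read_udf_metadata(text):
    meta = json.loads(text)
    version = meta.get("format-version")
    # Only a newer format-version signals a forward-incompatible change.
    if not isinstance(version, int) or version > SUPPORTED_FORMAT_VERSION:
        raise ValueError(f"unsupported format-version: {version!r}")
    # Unknown fields are skipped rather than rejected, so new optional fields
    # can be added to the spec without bumping the format version.
    return {k: v for k, v in meta.items() if k in KNOWN_FIELDS}


doc = '{"format-version": 1, "udf-uuid": "abc", "some-new-optional-field": 42}'
print(read_udf_metadata(doc))  # {'format-version': 1, 'udf-uuid': 'abc'}
```

The alternative Ryan objects to, failing on any unknown field, would turn every backward-compatible addition into a breaking change for existing readers.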
>>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> >>> If anyone else is interested, please have a >>>>>>>>>>>>>>>>>>>>>>>>> look at the proposal >>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> >>> - Ajantha >>>>>>>>>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> >>> On Tue, Jul 16, 2024 at 1:27 PM Walaa Eldin >>>>>>>>>>>>>>>>>>>>>>>>> Moustafa <wa.moust...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>> Hi Ajantha, >>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>> I have left some comments. It is an >>>>>>>>>>>>>>>>>>>>>>>>> interesting direction, but there might be some >>>>>>>>>>>>>>>>>>>>>>>>> details that need to be fine >>>>>>>>>>>>>>>>>>>>>>>>> tuned. >>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>> The doc is here [1] for others who might be >>>>>>>>>>>>>>>>>>>>>>>>> interested. Resharing since I do not think it was >>>>>>>>>>>>>>>>>>>>>>>>> directly linked in the >>>>>>>>>>>>>>>>>>>>>>>>> thread. >>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>> [1] >>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1BDvOfhrH0ZQiQv9eLBqeAu8k8Vjfmeql9VzIiW1F0vc/edit >>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>> >>>> Walaa. >>>>>>>>>>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>> On Mon, Jul 15, 2024 at 11:09 PM Ajantha Bhat >>>>>>>>>>>>>>>>>>>>>>>>> <ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Hi, just another reminder since we didn't >>>>>>>>>>>>>>>>>>>>>>>>> get any review on the proposal. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>> Initially proposed on June 4. 
>>>>>
>>>>> - Ajantha
>>>>>
>>>>> On Mon, Jun 24, 2024 at 4:21 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> We've only received one review so far (from Benny).
>>>>>>
>>>>>> We would appreciate more eyes on this.
>>>>>>
>>>>>> - Ajantha
>>>>>>
>>>>>> On Tue, Jun 4, 2024 at 7:25 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi All,
>>>>>>> Please find the proposal link
>>>>>>> https://github.com/apache/iceberg/issues/10432
>>>>>>>
>>>>>>> Google doc link is attached in the proposal.
>>>>>>> And thanks to Stephen Lin for working on it.
>>>>>>>
>>>>>>> Hope it gives more clarity to take the decisions and how we want to implement it.
>>>>>>>
>>>>>>> - Ajantha
>>>>>>>
>>>>>>> On Wed, May 29, 2024 at 4:01 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thanks Jack. I actually meant scalar/aggregate/table user defined functions.
>>>>>>>> Here are some examples of what I meant in (2):
>>>>>>>>
>>>>>>>> Hive GenericUDF: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
>>>>>>>> Trino user defined functions: https://trino.io/docs/current/develop/functions.html
>>>>>>>> Flink user defined functions: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/
>>>>>>>>
>>>>>>>> Probably what you referred to is a variation of (1) where the API is data flow/data pipeline API instead of SQL (e.g., Spark Scala). Yes, that is also possible in the very long run :)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Walaa.
>>>>>>>>
>>>>>>>> On Tue, May 28, 2024 at 2:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> > (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.
>>>>>>>>>
>>>>>>>>> I think we could still explore some long term opportunities in this case.
>>>>>>>>> Consider you register a Spark temp view as some sort of data frame read, then it could still be resolved to a Spark plan that is representable by an intermediate representation. But I agree this gets very complicated very soon, and just having the case (1) covered would already be a huge step forward.
>>>>>>>>>
>>>>>>>>> -Jack
>>>>>>>>>
>>>>>>>>> On Tue, May 28, 2024 at 1:40 PM Benny Chow <btc...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> It's interesting to note that a tabular SQL UDF can be used to build a parameterized view. So, there's definitely a lot in common between UDFs and views.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I think there is a disconnect about what is perceived as a "UDF". There are 2 flavors:
>>>>>>>>>>>
>>>>>>>>>>> (1) Functions that are defined by the user whose definition is a composition of other built-in functions/SQL expressions.
>>>>>>>>>>> (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.
>>>>>>>>>>>
>>>>>>>>>>> All the examples in Ajantha's references are pretty much from (1) and I think those have more analogy to views due to their SQL nature. Agree (2) is not practical to maintain by Iceberg, but I think Ajantha's use cases are around (1), and may be worth evaluating.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Walaa.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 28, 2024 at 9:45 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> > I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>>>>>>>>>>>
>>>>>>>>>>>> Assuming Iceberg initially supports SQL representations of UDFs (similar to views as shared by the reference links above), the complexity involved will be similar to managing views.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Ryan, Robert, and Jack, for your input.
>>>>>>>>>>>> We will work on publishing the draft spec (inspired by the view spec) this week to facilitate further discussions.
>>>>>>>>>>>>
>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 28, 2024 at 7:33 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> > While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How is this different from the views? I feel we can say exactly the same thing for Iceberg views, but yet we have Iceberg multi-dialect views implemented. Maybe it sounds like we are trying to draw a line between SQL vs other programming language as "code"?
>>>>>>>>>>>>> But I think SQL is just another type of code, and we are already talking about compiling all these different code dialects to an intermediate representation (using projects like Coral, Substrait), which will be stored as another type of representation of Iceberg view. I think the same functionality can be used for UDFs if developed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I actually think adding UDF support is a good idea, even just a multi-dialect one like view, and that can allow engines to for example parse a view SQL, and when a function referenced cannot be resolved, try to seek for a multi-dialect UDF definition.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I guess we can discuss more when we have the actual proposal published.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, May 28, 2024 at 1:32 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> UDFs are as engine specific and portable and "non-centralized" as views are. The same performance concerns apply to views as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Iceberg should define a common base upon which engines can build, so the argument that UDFs aren't practical, because engines are different, is probably only a temporary concern.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the long term, Iceberg should also try to tackle the idea to make views portable, which is conceptually not that much different from portable UDFs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PS: I'm not a fan of adding a negative touch to the idea of having UDFs in Iceberg, especially not in this early stage.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 24.05.24 20:53, Ryan Blue wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks, Ajantha.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm skeptical about whether it's a good idea to add UDFs tracked by Iceberg catalogs. I think that Iceberg primarily deals with things that are centralized, like tables of data. While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently.
>>>>>>>>>>>>>> Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' design.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 24, 2024 at 8:10 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a discussion to gauge the community interest in storing the Versioned SQL UDFs in Iceberg.
>>>>>>>>>>>>>>> We want to propose the spec addition for storing the versioned UDFs in Iceberg (inspired by view spec).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> These UDFs can operate similarly to views in that they are associated with tables, but they can accept arguments and produce return values, or even function as inline expressions.
>>>>>>>>>>>>>>> Many query engines, like Dremio, Trino, Snowflake, and Databricks Spark, support SQL UDFs at catalog level [1].
>>>>>>>>>>>>>>> But storing them in Iceberg can enable:
>>>>>>>>>>>>>>> - Versioning of these UDFs.
>>>>>>>>>>>>>>> - Interoperability between the engines. Potentially engines can understand the UDFs written by other engines (with a translation layer).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We believe that integrating this feature into Iceberg would be a valuable addition, and we're eager to collaborate with the community to develop a UDF specification.
>>>>>>>>>>>>>>> Stephen has already begun drafting a specification to propose to the community.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let us know your thoughts on this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> Dremio - https://docs.dremio.com/current/reference/sql/commands/functions#creating-a-function
>>>>>>>>>>>>>>> Trino - https://trino.io/docs/current/sql/create-function.html
>>>>>>>>>>>>>>> Snowflake - https://docs.snowflake.com/en/developer-guide/udf/sql/udf-sql-scalar-functions
>>>>>>>>>>>>>>> Databricks - https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Robert Stupp
>>>>>>>>>>>>>> @snazy
>>
>> --
>> Ryan Blue
>> Databricks

--
Ryan Blue
Databricks
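Two of Ryan's review points in this thread are concrete enough to sketch: readers should not fail on unknown fields (forward-incompatible changes would be signalled only by a format-version change, as in other Iceberg specs), and a single function version may need to carry several signatures. The Python sketch that follows is a minimal illustration under stated assumptions, not the proposal's metadata format: every JSON field name (`format-version`, `versions`, `version-id`, `signatures`, `parameters`, `return-type`, `body`) and the `SUPPORTED_FORMAT_VERSION` constant are hypothetical.

```python
import json
from dataclasses import dataclass, field

# Hypothetical: the highest metadata format version this reader understands.
SUPPORTED_FORMAT_VERSION = 1


@dataclass(frozen=True)
class Signature:
    """One overload (hypothetical layout): parameter types, return type, SQL body."""
    parameters: tuple[str, ...]
    return_type: str
    body: str


@dataclass
class FunctionVersion:
    """One version that may hold several overloads of the same function name."""
    version_id: int
    signatures: list[Signature] = field(default_factory=list)


def parse_metadata(text: str) -> list[FunctionVersion]:
    doc = json.loads(text)
    fmt = doc.get("format-version")
    # Forward-incompatible changes are gated only by the format version:
    # reject anything newer than what this reader supports.
    if isinstance(fmt, bool) or not isinstance(fmt, int) or fmt > SUPPORTED_FORMAT_VERSION:
        raise ValueError(f"unsupported format-version: {fmt!r}")

    versions = []
    for v in doc["versions"]:
        # Only known keys are read; unknown keys at any level are ignored,
        # so backward-compatible additions do not break this reader.
        sigs = [
            Signature(tuple(s["parameters"]), s["return-type"], s["body"])
            for s in v["signatures"]
        ]
        versions.append(FunctionVersion(v["version-id"], sigs))
    return versions


if __name__ == "__main__":
    meta = json.dumps({
        "format-version": 1,
        "owner": "ignored by this reader",  # unknown top-level field
        "versions": [{
            "version-id": 1,
            "signatures": [
                {"parameters": ["int"], "return-type": "int", "body": "x + 1"},
                {"parameters": ["double"], "return-type": "double", "body": "x + 1.0"},
            ],
        }],
    })
    v = parse_metadata(meta)[0]
    print(v.version_id, [s.parameters for s in v.signatures])
    # prints: 1 [('int',), ('double',)]
```

Grouping all overloads under one version in this way is in line with the 8/11 sync summary at the top of this thread, which keeps every overload of a name in one UDF entity sharing a metadata.json file; per-overload tracking such as `updated_at_version` is left out of the sketch for brevity.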